[RFC]: implement a broader range of statistical distributions #224

@JavaTypedScript

Full name

Rohit R Bhat

University status

Yes

University name

R V Institute of Technology and Management

University program

B.E in Computer Science and Engineering

Expected graduation

July 2027

Short biography

I am Rohit R Bhat, a third-year Computer Science and Engineering student at R V Institute of Technology and Management, ranked in the top 5 of my course. My academic foundation in computer science and mathematics includes coursework in statistics, data structures, algorithms, discrete algorithms, linear algebra, multivariate calculus, and numerical approximation techniques. I program in JavaScript, Python, C, C++, and Java, and have practical experience in web and app development as well as in machine learning (ML) and artificial intelligence (AI).
I gained research and development experience during my internship at Aytasense Technologies Pvt. Ltd., where I completed the project "Detection of IRIS-PUPIL Ratio in Real-Time using machine learning techniques". I have regularly participated in national hackathons such as Pravega XII, and built a fully working event management platform in cultrang, implemented entirely in JavaScript.

Timezone

Indian Standard Time (UTC+5:30)

Contact details

Email: bhatrr2021@gmail.com, GitHub: JavaTypedScript

Platform

Windows

Editor

VS Code is my preferred code editor. It is one of the most popular and flexible editors and offers a wide range of options for all aspects of programming. Its extension support for various languages and utilities makes development faster and more efficient.

Programming experience

As listed above, I enjoy building projects in C, C++, JavaScript, and Python.
Here is a list of my projects:

  • Redis: A Redis server and client built from scratch in C and C++. I enjoyed implementing the event-driven design, which strengthened my understanding of networking.
  • Eventify: A full event management app that handles event scheduling, resource allocation, monitoring, communication, and more. Implemented in JavaScript, React.js, and MongoDB.
  • Flow Monitor: A flow-state detector and distraction blocker with a comprehensive analytics dashboard, built as a browser extension in JavaScript and HTML.

JavaScript experience

I was introduced to JavaScript in the first year of my course, when I had to build a website for a grocery store. It was a fun project through which I learned the language from the basics up to promises and async/await. Writing APIs gave me valuable insight into how JavaScript works under the hood.
Building full-stack applications further strengthened my grip on JavaScript and eventually pushed me to learn TypeScript for type safety.
The flexibility of JavaScript is commendable: a full-stack application, a browser extension, and various libraries can all be implemented in it.

Node.js experience

My Node.js experience comes mainly from the backend utilities I have written to power full-stack applications. I value its support for building RESTful APIs. Node.js performs well on I/O-bound tasks and can handle thousands of concurrent connections with low memory overhead. Its asynchronous nature makes it efficient at maintaining open connections via WebSockets. Node.js is lightweight and modular and has fast startup times, which makes it a good fit for modern cloud architectures. The npm ecosystem, with millions of open-source packages, means developers rarely have to reinvent the wheel.

C/Fortran experience

My programming experience started with C. Working at a low level taught me the basics of programming and helped me understand memory management. Its speed makes it suitable for time-critical programs, and it is well suited to writing portable and networked programs. For example, I remember that porting Arduino code from a computer to a rover was an easy task.
I have watched Fortran tutorials but have never programmed in it. If needed, I will learn it and use it to complete the project.

Interest in stdlib

JavaScript is a powerful language, yet it lacks a standard, widely used library for numerical computing; this gap is what drives me to contribute to stdlib.js. The browser environment needs a standard library that removes the dependence on other sources for numerical computing, which makes contributing worthwhile. I like the modular approach, which breaks complexity down hierarchically. The fact that the library is rigorously tested and highly performant also motivates me to learn from it.

Version control

Yes

Contributions to stdlib

I have contributed to stdlib.js, beginning with good first issues and moving on to implementing C base math special functions in #8873 and #8951.

Along the way I worked on refactoring benchmark string interpolations and on C and JavaScript implementations of base math special functions, and I have made many lint and error fixes.
I look forward to contributing more as part of the GSoC 2026 program.

stdlib showcase

I built comviz, an interactive image compressor that shows how an image is deconstructed and reconstructed using SVD (singular value decomposition) with stdlib.js functions. It also visualizes the matrix calculations in real time.

I used @stdlib/random/base/randu to draw uniformly distributed random numbers, @stdlib/blas/base/ddot for dot products, @stdlib/array/float64 for Float64Array, and others.

Goals

Since the project covers a large number of distributions, I have made continuous and discrete distributions the top priority. They are univariate, and implementing and testing them fits within the scope and timeline of the project.

The above Google Sheet lists all the continuous and discrete distributions, taken primarily from SciPy because of its influence on the existing distributions and its time-tested stability. For each distribution, it notes the challenges and the implementation strategy identified through research.

I have chosen not to include the multivariate, summary, and frequency distributions listed in the SciPy docs (which we were asked to study) as my first priority (suggestions are welcome), because they depend heavily on a linear algebra foundation that would have to be built from scratch.
Building that linear algebra functionality to a high standard of stability is better suited to an extension of this project, since implementing those distributions together with random variates would be overwhelming with 80+ univariate distributions already in line.

I therefore propose to implement the univariate distributions first, along with APIs to draw their random variates, and to take up the multivariate, summary, and frequency distributions as an extension of this project, after careful study of the linear algebra requirements and further discussion.

Why this project?

Statistical distributions are important in many fields, from scientific computing to mathematical proofs. Contributing to such an impactful project not only strengthens my grasp of the core concepts but also teaches me how to implement them in code. I am interested in machine learning, where statistics is a core part of the subject and underlies many ML algorithms. Knowing that many people will use the code I write in high-impact projects motivates me to be part of this project.

Qualifications

Statistical distributions are actively studied in my course. They were first introduced in my second year (in Mathematics-3 (stats)), a subject in which I achieved a perfect score. In the current year we have the subject Machine Learning for CSE, which builds on statistics, so implementing distributions here also helps with my coursework.
I have read All of Statistics by Larry Wasserman, which, unlike regular textbooks, teaches relevant machine learning algorithms built on statistics, and Think Stats by Allen B. Downey, which teaches statistics through Python programming with for loops instead of complicated integrals.
With continuous learning, the prerequisites from my courses, and my own interest, I feel that contributing to this project will enhance my overall knowledge.

Prior art

My prior art centers on the Python library SciPy for implementations and on Wikipedia for the formulas of distributions.

  • scipy: The SciPy GitHub repo contains extensive resources for implementing distributions. Its time-tested strategies are among the most trusted and stable implementations. The necessary cdf(), pdf(), mean(), variance(), kurtosis(), and skewness() functions are all in one place. The required helper functions are clearly identified, and previous implementations have been inspired by this repo, so it serves as the primary entry point.
  • wikipedia: Serves as a global repository of formulas and references for various distributions.
  • boost: A rich source of helper functions and distributions, with highly trusted and stable implementations.

These sources can be used to implement stable distributions and can be adapted to JavaScript's constraints.

Commitment

I am ready to commit 30 hrs/week throughout the 12-week (350 hours) GSoC timeline, and I will be happy to stay involved in the project after the program ends. My exams are spread over one month in May, but I will still be able to give 30 hrs/week during those weeks because there are long gaps between exams and no classes. From June (the start of summer break), I can focus entirely on the project, and 30 hrs/week will be comfortable. My following semesters are focused on getting internships, which makes it easier for me to take part in this program fully.

Schedule

Assuming a 12 week schedule,

  • Around 10 distributions will be targeted each week.

  • Coding will aim to finish 2 distributions per day.

  • A PR covering two distributions will be submitted daily.

  • Review will be requested every 2 days.

  • Community Bonding Period: This period will be used to discuss suggestions about the project with the mentors, understand the codebase thoroughly, refine the plan further, and clarify any doubts about the proposal.
    I will also do further research on the required special functions and may raise a sample PR to check the robustness of the implemented distributions.

  • Week 1:

    • Core Mathematical Foundations & First Continuous Batch:
      • Prerequisites: Implement special functions: ive() (exponentially scaled modified Bessel function of the first kind), logsumexp, and _pow1pm1.
      • Distributions (11): alpha, anglit, argus, betaprime, bradford, burr, burr12, crystalball, dgamma, dpareto_lognorm, dweibull.
      • Focus: Porting simple PDF/CDF and benchmarking logic from SciPy to JS/C.
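As a rough illustration of the logsumexp prerequisite listed for Week 1, here is a minimal sketch in plain JavaScript. The function name and the array-based signature are assumptions made for this example, not an existing stdlib API; a final package would follow stdlib conventions. It uses the usual max-shift so that large inputs do not overflow inside Math.exp.

```javascript
'use strict';

/**
* Computes log( sum( exp( x[i] ) ) ) using the max-shift trick.
* Hypothetical helper for illustration only (not an existing stdlib API).
*/
function logsumexp( x ) {
	var max = -Infinity;
	var sum = 0.0;
	var i;

	// Find the largest element so that every exponent below is <= 0:
	for ( i = 0; i < x.length; i++ ) {
		if ( x[ i ] !== x[ i ] ) {
			return NaN; // propagate NaN inputs
		}
		if ( x[ i ] > max ) {
			max = x[ i ];
		}
	}
	// Empty input or all -Infinity gives log(0); a +Infinity entry dominates:
	if ( max === -Infinity || max === Infinity ) {
		return max;
	}
	for ( i = 0; i < x.length; i++ ) {
		sum += Math.exp( x[ i ] - max );
	}
	return max + Math.log( sum );
}

// The naive Math.log( Math.exp( 1000 ) + Math.exp( 1000 ) ) overflows to Infinity;
// the shifted version stays finite:
console.log( logsumexp( [ 1000.0, 1000.0 ] ) );
```

Shifting by the maximum keeps at least one term equal to exp(0) = 1, so the sum inside the logarithm can neither overflow nor underflow to zero.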
  • Week 2:

    • Exponential & Normal-Related Families:
      • Dependencies: exp(), log1p(), erf(), and expm1().
      • Distributions: exponnorm, exponweib, exponpow, fatiguelife, fisk, foldcauchy, foldnorm, genlogistic, gennorm, genpareto, genexpon.
      • Focus: Porting simple PDF/CDF and benchmarking logic from SciPy to JS/C.
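To show why log1p() and expm1() appear among the Week 2 dependencies, here is a hedged sketch of a standardized generalized Pareto CDF, F(x; c) = 1 - (1 + c x)^(-1/c), rewritten as -expm1(-log1p(c x)/c) so that small values of c x do not lose precision. The function name and argument order are assumptions for this example, and the built-in Math.log1p and Math.expm1 stand in for the corresponding special functions.

```javascript
'use strict';

/**
* Standardized generalized Pareto CDF with shape parameter `c`.
* Hypothetical helper for illustration only.
*/
function genparetoCDF( x, c ) {
	if ( x !== x || c !== c ) {
		return NaN;
	}
	if ( x <= 0.0 ) {
		return 0.0; // below the support
	}
	if ( c === 0.0 ) {
		return -Math.expm1( -x ); // exponential limit: 1 - exp(-x)
	}
	if ( c < 0.0 && x >= -1.0 / c ) {
		return 1.0; // for c < 0 the support ends at -1/c
	}
	// 1 - (1 + c*x)^(-1/c), evaluated without forming 1 + c*x directly:
	return -Math.expm1( -Math.log1p( c * x ) / c );
}

console.log( genparetoCDF( 1.0, 1.0 ) );
console.log( genparetoCDF( 1.0, 0.0 ) );
```

The same log1p/expm1 pattern applies to other Week 2 distributions whose CDFs have the form 1 minus a power or an exponential.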
  • Week 3:

    • Gamma, Logistic & Extreme Value Distributions:
      • Dependencies: gammaln, gammainc, digamma, and zeta.
      • Distributions: genextreme, gengamma, gausshyper, genhalflogistic, gibrat, gompertz, halfcauchy, halflogistic, halfnorm, hypsecant, invgauss/wald.
      • Focus: Porting simple PDF/CDF and benchmarking logic from SciPy to JS/C.
  • Week 4:

    • Shape-Based & Skewed Distributions:
      • Dependencies: Johnson system logic and Kappa distribution parameters.
      • Distributions: invweibull, irwinhall, jf_skew_t, johnsonsb, johnsonsu, kappa4, kappa3, ksone, kstwo, kstwobign, landau.
      • Focus: Porting simple PDF/CDF and benchmarking logic from SciPy to JS/C.
  • Week 5:

    • Non-Central & Power Law Distributions:
      • Mathematical Prerequisites: ncx2 (Non-central Chi-square core) and chndtr.
      • Distributions: laplace_asymmetric, levy_l, loggamma, loglaplace, lomax, maxwell, mielke, moyal, nakagami, ncx2, ncf.
      • Focus: Porting simple PDF/CDF and benchmarking logic from SciPy to JS/C.
  • Week 6: (midterm)

    • Precision Statistics & Reciprocal Families:
      • Dependencies: Reciprocal inverse Gaussian logic and high-precision denominators.
      • Distributions: nct, pearson3, powerlaw, powerlognorm, powernorm, rdist, reciprocal, rel_breitwigner, rice, recipinvgauss, semicircular.
      • Focus: Porting simple PDF/CDF and benchmarking logic from SciPy to JS/C.
      • Review: Review of overall progress of the project and discussion of changes if needed.
  • Week 7:

    • Truncated Distributions & Discrete Start:
      • Mathematical Prerequisites: von_mises specific functions (i0e, chbevl).
      • Distributions: skewcauchy, trapezoid, truncexpon, truncnorm, truncpareto, truncweibull_min, vonmises, wald, wrapcauchy, betabinom, betanbinom.
      • Focus: Porting simple PDF/CDF and benchmarking logic from SciPy to JS/C.
  • Week 8:

    • Discrete Distributions:
      • Mathematical Prerequisites: pdtr() (Poisson distribution functions) and gen_harmonic logic.
      • Distributions: boltzmann, dlaplace, logser, nbinom, nhypergeom, poisson_binom, skellam, yulesimon, zipf, zipfian.
      • Focus: Porting simple PDF/CDF and benchmarking logic from SciPy to JS/C.
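The gen_harmonic prerequisite for Week 8 can be illustrated with the Zipfian PMF. The sketch below assumes SciPy's parameterization, f(k; a, n) = 1 / (H_{n,a} k^a) for k = 1, ..., n, where H_{n,a} is the generalized harmonic number. The function names are hypothetical, and the direct O(n) summation is a simplification chosen for this example rather than the strategy a final implementation would necessarily use.

```javascript
'use strict';

/**
* Generalized harmonic number H_{n,a} = sum_{k=1}^{n} k^(-a).
* Hypothetical helper; direct summation, assuming integer n >= 1 and a >= 0.
*/
function genHarmonic( n, a ) {
	var sum = 0.0;
	var k;

	// Add the smallest terms first to limit rounding error:
	for ( k = n; k >= 1; k-- ) {
		sum += Math.pow( k, -a );
	}
	return sum;
}

/**
* Zipfian PMF on the support {1, ..., n} (SciPy-style parameters assumed).
*/
function zipfianPMF( k, a, n ) {
	if ( !Number.isInteger( n ) || n < 1 || !( a >= 0.0 ) ) {
		return NaN; // invalid parameters
	}
	if ( !Number.isInteger( k ) || k < 1 || k > n ) {
		return 0.0; // outside the support
	}
	return Math.pow( k, -a ) / genHarmonic( n, a );
}

console.log( zipfianPMF( 1, 1.0, 3 ) );
```

For repeated evaluations with fixed (a, n), the normalizing constant could be computed once and cached by a factory function.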
  • Week 9:

    • Sampling for distributions derived from already existing ones:
      • Distributions:
        • Chi-Square ($k$): Implement as a Gamma subcase ($X \sim \text{Gam}(k/2, 2)$).
        • Student's t ($\nu$): Use the Normal / Chi-Square ratio ($X = Z / \sqrt{V/\nu}$).
        • F-Distribution ($d_1, d_2$): Implement as a ratio of scaled Chi-Squares.
        • Gumbel & Logistic: Implement via the inverse transform method.
      • Key Tasks:
        • Ensure numerical stability for ratios where the denominator approaches zero.
        • Leverage existing math/base/special functions for precision.
      • Validation: Compare empirical variance against theoretical formulas for heavy-tailed distributions.
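The inverse transform method planned for Gumbel and Logistic in Week 9 can be sketched as follows. This is a minimal illustration in plain JavaScript: the function names and the (mu, beta) and (mu, s) location-scale parameterizations are assumptions for this example, and Math.random stands in for whichever uniform generator the project ends up using.

```javascript
'use strict';

// Gumbel (maximum) quantile: inverts F(x) = exp( -exp( -(x - mu) / beta ) ).
function gumbelQuantile( p, mu, beta ) {
	return mu - ( beta * Math.log( -Math.log( p ) ) );
}

// Logistic quantile: inverts F(x) = 1 / ( 1 + exp( -(x - mu) / s ) ).
function logisticQuantile( p, mu, s ) {
	return mu + ( s * Math.log( p / ( 1.0 - p ) ) );
}

/**
* Wraps a quantile function into a sampler (hypothetical helper).
* `rand` must return uniform numbers on [0, 1); zero is rejected so that
* the logarithms above never receive 0.
*/
function inverseTransformSampler( quantile, rand ) {
	return function sample() {
		var u;
		do {
			u = rand();
		} while ( u === 0.0 );
		return quantile( u );
	};
}

var drawGumbel = inverseTransformSampler( function q( u ) {
	return gumbelQuantile( u, 0.0, 1.0 );
}, Math.random );

console.log( drawGumbel() );
```

Because each draw uses exactly one uniform number, a sampler built this way is easy to test against the CDF by fixing the uniform inputs.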
  • Week 10:

    • Mixtures & Power-Law Distributions:
      • Distributions:
        • Beta-Binomial ($n, \alpha, \beta$): Implement as a Mixture (Sample $p \sim \text{Beta}$, then $X \sim \text{Binom}$).
        • Skellam ($\mu_1, \mu_2$): Implement as the Difference of two independent Poissons.
        • Zipf (a) & Zipfian (a, s): Use Inversion by Cumulation.
        • Logarithmic Series (p): Use Sequential Inversion.
      • Key Tasks:
        • Implement/Verify Riemann Zeta ($\zeta$) and Hurwitz Zeta ($\zeta_H$) function accuracy.
        • Optimize discrete search loops using "chop-down" inversion to minimize iterations.
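The "chop-down" sequential inversion mentioned for Week 10 can be sketched for the logarithmic series distribution, whose PMF is P(X = k) = -p^k / (k ln(1 - p)) for k = 1, 2, .... Instead of accumulating a running CDF, the uniform variate is reduced by each probability in turn, and successive probabilities are obtained from the recurrence P(k) = P(k-1) · p (k-1) / k. The function names below are hypothetical, and Math.random is only a stand-in uniform source.

```javascript
'use strict';

/**
* Maps a uniform variate `u` in [0, 1) to a logarithmic series variate
* by chop-down search. Hypothetical helper; assumes 0 < p < 1.
*/
function logseriesFromUniform( u, p ) {
	var pk;
	var k;
	if ( !( p > 0.0 && p < 1.0 ) ) {
		return NaN;
	}
	pk = -p / Math.log1p( -p ); // P(X = 1)
	k = 1;

	// Subtract probabilities from `u` until it falls inside the current one.
	// The `pk > 0` guard stops the loop if rounding leaves a residual in `u`
	// after the probabilities have underflowed to zero.
	while ( u > pk && pk > 0.0 ) {
		u -= pk;
		k += 1;
		pk *= p * ( k - 1 ) / k; // recurrence for P(X = k)
	}
	return k;
}

// Draws one variate using the supplied uniform generator:
function logseries( p, rand ) {
	return logseriesFromUniform( rand(), p );
}

console.log( logseries( 0.5, Math.random ) );
```

Separating the deterministic mapping from the uniform source keeps the search loop allocation-free and lets tests feed fixed uniform values.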
  • Week 11:

    • Specialized Kernels & Performance Hardening:
      • Distributions:
        • Wallenius' & Fisher's Noncentral Hypergeometric: Implement Chop-down Inversion and Conditional Binomials.
        • Semi-Circular & Wrap Cauchy: Use Cosine Transformation and Arc-Tangent Mapping.
        • Optimization (The Ziggurat Upgrade):
          • Refactor the Normal and Gamma generators from Phase 3 to use the Ziggurat algorithm for high-performance sampling.
      • Key Tasks:
        • Run make benchmark for all Phase 4 distributions.
        • Ensure zero-allocation in the main sampling loops to prevent GC (Garbage Collection) overhead.
  • Week 12:

    • Finalization:
      • Documentation: Complete JSDoc with LaTeX equations and examples.
      • Final Linting: Run make lint and fix style violations.
      • PR Submission: Consolidate commits and submit the Pull Request to stdlib-js/stdlib.
  • Final Week:

    • Review and Feedback:
      • Final review and clean-up.
      • Discussion about further extension of program.
      • Feedback received on the project.

Notes:

  • The community bonding period is a 3 week period built into GSoC to help you get to know the project community and participate in project discussion. This is an opportunity for you to set up your local development environment, learn how the project's source control works, refine your project plan, read any necessary documentation, and otherwise prepare to execute on your project proposal.
  • Usually, even week 1 deliverables include some code.
  • By week 6, you need enough done at this point for your mentor to evaluate your progress and pass you. Usually, you want to be a bit more than halfway done.
  • By week 11, you may want to "code freeze" and focus on completing any tests and/or documentation.
  • During the final week, you'll be submitting your project.

Related issues

GSOC 2026 [#2]

Checklist

  • I have read and understood the Code of Conduct.
  • I have read and understood the application materials found in this repository.
  • I understand that plagiarism will not be tolerated, and I have authored this application in my own words.
  • I have read and understood the patch requirement which is necessary for my application to be considered for acceptance.
  • I have read and understood the stdlib showcase requirement which is necessary for my application to be considered for acceptance.
  • The issue name begins with [RFC]: and succinctly describes your proposal.
  • I understand that, in order to apply to be a GSoC contributor, I must submit my final application to https://summerofcode.withgoogle.com/ before the submission deadline.
