Given a set of data points, find the line that best fits them, without using any ML libraries like sklearn.
Most people learn ML by calling model.fit() without ever understanding
what happens inside. This project implements everything manually:
the math, the gradients, the update rule — so the black box becomes
transparent.
Built gradient descent from scratch using only NumPy:
- predict(): computes y = wX + b
- loss(): measures error using Mean Squared Error
- gradients(): computes ∂L/∂w and ∂L/∂b using calculus
- update(): moves w and b downhill using the gradient
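The four functions above can be sketched in NumPy like this (a minimal sketch assuming a 1-D X and scalar parameters; the exact signatures in model.py may differ):

```python
import numpy as np

def predict(X, w, b):
    # Linear model: y_hat = w*X + b
    return w * X + b

def loss(y_hat, y):
    # Mean Squared Error
    return np.mean((y_hat - y) ** 2)

def gradients(X, y, w, b):
    # Partial derivatives of MSE with respect to w and b
    err = predict(X, w, b) - y
    dw = np.mean(2 * err * X)   # ∂L/∂w: error scaled by X
    db = np.mean(2 * err)       # ∂L/∂b: plain mean error
    return dw, db

def update(w, b, dw, db, lr):
    # One gradient descent step: move against the gradient
    return w - lr * dw, b - lr * db
```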
The model starts with w=0, b=0 and learns w≈3, b≈2 from noisy data after 10,000 epochs.
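A minimal version of that training loop looks like this (the synthetic-data generation here is an assumption for illustration; the true line y = 3x + 2, the 10,000 epochs, and α = 0.005 come from the experiments described below):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic noisy data around the "true" line y = 3x + 2
X = rng.uniform(0, 10, size=100)
y = 3.0 * X + 2.0 + rng.normal(0, 1, size=100)

w, b, lr = 0.0, 0.0, 0.005

for epoch in range(10_000):
    err = (w * X + b) - y          # prediction error
    dw = np.mean(2 * err * X)      # ∂L/∂w
    db = np.mean(2 * err)          # ∂L/∂b
    w -= lr * dw                   # gradient descent update
    b -= lr * db

print(w, b)  # lands near w≈3, b≈2 (exact values depend on the noise)
```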
- Gradient descent updates parameters in the direction that reduces loss fastest
- Learning rate controls step size: too large and the model diverges
- Noisy data means you can never perfectly recover true parameters
- b converges slower than w, because db lacks the X factor that amplifies dw
- Model evolution can be tracked through Git commit history
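One way to see the "no X amplification" point: for the same error vector, dw is scaled by the input values while db is not, so whenever X is well above 1, w receives proportionally larger updates. A quick illustration with hypothetical numbers:

```python
import numpy as np

X = np.array([5.0, 8.0, 10.0])    # inputs well above 1
err = np.array([2.0, 2.0, 2.0])   # identical error at every point

dw = np.mean(2 * err * X)  # gradient w.r.t. w: error amplified by X
db = np.mean(2 * err)      # gradient w.r.t. b: just the mean error

print(dw, db)  # dw ≈ 30.67 vs db = 4.0, so w moves much faster
```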
Requirements: pip install numpy matplotlib
Run: python main.py
This will train the model and display the learned line over the data.
main.py — training loop and visualization
model.py — predict, loss, gradients, update functions
utils.py — plotting helper functions
results/ — output plots saved here
- Tested learning rates: 0.001 (slow), 0.005 (stable), 0.1 (diverges)
- Observed divergence at α=0.1: loss exploded to inf, then nan
- Confirmed convergence at α=0.005 after 10,000 epochs
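The divergence at α=0.1 is easy to reproduce: each step overshoots the minimum by more than it corrects, so the parameters oscillate with growing magnitude until they overflow. A sketch with synthetic data (not the project's main.py):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=100)
y = 3.0 * X + 2.0 + rng.normal(0, 1, size=100)

w, b, lr = 0.0, 0.0, 0.1   # learning rate far too large for this data
with np.errstate(over="ignore", invalid="ignore"):
    for epoch in range(200):
        err = (w * X + b) - y
        w -= lr * np.mean(2 * err * X)
        b -= lr * np.mean(2 * err)
    final_loss = np.mean(((w * X + b) - y) ** 2)

print(final_loss)  # non-finite: overflows to inf, then nan
```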
This project builds the foundation for understanding:
- Neural networks (same gradient descent principle)
- Deep learning frameworks (what model.fit() does internally)
Linear regression is simply an optimization problem solved using gradient descent, not a black-box ML algorithm. This project proves that by building every component from first principles.
The model converges close to ground truth:
- Learned: w ≈ 3.015, b ≈ 1.811
- Truth: w = 3.000, b = 2.000
Loss drops from 401.84 at epoch 0 to 0.81 at epoch 9900, confirming a correct gradient descent implementation. The small gap in b is expected: it is irreducible error caused by noise in the data, not a bug.

