This project implements a fully connected neural network for handwritten digit classification from scratch using NumPy, without relying on deep learning frameworks such as PyTorch or TensorFlow.
The goal of the project was to understand the mathematical foundations of neural networks by implementing forward propagation, backpropagation, and gradient descent manually.
The model is trained on the MNIST handwritten digit dataset, which contains:
- 60,000 training images
- 10,000 test images
- Image size: 28 × 28 grayscale pixels
- 10 output classes (digits 0–9)

Input pixels are normalized to the range [0, 1] to prevent extremely large gradients. Keep in mind that test inputs must be normalized to the same range.
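A minimal sketch of the normalization step (the array names here are placeholders; in practice the arrays come from the loaded MNIST files):

```python
import numpy as np

# Placeholder raw MNIST arrays with uint8 pixel values in [0, 255].
x_train_raw = np.random.randint(0, 256, size=(60000, 784), dtype=np.uint8)
x_test_raw = np.random.randint(0, 256, size=(10000, 784), dtype=np.uint8)

# Scale to [0, 1]; the identical transform must be applied to train and test data.
x_train = x_train_raw.astype(np.float32) / 255.0
x_test = x_test_raw.astype(np.float32) / 255.0
```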
I stuck to a two-layer fully connected neural network: Input (784) -> Linear Layer (784 -> 256) -> ReLU Activation -> Linear Layer (256 -> 10) -> Softmax for classification.
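The forward pass through this architecture can be sketched as follows (the initialization scheme and variable names are illustrative, not necessarily what the project uses):

```python
import numpy as np

rng = np.random.default_rng(0)

# Small random weights; shapes match the 784 -> 256 -> 10 architecture.
W1 = rng.normal(0.0, 0.01, size=(784, 256))
b1 = np.zeros(256)
W2 = rng.normal(0.0, 0.01, size=(256, 10))
b2 = np.zeros(10)

def forward(x):
    """Forward pass for a batch x of shape (batch, 784)."""
    h = np.maximum(0.0, x @ W1 + b1)        # ReLU hidden layer
    logits = h @ W2 + b2                    # raw class scores
    # Stable softmax: subtract the row-wise max before exponentiating.
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

probs = forward(rng.normal(size=(4, 784)))  # rows are class probabilities
```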
The network is trained using softmax cross-entropy loss applied to the logits produced by the final linear layer.
The softmax function converts logits $z$ into probabilities:

$$p_i = \frac{e^{z_i}}{\sum_{j} e^{z_j}}$$

The cross-entropy loss for a single example with true class $y$ is:

$$L = -\log p_y$$
To prevent overflow when exponentiating the logits, I used a numerically stable logsumexp() implementation: subtract the maximum logit before exponentiation and add it back afterward.
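A minimal sketch of this max-shift trick (the exact signature in the project may differ):

```python
import numpy as np

def logsumexp(z, axis=-1, keepdims=False):
    """Numerically stable log(sum(exp(z))).

    Shifting by the max keeps every exponent <= 0, so np.exp never overflows;
    the max is added back afterward, leaving the result unchanged.
    """
    m = np.max(z, axis=axis, keepdims=True)
    out = m + np.log(np.sum(np.exp(z - m), axis=axis, keepdims=True))
    return out if keepdims else np.squeeze(out, axis=axis)

# Naive np.log(np.sum(np.exp([1000, 1000]))) overflows to inf;
# the stable version returns 1000 + log(2).
val = logsumexp(np.array([1000.0, 1000.0]))
```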
Combining softmax with cross-entropy yields the gradient:

$$\frac{\partial L}{\partial z_i} = p_i - t_i$$

where $t$ is the one-hot encoding of the true label $y$.
For a linear layer $a = Wx + b$, the gradient with respect to the weights is

$$\frac{\partial L}{\partial W} = \frac{\partial L}{\partial a}\, x^\top$$

and the bias gradient is simply $\partial L / \partial b = \partial L / \partial a$.
The derivative of the ReLU activation $\mathrm{ReLU}(x) = \max(0, x)$ is

$$\mathrm{ReLU}'(x) = \begin{cases} 1 & \text{if } x > 0 \\ 0 & \text{otherwise} \end{cases}$$
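Putting these pieces together, a vectorized backward pass might look like this (function and variable names are illustrative; `h` denotes the post-ReLU activations, so `h > 0` encodes the ReLU derivative):

```python
import numpy as np

def backward(x, h, probs, t, W2):
    """Gradients for one batch, following the formulas above.

    x: inputs (batch, 784); h: ReLU outputs (batch, 256);
    probs: softmax outputs (batch, 10); t: one-hot targets (batch, 10);
    W2: second-layer weights (256, 10).
    """
    batch = x.shape[0]
    dlogits = (probs - t) / batch       # softmax + cross-entropy gradient, averaged
    dW2 = h.T @ dlogits                 # linear-layer weight gradient (dL/da x^T, batched)
    db2 = dlogits.sum(axis=0)
    dh = dlogits @ W2.T                 # backprop through the second linear layer
    dh_pre = dh * (h > 0)               # ReLU derivative: 1 where the input was positive
    dW1 = x.T @ dh_pre
    db1 = dh_pre.sum(axis=0)
    return dW1, db1, dW2, db2
```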
Final performance:
- Training accuracy: ~93.09%
- Test accuracy: ~93.39%
This project was built to gain a deeper understanding of:
- Gradient-based optimization
- Neural network backpropagation
- Numerical stability in machine learning
- Vectorized linear algebra implementations
- NumPy operations in general

