KushagraSinghog/Digit-classifier
MNIST Digit Classification Pipeline
This project demonstrates a complete end-to-end machine learning workflow for classifying handwritten digits from the MNIST dataset.
It covers data preprocessing, visualization, model training, evaluation using multiple metrics, and building a reproducible pipeline with model persistence.
Two models are used: a binary classifier for a single digit (SGDClassifier) and a multi-class classifier (RandomForestClassifier).
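The difference between the two setups comes down to how the target is built. The following is a minimal sketch, not the repository's actual code. It assumes scikit-learn's bundled `load_digits` data (8x8 images) as an offline stand-in for MNIST; with the real dataset you would load it via `fetch_openml("mnist_784", version=1, as_frame=False)` instead. Variable names and `random_state=42` are illustrative choices.

```python
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import train_test_split

# Assumption: load_digits stands in for MNIST so the sketch runs offline.
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Binary task: the target is True for the digit 5 and False for every other digit.
y_train_5 = (y_train == 5)
y_test_5 = (y_test == 5)
sgd_clf = SGDClassifier(random_state=42)
sgd_clf.fit(X_train, y_train_5)

# Multi-class task: the target keeps all ten digit labels.
forest_clf = RandomForestClassifier(random_state=42)
forest_clf.fit(X_train, y_train)

binary_pred = sgd_clf.predict(X_test)    # boolean array: "is this a 5?"
digit_pred = forest_clf.predict(X_test)  # integer array: predicted digit 0-9
```

The same feature matrix feeds both models; only the label vector changes.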
Features:
Data Loading & Preprocessing with fetch_openml.
Visualization of handwritten digits.
Binary Classification (digit "5" vs. not "5").
Evaluation Metrics:
1. Confusion Matrix
2. Precision, Recall, F1-score
3. Precision-Recall Curve
4. ROC Curve
Model Training & Comparison:
1. SGDClassifier
2. RandomForestClassifier
Scaling & Pipelines with StandardScaler and Pipeline.
Model Persistence using joblib.
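As a rough illustration of the scaling, pipeline, and persistence features, the sketch below chains `StandardScaler` and `SGDClassifier` in a `Pipeline`, saves it with `joblib`, and reloads it. It is a sketch under assumptions, not the project's code: `load_digits` replaces MNIST so it runs offline, and the step names, the file name `digit_pipeline.joblib`, and the temporary directory are hypothetical.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.datasets import load_digits
from sklearn.linear_model import SGDClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: load_digits stands in for MNIST so the sketch runs offline.
X, y = load_digits(return_X_y=True)
y_is_5 = (y == 5)

# The scaler is fitted together with the classifier, so the same
# transformation is applied at training and at prediction time.
pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("sgd", SGDClassifier(random_state=42)),
])
pipeline.fit(X, y_is_5)

# Persist the whole fitted pipeline (hypothetical file name) and reload it.
model_path = os.path.join(tempfile.mkdtemp(), "digit_pipeline.joblib")
joblib.dump(pipeline, model_path)
restored = joblib.load(model_path)

# The reloaded pipeline should reproduce the original predictions.
same_predictions = np.array_equal(pipeline.predict(X), restored.predict(X))
```

Saving the pipeline rather than the bare classifier means the fitted scaler travels with the model.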
Results:
SGD Classifier: ~83% precision, ~65% recall (digit "5")
Random Forest Classifier:
1. Training Accuracy: ~99%
2. Test Accuracy: ~96%
3. Weighted Precision & Recall: ~96%
Random Forest significantly outperformed SGD on this task.
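For readers unfamiliar with how precision and recall figures like those above are derived, here is a small plain-Python sketch computing them from confusion-matrix counts. The label lists are hypothetical toy data chosen for illustration, not the project's results, and the helper names are made up for this example.

```python
def binary_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for two equal-length boolean sequences."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)
    fp = sum(1 for t, p in pairs if not t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    tn = sum(1 for t, p in pairs if not t and not p)
    return tp, fp, fn, tn


def precision_recall_f1(tp, fp, fn):
    """Precision = tp/(tp+fp), recall = tp/(tp+fn), F1 = their harmonic mean."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical toy labels: True means "this image is a 5".
y_true = [True, True, True, True, True, False, False, False, False, False]
y_pred = [True, True, True, False, False, True, False, False, False, False]

tp, fp, fn, tn = binary_counts(y_true, y_pred)
precision, recall, f1 = precision_recall_f1(tp, fp, fn)
accuracy = (tp + tn) / len(y_true)
```

In this toy case the classifier is right 70% of the time, yet it misses two of the five true positives, which is the kind of gap accuracy alone can hide.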
Visualizations:
Confusion Matrix (normalized)
Precision-Recall vs. Threshold Curve
ROC Curves (SGD vs Random Forest)
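The data behind these curves can be computed with scikit-learn before any plotting. The sketch below is one possible approach, assuming out-of-fold scores from `cross_val_predict` and, again, `load_digits` as an offline stand-in for MNIST. Plotting is left out; the resulting arrays can be passed to any plotting library.

```python
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import SGDClassifier
from sklearn.metrics import precision_recall_curve, roc_auc_score, roc_curve
from sklearn.model_selection import cross_val_predict

# Assumption: load_digits stands in for MNIST so the sketch runs offline.
X, y = load_digits(return_X_y=True)
y_is_5 = (y == 5)

# Out-of-fold decision scores for SGD (signed distance from the decision boundary).
sgd_scores = cross_val_predict(
    SGDClassifier(random_state=42), X, y_is_5, cv=3, method="decision_function"
)
# Random forests have no decision_function; use the positive-class probability.
forest_scores = cross_val_predict(
    RandomForestClassifier(random_state=42), X, y_is_5, cv=3, method="predict_proba"
)[:, 1]

# Precision and recall at every candidate threshold (for the threshold plot).
precisions, recalls, thresholds = precision_recall_curve(y_is_5, sgd_scores)

# ROC curve points and area under the curve for both models.
fpr_sgd, tpr_sgd, _ = roc_curve(y_is_5, sgd_scores)
fpr_forest, tpr_forest, _ = roc_curve(y_is_5, forest_scores)
auc_sgd = roc_auc_score(y_is_5, sgd_scores)
auc_forest = roc_auc_score(y_is_5, forest_scores)
```

Note that `precisions` and `recalls` have one more element than `thresholds`, so drop the last element of each when plotting them against the threshold.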
Learnings:
1. Importance of evaluation beyond accuracy
2. How scaling improves model performance
3. How to use Pipelines for cleaner ML workflows
4. Saving/loading models for deployment
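One way to check the scaling point for yourself is to cross-validate the same classifier with and without `StandardScaler`. This is a hedged sketch rather than the project's experiment: it uses `load_digits` instead of MNIST, three folds, and `random_state=42`, all of which are assumptions, and the size of any improvement will depend on the data.

```python
from sklearn.datasets import load_digits
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: load_digits stands in for MNIST so the sketch runs offline.
X, y = load_digits(return_X_y=True)

# Same classifier, once on raw pixel values and once behind a scaler.
raw_model = SGDClassifier(random_state=42)
scaled_model = make_pipeline(StandardScaler(), SGDClassifier(random_state=42))

# Inside cross-validation the pipeline refits the scaler on each training
# fold, so no statistics leak from the held-out fold.
raw_scores = cross_val_score(raw_model, X, y, cv=3, scoring="accuracy")
scaled_scores = cross_val_score(scaled_model, X, y, cv=3, scoring="accuracy")

print("raw pixels    :", raw_scores.mean())
print("scaled pixels :", scaled_scores.mean())
```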