GitHub - KushagraSinghog/Digit-classifier

                                                    🧠 MNIST Digit Classification Pipeline

This project demonstrates a complete end-to-end machine learning workflow for classifying handwritten digits from the MNIST dataset.

It covers data preprocessing, visualization, model training, evaluation using multiple metrics, and building a reproducible pipeline with model persistence.
Two classifiers are used: a binary classifier (SGDClassifier) that detects a single digit, and a multi-class classifier (RandomForestClassifier) that covers all ten digits.

📌 Features:

    📂 Data Loading & Preprocessing with fetch_openml.

    👁 Visualization of handwritten digits.

    🔍 Binary Classification (digit "5" vs. not "5").

    📊 Evaluation Metrics:

            1. Confusion Matrix

            2. Precision, Recall, F1-score

            3. Precision-Recall Curve

            4. ROC Curve

    🤖 Model Training & Comparison:

            1. SGDClassifier

            2. RandomForestClassifier

    ⚡ Scaling & Pipelines with StandardScaler and Pipeline.

    💾 Model Persistence using joblib.

    📈 Results:

            SGD Classifier: ~83% precision, ~65% recall (digit "5")

            Random Forest Classifier:

                  1. Training Accuracy: ~99%

                  2. Test Accuracy: ~96%

                  3. Weighted Precision & Recall: ~96%

            Random Forest significantly outperformed SGD on this task.

    📊 Visualizations:

            Confusion Matrix (normalized)

            Precision-Recall vs. Threshold Curve

            ROC Curves (SGD vs Random Forest)
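
The evaluation metrics listed above can be illustrated with a minimal pure-Python sketch. It is not the project's code: the helper names (`confusion_counts`, `precision_recall_f1`, `f1_from`) and the toy labels and scores are made up for illustration. It shows how the confusion-matrix counts lead to precision, recall and F1, how raising the decision threshold trades recall for precision, and what F1 the reported SGD figures (~83% precision, ~65% recall) would imply.

```python
from typing import Sequence, Tuple


def confusion_counts(
    y_true: Sequence[bool], scores: Sequence[float], threshold: float
) -> Tuple[int, int, int, int]:
    """Return (TP, FP, FN, TN) for a binary problem at a decision threshold."""
    tp = fp = fn = tn = 0
    for is_positive, score in zip(y_true, scores):
        predicted_positive = score >= threshold
        if predicted_positive and is_positive:
            tp += 1
        elif predicted_positive:
            fp += 1
        elif is_positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def f1_from(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Derive precision, recall and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall, f1_from(precision, recall)


# Toy data (hypothetical): True means the image really is a "5".
y_true = [True, True, True, False, False, False, True, False]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.6, 0.1]

for threshold in (0.5, 0.75):
    tp, fp, fn, tn = confusion_counts(y_true, scores, threshold)
    precision, recall, f1 = precision_recall_f1(tp, fp, fn)
    print(f"threshold={threshold}: precision={precision:.2f}, recall={recall:.2f}, f1={f1:.2f}")

# F1 implied by the approximate SGD results reported above: about 0.73.
reported_f1 = f1_from(0.83, 0.65)
print(f"F1 for ~83% precision and ~65% recall: {reported_f1:.2f}")
```

On the toy data, moving the threshold from 0.5 to 0.75 raises precision from 0.75 to 1.00 while recall drops from 0.75 to 0.50, which is the trade-off the Precision-Recall vs. Threshold curve visualizes.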


🎯 Learnings:

              1. Importance of evaluation beyond accuracy

              2. How scaling improves model performance

              3. How to use Pipelines for cleaner ML workflows

              4. Saving/loading models for deployment
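
The pipeline and persistence points above could look roughly like the following sketch. It is an illustration under stated assumptions, not the repository's actual code: to run offline it uses scikit-learn's small bundled 8x8 digits set (`load_digits`) as a stand-in for MNIST, the step names, split sizes and the file name `digit_5_pipeline.joblib` are hypothetical, and only the binary "5 vs. not 5" SGD model is shown.

```python
import tempfile
from pathlib import Path

import joblib
from sklearn.datasets import load_digits
from sklearn.linear_model import SGDClassifier
from sklearn.metrics import precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: the bundled 8x8 digits set stands in for MNIST so the sketch
# needs no download; the project itself loads MNIST with fetch_openml.
X, y = load_digits(return_X_y=True)
y_is_5 = y == 5  # binary target: digit "5" vs. not "5"

X_train, X_test, y_train, y_test = train_test_split(
    X, y_is_5, test_size=0.2, random_state=42, stratify=y_is_5
)

# Scaling and the classifier live in one object, so the scaler is fitted on
# the training data only and reapplied automatically at prediction time.
pipeline = Pipeline(
    [
        ("scaler", StandardScaler()),
        ("sgd", SGDClassifier(random_state=42)),
    ]
)
pipeline.fit(X_train, y_train)

predictions = pipeline.predict(X_test)
precision = precision_score(y_test, predictions)
recall = recall_score(y_test, predictions)
print(f"precision={precision:.2f}, recall={recall:.2f}")

# Persist the whole fitted pipeline, then reload it as a deployment would.
with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = Path(tmp_dir) / "digit_5_pipeline.joblib"  # hypothetical name
    joblib.dump(pipeline, model_path)
    restored = joblib.load(model_path)

restored_predictions = restored.predict(X_test)
print("reloaded model matches:", bool((restored_predictions == predictions).all()))
```

Saving the pipeline rather than the bare classifier keeps the fitted scaler and the model together, so a reloaded model receives inputs preprocessed the same way as during training. The scores printed here come from the stand-in dataset and are not comparable to the MNIST results reported above.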