Skip to content

ARYASINGH0504/Diabetes-Type-2-Prediction-using-Machine-Learning-

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

9 Commits
 
 
 
 
 
 

Repository files navigation

Multi-Model Machine Learning Pipeline for Type 2 Diabetes Risk Stratification

Academic Project – Machine Learning / Data Science

This project focuses on building a machine learning-based system to predict whether a person is diabetic or not based on several medical diagnostic measurements. The aim of this academic project is to apply data preprocessing, exploratory data analysis, and classification algorithms to create a reliable prediction model for early identification of diabetes.

Project Overview

Diabetes is a chronic disease that affects millions of people globally. Early detection plays a crucial role in preventing complications. This project uses supervised machine learning techniques to predict diabetes using a structured dataset.

The work includes:

  • Data preprocessing
  • Feature cleaning
  • Model building
  • Performance evaluation
  • Confusion matrix visualization

It demonstrates practical application of ML concepts in a real-world medical prediction problem.

Objectives

  • To analyze the diabetes dataset and understand key health indicators
  • To apply machine learning algorithms for predicting diabetes
  • To evaluate model accuracy, precision, recall, and F1-score
  • To build a reusable and easy-to-run ML pipeline

Dataset Description

The dataset contains the following health-related features:

  • Pregnancies
  • Glucose Level
  • Blood Pressure
  • Skin Thickness
  • Insulin Level
  • BMI
  • Diabetes Pedigree Function
  • Age
  • Outcome (0 = No Diabetes, 1 = Diabetes)

The dataset is clean, structured, and suitable for classification tasks.

Methodology

The entire workflow includes:

Data Loading

  • Reads the CSV dataset
  • Handles missing and zero values

Preprocessing

  • Replaces invalid values
  • Splits data into train–test sets
  • Scales features for better model performance

Model Building

A Logistic Regression model is used because:

  • It is simple
  • Performs well for binary classification
  • Easy to interpret

Evaluation

Model performance is measured using:

  • Accuracy
  • Precision
  • Recall
  • F1 Score
  • Confusion Matrix (visualized)

Results

The model gives a balanced performance in detecting diabetic and non-diabetic cases. Evaluation metrics such as accuracy and F1-score demonstrate the model’s effectiveness.

A confusion matrix plot is also generated for better understanding of classification performance.

Technologies Used

  • Python
  • NumPy
  • Pandas
  • Matplotlib
  • Scikit-Learn
  • Seaborn
  • Joblib

How to Run the Project

Install dependencies:

bash pip install -r requirements.txt

Train the model:

bash python src/main.py --mode train

Evaluate the model:

bash python src/main.py --mode evaluate

Academic Significance

This project demonstrates:

  • Understanding of machine learning workflows
  • Ability to clean and prepare raw data
  • Implementation of classification algorithms
  • Evaluation of model performance
  • Knowledge of real-world medical dataset challenges

It is suitable for college submissions, machine learning courses, mini-projects, and academic presentations.

Conclusion

This Diabetes Prediction project successfully builds a machine learning model capable of predicting diabetes with good accuracy. The project showcases end-to-end ML development skills — from data preprocessing to final evaluation — making it an excellent academic demonstration of machine learning applications in healthcare.

About

Diabetes is a chronic disease that affects millions of people globally. Early detection plays a crucial role in preventing complications. This project uses supervised machine learning techniques to predict diabetes using a structured dataset.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors