Predicting Air Quality Index (AQI) using machine learning techniques in Python — analyzing pollutant data to forecast air quality levels and help identify environmental health risks.
- Overview
- What is AQI?
- Dataset
- Tech Stack
- Project Structure
- Workflow
- Models Used
- Results
- How to Run
- Author
Air pollution is one of the most critical environmental challenges of the 21st century. This project builds a machine learning model to predict the Air Quality Index (AQI) based on concentrations of key air pollutants. By accurately predicting AQI, we can:
- Provide early warnings for hazardous air quality events
- Help governments and citizens make informed decisions
- Analyze pollution trends across regions and time periods
The Air Quality Index (AQI) is a standardized scale used to communicate how polluted the air currently is or how polluted it is forecast to become.
| AQI Range | Category | Health Impact |
|---|---|---|
| 0 – 50 | Good | Little or no risk |
| 51 – 100 | Moderate | Acceptable; some concern for sensitive groups |
| 101 – 150 | Unhealthy for Sensitive Groups | Sensitive people may experience effects |
| 151 – 200 | Unhealthy | Everyone may begin to experience effects |
| 201 – 300 | Very Unhealthy | Health alert — serious effects for everyone |
| 301+ | Hazardous | Emergency conditions |
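
The table above can be expressed as a small lookup. This is a minimal sketch: the function name `aqi_category` is hypothetical (it is not taken from the notebook), and it assumes fractional values fall into the next band up, so 50.5 is treated as Moderate.

```python
def aqi_category(aqi: float) -> str:
    """Map an AQI value to its category using the breakpoints in the table above."""
    if aqi < 0:
        raise ValueError("AQI cannot be negative")

    # (inclusive upper bound, category) pairs in ascending order
    breakpoints = [
        (50, "Good"),
        (100, "Moderate"),
        (150, "Unhealthy for Sensitive Groups"),
        (200, "Unhealthy"),
        (300, "Very Unhealthy"),
    ]
    for upper, label in breakpoints:
        if aqi <= upper:
            return label
    # Anything above 300 is in the open-ended top band
    return "Hazardous"


print(aqi_category(42))   # Good
print(aqi_category(175))  # Unhealthy
print(aqi_category(320))  # Hazardous
```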
The dataset contains readings of major air pollutants used to compute AQI:
| Feature | Description |
|---|---|
| PM2.5 | Fine particulate matter (≤ 2.5 µm) |
| PM10 | Particulate matter (≤ 10 µm) |
| NO | Nitric Oxide |
| NO2 | Nitrogen Dioxide |
| NOx | Nitrogen Oxides |
| NH3 | Ammonia |
| CO | Carbon Monoxide |
| SO2 | Sulfur Dioxide |
| O3 | Ozone |
| Benzene | Benzene concentration |
| Toluene | Toluene concentration |
| AQI | Target variable: Air Quality Index |
| AQI_Bucket | AQI category label |
Python 3.8+
├── pandas — Data loading & manipulation
├── numpy — Numerical operations
├── matplotlib — Data visualization
├── seaborn — Statistical plots
├── scikit-learn — ML models & preprocessing
└── Jupyter Notebook — Interactive development environment
aqi-prediction-python/
│
├── Predicting_Air_Quality_Index_using_Python.ipynb # Main notebook
└── README.md # Project documentation
1. Data Loading
↓
2. Exploratory Data Analysis (EDA)
├── Shape, dtypes, null values
├── Distribution plots
└── Correlation heatmap
↓
3. Data Preprocessing
├── Handling missing values
├── Feature selection
└── Train-test split
↓
4. Model Training
├── Multiple regression models
└── Hyperparameter tuning
↓
5. Model Evaluation
├── R² Score
├── MAE / RMSE
└── Prediction vs Actual plots
↓
6. Results & Conclusions
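
Steps 1 to 3 of the workflow above might look roughly like the following. This is a sketch under stated assumptions, not the notebook's code: the tiny DataFrame is synthetic and stands in for the real dataset, only three of the pollutant columns are used, and median imputation is one possible way of handling missing values.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real dataset (values are made up for illustration).
df = pd.DataFrame({
    "PM2.5": [35.0, np.nan, 80.0, 120.0, 60.0, 45.0, 150.0, 30.0],
    "PM10": [70.0, 90.0, np.nan, 200.0, 110.0, 85.0, 260.0, 55.0],
    "NO2": [20.0, 25.0, 40.0, np.nan, 30.0, 22.0, 65.0, 18.0],
    "AQI": [80.0, 95.0, 160.0, 240.0, np.nan, 100.0, 310.0, 70.0],
})

# Quick EDA: shape, dtypes, and null counts per column.
print(df.shape)
print(df.dtypes)
print(df.isna().sum())

# Rows without a target value cannot be used for supervised training.
df = df.dropna(subset=["AQI"])

# Feature selection (assumed subset) and target.
features = ["PM2.5", "PM10", "NO2"]
X = df[features]
y = df["AQI"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

# Impute with medians computed on the training split only,
# so no information from the test split leaks into training.
medians = X_train.median()
X_train = X_train.fillna(medians)
X_test = X_test.fillna(medians)
```

Splitting before imputing is a deliberate choice here: statistics used to fill gaps are learned from the training rows and then reused on the test rows.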
| Model | Type |
|---|---|
| Linear Regression | Baseline regression |
| Random Forest Regressor | Ensemble — tree based |
| Decision Tree Regressor | Tree based |
| K-Nearest Neighbors | Instance based |
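
The four models in the table above can be trained and compared in a single loop. The sketch below uses synthetic data and illustrative hyperparameters (`n_estimators=100`, `n_neighbors=5`); the settings and scores in the notebook may differ.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.tree import DecisionTreeRegressor

# Synthetic features and target standing in for pollutant readings and AQI.
rng = np.random.default_rng(0)
X = rng.uniform(0, 200, size=(300, 3))
y = 0.8 * X[:, 0] + 0.4 * X[:, 1] + 0.2 * X[:, 2] + rng.normal(0, 5, size=300)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

models = {
    "Linear Regression": LinearRegression(),
    "Random Forest Regressor": RandomForestRegressor(n_estimators=100, random_state=42),
    "Decision Tree Regressor": DecisionTreeRegressor(random_state=42),
    "K-Nearest Neighbors": KNeighborsRegressor(n_neighbors=5),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    # For scikit-learn regressors, .score() returns R² on the given data.
    scores[name] = model.score(X_test, y_test)
    print(f"{name}: R² = {scores[name]:.3f}")
```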
The models were evaluated using standard regression metrics:
- R² Score — measures how well predictions fit actual AQI values
- MAE (Mean Absolute Error) — average prediction error magnitude
- RMSE (Root Mean Squared Error) — penalizes larger errors more heavily
Refer to the notebook for detailed metric comparisons and visualization plots.
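
To make the three metrics concrete, the sketch below computes them from their definitions on a handful of made-up values. The helper `regression_metrics` is hypothetical; in practice scikit-learn's `r2_score`, `mean_absolute_error`, and `mean_squared_error` (in `sklearn.metrics`) cover the same ground.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return R², MAE, and RMSE for two equal-length lists of numbers."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")

    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]

    mae = sum(abs(e) for e in errors) / n            # mean absolute error
    ss_res = sum(e * e for e in errors)              # residual sum of squares
    rmse = math.sqrt(ss_res / n)                     # root mean squared error

    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    if ss_tot == 0:
        raise ValueError("R² is undefined when all true values are identical")
    r2 = 1.0 - ss_res / ss_tot

    return {"r2": r2, "mae": mae, "rmse": rmse}


# Illustrative values only, not results from the notebook.
actual = [50.0, 100.0, 150.0, 200.0]
predicted = [60.0, 90.0, 150.0, 220.0]
metrics = regression_metrics(actual, predicted)
print(metrics)
```

In this example the single 20-point miss dominates RMSE (about 12.2) more than MAE (10.0), which is the "penalizes larger errors" behavior described above.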
1. Clone the repository: `git clone https://github.com/uddhav05-cyber/aqi-prediction-python.git`
2. Enter the project directory: `cd aqi-prediction-python`
3. Install the dependencies: `pip install pandas numpy matplotlib seaborn scikit-learn jupyter`
4. Open the notebook: `jupyter notebook Predicting_Air_Quality_Index_using_Python.ipynb`
5. In Jupyter, select Kernel → Restart & Run All.
Uddhav Bhople
BTech Computer Engineering | DY Patil University, Pune

Software Engineering Student & Aspiring AI Engineer
This project is licensed under the MIT License.
If you found this project helpful, please consider giving it a ⭐