Manila Transit Analytics

Understanding the Intersection of Time, Weather, and Public Transport Reliability in Metro Manila.

1. Research Overview

This repository houses the research data, pipeline scripts, and interactive tools for the study investigating public transport reliability in Metro Manila. The research evaluates how temporal commuter surges (peak hours) and environmental disruptions (heavy rain) impact transit wait times and delays across different commuting systems.

Key Objectives

Intermodal Comparison: Directly comparing road-based transit (EDSA Carousel busway) against rail-based transit (MRT-3, LRT-1, and LRT-2) to assess which systems are more resilient to bad weather and peak-hour surges.
Spatial Bottleneck Analysis: Mapping and identifying the worst-performing stations and stops across the public transport network.
Replication and Open Science: Providing a documented, end-to-end reproducible pipeline from raw data generation to statistical results and interactive dashboard visualizers.

2. Core Research Questions & Formulation

The analysis addresses four primary Research Questions:

RQ1: Overall Line Performance

Evaluating the baseline delay between scheduled intervals and actual wait times across transit lines.

Metric - Absolute Delay: $$\text{Delay} = \text{Actual Wait Time} - \text{Scheduled Interval}$$
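
As a minimal sketch of this metric, the per-line mean delay could be computed with Pandas as shown below. The column names (`line`, `actual_wait_min`, `scheduled_interval_min`) and the toy rows are assumptions for illustration; the actual schema of manila_transit_cleaned.csv may differ.

```python
import pandas as pd

# Toy rows standing in for the cleaned data; column names are hypothetical.
df = pd.DataFrame({
    "line": ["MRT-3", "MRT-3", "EDSA Carousel", "EDSA Carousel"],
    "actual_wait_min": [8.0, 10.0, 15.0, 21.0],
    "scheduled_interval_min": [5.0, 5.0, 10.0, 10.0],
})

# Delay = Actual Wait Time - Scheduled Interval
df["delay_min"] = df["actual_wait_min"] - df["scheduled_interval_min"]

# Mean delay per transit line
mean_delay_by_line = df.groupby("line")["delay_min"].mean()
print(mean_delay_by_line)
```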

RQ2: Peak Hour Impact

Quantifying the average wait time increase during rush hours compared to off-peak periods.

Metric - Peak Surge Ratio: $$\text{Delay Ratio} = \frac{\text{Actual Wait Time}}{\text{Scheduled Interval}}$$
- Peak Hours defined as: Peak Morning and Peak Evening.
- Off-Peak Hours defined as: Mid-Day and Late Night.
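
The ratio and the peak/off-peak split above can be sketched as follows. The column names (`time_period`, `actual_wait_min`, `scheduled_interval_min`) and the sample values are assumptions, not the repository's actual schema; only the four period labels come from the definitions above.

```python
import pandas as pd

# Period labels taken from the definitions above.
PEAK_PERIODS = {"Peak Morning", "Peak Evening"}

# Toy rows; column names and values are hypothetical.
df = pd.DataFrame({
    "time_period": ["Peak Morning", "Peak Evening", "Mid-Day", "Late Night"],
    "actual_wait_min": [12.0, 18.0, 6.0, 9.0],
    "scheduled_interval_min": [6.0, 6.0, 6.0, 6.0],
})

# Delay Ratio = Actual Wait Time / Scheduled Interval
df["delay_ratio"] = df["actual_wait_min"] / df["scheduled_interval_min"]

# Label each row as peak or off-peak, then average the ratio per group.
df["period_type"] = df["time_period"].isin(PEAK_PERIODS).map(
    {True: "peak", False: "off_peak"}
)
mean_ratio = df.groupby("period_type")["delay_ratio"].mean()
print(mean_ratio)
```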

RQ3: Weather Vulnerability

Measuring how precipitation (Heavy Rain vs. Clear/Cloudy) affects wait times across both transit modes.

Metric - Percentage Increase: $$\text{Percentage Increase} = \left( \frac{\text{Mean Wait (Disrupted)} - \text{Mean Wait (Baseline)}}{\text{Mean Wait (Baseline)}} \right) \times 100\%$$
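
For illustration, the percentage-increase metric can be written as a small helper. The function name and the guard against a non-positive baseline are assumptions made for this sketch, not part of the project's scripts.

```python
def percentage_increase(mean_wait_disrupted: float, mean_wait_baseline: float) -> float:
    """Relative change of the disrupted mean wait against the baseline, in percent."""
    if mean_wait_baseline <= 0:
        raise ValueError("baseline mean wait must be positive")
    return (mean_wait_disrupted - mean_wait_baseline) / mean_wait_baseline * 100.0


# Example: a baseline mean wait of 10 minutes rising to 15 minutes in heavy rain.
example = percentage_increase(15.0, 10.0)
print(example)  # 50.0
```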

RQ4: Worst Stations

Identifying the top five station bottlenecks by absolute delay across the network.
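
One way to rank stations is to average the delay per station and keep the five largest values. This is a sketch under assumptions: the column names (`station`, `delay_min`), the placeholder station names, and the choice of the mean as the aggregate are all hypothetical.

```python
import pandas as pd

# Toy rows with placeholder station names; column names are hypothetical.
df = pd.DataFrame({
    "station": ["Station A", "Station A", "Station B", "Station C",
                "Station D", "Station E", "Station F"],
    "delay_min": [10.0, 14.0, 3.0, 9.0, 7.0, 11.0, 1.0],
})

# Mean delay per station, then the five stations with the largest mean.
worst_stations = df.groupby("station")["delay_min"].mean().nlargest(5)
print(worst_stations)
```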

3. Technology Stack

The project integrates data processing, analytical modeling, and visual front-end components:

Data Processing & Analytics

Python: Core programming language.
Pandas: Used for data loading, deduplication, median imputation, outlier filtering, and group-by aggregations.
NumPy: Standard numerical calculations and sampling from Gamma and Gaussian distributions for delay simulations.
Jupyter: Interactive replication notebook environments.
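
As a rough illustration of how Gamma and Gaussian draws might be combined to simulate wait times, consider the sketch below. The way the two distributions are combined and every parameter value (shape, scale, scheduled interval, sample size) are assumptions; the actual logic is described in DATA_SIMULATION.md and may differ.

```python
import numpy as np

rng = np.random.default_rng(seed=42)  # fixed seed for reproducibility
n = 1_000
scheduled_interval_min = 5.0  # hypothetical scheduled interval

# Right-skewed, non-negative delay component (hypothetical parameters).
gamma_delay = rng.gamma(shape=2.0, scale=1.5, size=n)

# Symmetric measurement-style noise (hypothetical parameters).
noise = rng.normal(loc=0.0, scale=0.5, size=n)

# Simulated actual wait, clipped so it can never be negative.
actual_wait_min = np.clip(scheduled_interval_min + gamma_delay + noise, 0.0, None)
print(actual_wait_min.mean())
```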

Data Visualization

Matplotlib: Generating high-fidelity baseline plots.
Seaborn: Generating publication-ready grouped statistical plots with custom styling.

Interactive Dashboard

HTML5: Semantic document layout.
Vanilla CSS3: Dark-themed layout incorporating glassmorphism (backdrop-filter: blur()), custom HSL-tailored grid borders, hover transitions, and glow-orbs.
JavaScript (ES6): Handles interactive tab-swapping and integrates a custom client-side CSV parser to dynamically read the cleaned data.
Python Threaded Server: Uses Python's standard http.server.SimpleHTTPRequestHandler combined with threading to host the dashboard locally. It disables browser caching by overriding response headers to ensure live visualization updates.
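
A threaded, cache-disabling server of the kind described above can be sketched with the standard library alone. The class and function names (`NoCacheHandler`, `start_server`) and the exact headers are assumptions for this example; main.py may be organized differently.

```python
import threading
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer


class NoCacheHandler(SimpleHTTPRequestHandler):
    """Serves files from the current directory and tells browsers not to cache them."""

    def end_headers(self):
        # Add cache-busting headers to every response before the header block closes.
        self.send_header("Cache-Control", "no-store, no-cache, must-revalidate")
        self.send_header("Pragma", "no-cache")
        self.send_header("Expires", "0")
        super().end_headers()


def start_server(port: int = 8000) -> ThreadingHTTPServer:
    """Start the server on a background thread and return it (port 0 picks a free port)."""
    server = ThreadingHTTPServer(("127.0.0.1", port), NoCacheHandler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    return server
```

Calling `start_server()` returns immediately; `server.shutdown()` followed by `server.server_close()` stops it cleanly.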

4. Repository Structure

Below is an overview of the directory organization. All file links below point to local source files in this repository:

data/
- data/1_raw/: Houses the raw simulated CSV file manila_transit_raw.csv.
- data/2_cleaned/: Houses the structured, cleaned CSV output manila_transit_cleaned.csv.
docs/: Contains detailed methodologies for the project stages:
- DATA_SIMULATION.md: Logic behind delay calculations and dirty noise injection.
- DATA_CLEANING.md: Sequential processing rules (deduplication, imputations, and outlier filtering).
- DATA_ANALYSIS.md: The analysis plan, hypothesis descriptions, and metric formulas.
notebooks/
- Manila_Transit_Documentation_and_Replication_Guide.ipynb: The comprehensive Jupyter replication notebook containing step-by-step documentation, equations, code execution blocks, and inline visualizations.
output/
- output/figures/: Output directory where figures generated by the analysis script are stored (e.g., average wait times, peak ratios, weather impact, and worst stations).
scripts/: Core executable Python code:
- data_simulation.py: Geographic mapping and random delay distribution generator.
- data_cleaning.py: Cleans and prepares the raw data.
- data_analysis.py: Computes findings and outputs charts.
src/
- src/ui/: Source files for the local dashboard including index.html, style.css, and app.js.
main.py: Root executable Python script to launch the local web server and view the dashboard.
requirements.txt: List of Python library dependencies.

5. Usage Pipeline & Execution

To replicate the study or explore the interactive findings, execute the steps sequentially from the root directory:

Step 1: Clean Raw Data

Applies deduplication, normalizes categorical casing variations, performs median-based imputation for missing wait times, and removes Gaussian outliers.

python scripts/data_cleaning.py
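
The four cleaning steps can be sketched as a single function. This is not the contents of data_cleaning.py: the column names (`station`, `actual_wait_min`), the title-casing rule, and the reading of "Gaussian outliers" as a z-score cutoff are all assumptions.

```python
import pandas as pd


def clean_transit_data(raw: pd.DataFrame, z_threshold: float = 3.0) -> pd.DataFrame:
    """Deduplicate, normalize casing, impute missing waits, and drop outliers."""
    df = raw.drop_duplicates().copy()

    # Normalize station casing so variants such as mAGALLANES and MAGALLANES merge.
    df["station"] = df["station"].str.strip().str.title()

    # Fill missing wait times with the median of the observed values.
    median_wait = df["actual_wait_min"].median()
    df["actual_wait_min"] = df["actual_wait_min"].fillna(median_wait)

    # Drop rows whose wait time lies more than z_threshold standard deviations
    # from the mean (assumed interpretation of the outlier rule).
    std = df["actual_wait_min"].std()
    if std > 0:
        z_scores = (df["actual_wait_min"] - df["actual_wait_min"].mean()) / std
        df = df[z_scores.abs() <= z_threshold]

    return df.reset_index(drop=True)


# Toy input with one exact duplicate row, mixed casing, and one missing wait time.
raw = pd.DataFrame({
    "station": ["mAGALLANES", "MAGALLANES", "MAGALLANES", "Ayala"],
    "actual_wait_min": [10.0, 12.0, 12.0, None],
})
cleaned = clean_transit_data(raw)
print(cleaned)
```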

Step 2: Run Exploratory Analysis

Calculates statistics for overall performance, peak surges, and weather resilience, then regenerates the charts in output/figures/.

python scripts/data_analysis.py

Step 3: Launch Companion Dashboard

Starts a multi-threaded local web server and launches the GUI dashboard in your default browser.

python main.py

6. Research Findings & Insights

The data analysis pipeline reveals several intermodal findings:

The Intermodal Weather Resiliency Gap: In the simulated dataset, the rail lines are highly resilient to heavy rainfall, with average wait times increasing by only 4% to 20%. In contrast, road-based transit (EDSA Carousel) shows an 88.4% wait-time surge, reflecting its exposure to general traffic congestion and localized road flooding.
Peak-Hour Saturation: Despite their weather resilience, the rail systems are hit hardest by commuter peaks, with delays increasing by 115% to 120%. By comparison, the EDSA Carousel sees an 85.1% increase.
Data Casing Correction: Casing variants of the same station name (e.g., mAGALLANES and MAGALLANES) are treated as separate groups and skew per-station averages unless they are normalized. Standardizing station casing on ingestion merges these duplicate bins, giving an accurate spatial analysis of the worst-performing stations.
