Understanding the Intersection of Time, Weather, and Public Transport Reliability in Metro Manila.
This repository houses the research data, pipeline scripts, and interactive tools for the study investigating public transport reliability in Metro Manila. The research evaluates how temporal commuter surges (peak hours) and environmental disruptions (heavy rain) impact transit wait times and delays across different commuting systems.
- Intermodal Comparison: Directly comparing road-based transit (EDSA Carousel busway) against rail-based transit (MRT-3, LRT-1, and LRT-2) to assess which systems show greater climate and surge resilience.
- Spatial Bottleneck Analysis: Mapping and identifying the worst-performing station stops across the public transport network.
- Replication and Open Science: Providing a documented, end-to-end reproducible pipeline from raw data generation to statistical results and interactive dashboard visualizers.
The analysis addresses four primary Research Questions:
Evaluating the baseline delay between scheduled intervals and actual wait times across transit lines.
- Metric - Absolute Delay:
  $$\text{Delay} = \text{Actual Wait Time} - \text{Scheduled Interval}$$
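As a minimal sketch of the absolute delay metric, the snippet below computes the per-row delay and averages it per line. The column names (`line`, `scheduled_interval_min`, `actual_wait_min`) and the sample values are assumptions for illustration; the actual schema of the cleaned CSV may differ.

```python
import pandas as pd

# Hypothetical column names and made-up values, for illustration only.
trips = pd.DataFrame({
    "line": ["MRT-3", "MRT-3", "EDSA Carousel", "EDSA Carousel"],
    "scheduled_interval_min": [5.0, 5.0, 10.0, 10.0],
    "actual_wait_min": [7.0, 9.0, 18.0, 14.0],
})

# Delay = Actual Wait Time - Scheduled Interval, computed row by row.
trips["delay_min"] = trips["actual_wait_min"] - trips["scheduled_interval_min"]

# Mean absolute delay per transit line.
mean_delay_by_line = trips.groupby("line")["delay_min"].mean()
print(mean_delay_by_line)
```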
Quantifying the average wait time increase during rush hours compared to off-peak periods.
- Metric - Peak Surge Ratio:
  $$\text{Delay Ratio} = \frac{\text{Actual Wait Time}}{\text{Scheduled Interval}}$$
- Peak Hours defined as: Peak Morning and Peak Evening.
- Off-Peak Hours defined as: Mid-Day and Late Night.
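The delay ratio and the peak/off-peak grouping can be sketched as follows. The column names (`time_period`, `scheduled_interval_min`, `actual_wait_min`) and the values are hypothetical; only the four period labels come from the definitions above.

```python
import pandas as pd

# Hypothetical column names and made-up values, for illustration only.
trips = pd.DataFrame({
    "time_period": ["Peak Morning", "Peak Evening", "Mid-Day", "Late Night"],
    "scheduled_interval_min": [5.0, 5.0, 5.0, 10.0],
    "actual_wait_min": [10.0, 15.0, 5.0, 15.0],
})

PEAK_PERIODS = {"Peak Morning", "Peak Evening"}

# Delay Ratio = Actual Wait Time / Scheduled Interval.
trips["delay_ratio"] = trips["actual_wait_min"] / trips["scheduled_interval_min"]

# Label each row as peak or off-peak using the period definitions.
trips["period_type"] = trips["time_period"].isin(PEAK_PERIODS).map(
    {True: "peak", False: "off_peak"}
)

# Average delay ratio within each group.
mean_ratio = trips.groupby("period_type")["delay_ratio"].mean()
print(mean_ratio)
```

A ratio of 1.0 means commuters waited exactly the scheduled interval; comparing the two group means shows how much worse the peak periods are.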
Measuring how precipitation (Heavy Rain vs. Clear/Cloudy) affects wait times across both transit modes.
- Metric - Percentage Increase:
  $$\text{Percentage Increase} = \left( \frac{\text{Mean Wait (Disrupted)} - \text{Mean Wait (Baseline)}}{\text{Mean Wait (Baseline)}} \right) \times 100\%$$
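A small worked example of the percentage increase formula, using only the standard library. The function name `percentage_increase` and the wait times are hypothetical and are not taken from the study's data.

```python
from statistics import mean


def percentage_increase(baseline_waits, disrupted_waits):
    """Percent change in mean wait time from baseline to disrupted conditions."""
    baseline_mean = mean(baseline_waits)
    disrupted_mean = mean(disrupted_waits)
    return (disrupted_mean - baseline_mean) / baseline_mean * 100


# Illustrative values only (minutes).
clear_waits = [8.0, 10.0, 12.0]        # mean 10.0
heavy_rain_waits = [14.0, 15.0, 16.0]  # mean 15.0

increase = percentage_increase(clear_waits, heavy_rain_waits)
print(f"{increase:.1f}%")  # prints "50.0%"
```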
Identifying the top five station bottlenecks by absolute delay across the network.
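One way to rank stations by absolute delay is a grouped mean followed by `nlargest`. This is a sketch under assumed column names (`station`, `delay_min`) with placeholder station labels; it is not necessarily how `data_analysis.py` does it.

```python
import pandas as pd

# Hypothetical columns, placeholder station labels, and made-up delays.
trips = pd.DataFrame({
    "station": ["Station A", "Station A", "Station B", "Station C",
                "Station D", "Station E", "Station F"],
    "delay_min": [4.0, 6.0, 2.0, 9.0, 1.0, 7.0, 3.0],
})

# Mean delay per station, then keep the five largest values.
worst_five = trips.groupby("station")["delay_min"].mean().nlargest(5)
print(worst_five)
```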
The project integrates data processing, analytical modeling, and visual front-end components:
- Python: Core programming language.
- Pandas: Used for data loading, deduplication, median imputation, outlier filtering, and database aggregations.
- NumPy: Numerical calculations and sampling from Gamma and Gaussian distributions for delay simulations.
- Jupyter: Interactive replication notebook environments.
- Matplotlib: Generating high-fidelity baseline plots.
- Seaborn: Generating publication-ready grouped statistical plots with custom styling.
- HTML5: Semantic document layout.
- Vanilla CSS3: Dark-themed layout incorporating glassmorphism (`backdrop-filter: blur()`), custom HSL-tailored grid borders, hover transitions, and glow-orbs.
- JavaScript (ES6): Handles interactive tab-swapping and integrates a custom client-side CSV parser to dynamically read the cleaned data.
- Python Threaded Server: Uses Python's standard `http.server.SimpleHTTPRequestHandler` combined with `threading` to host the dashboard locally. It disables browser caching by overriding response headers to ensure live visualization updates.
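The no-cache server idea can be sketched as below. This is an illustration rather than the contents of `main.py`: the names `NoCacheHandler` and `start_server` are hypothetical, and the sketch uses the standard library's `ThreadingHTTPServer` together with a background thread, which may differ from how the project wires up `threading`.

```python
import threading
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer


class NoCacheHandler(SimpleHTTPRequestHandler):
    """Static file handler that asks the browser not to cache responses."""

    def end_headers(self):
        # Add cache-busting headers to every response before the header block ends.
        self.send_header("Cache-Control", "no-store, no-cache, must-revalidate")
        self.send_header("Pragma", "no-cache")
        self.send_header("Expires", "0")
        super().end_headers()


def start_server(host="127.0.0.1", port=0):
    """Serve the current directory on a background thread and return the server.

    port=0 lets the operating system pick a free port.
    """
    server = ThreadingHTTPServer((host, port), NoCacheHandler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    return server
```

Calling `server = start_server(port=8000)` would serve the working directory at `http://127.0.0.1:8000/`; `server.shutdown()` stops it. Overriding `end_headers` keeps the file-serving logic of `SimpleHTTPRequestHandler` untouched while adding the headers to every response.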
Below is an overview of the directory organization. All file links below point to local source files in this repository:
- data/
  - data/1_raw/: Houses the raw simulated CSV file manila_transit_raw.csv.
  - data/2_cleaned/: Houses the structured, cleaned CSV output manila_transit_cleaned.csv.
- docs/: Contains detailed methodologies for the project stages:
  - DATA_SIMULATION.md: Logic behind delay calculations and dirty noise injection.
  - DATA_CLEANING.md: Sequential processing rules (deduplication, imputations, and outlier filtering).
  - DATA_ANALYSIS.md: The analysis plan, hypothesis descriptions, and metric formulas.
- notebooks/
  - Manila_Transit_Documentation_and_Replication_Guide.ipynb: The comprehensive Jupyter replication notebook containing step-by-step documentation, equations, code execution blocks, and inline visualizations.
- output/
  - output/figures/: Output directory where figures generated by the analysis script are stored (e.g., average wait times, peak ratios, weather impact, and worst stations).
- scripts/: Core executable Python code:
  - data_simulation.py: Geographic mapping and random delay distribution generator.
  - data_cleaning.py: Cleans and prepares the raw data.
  - data_analysis.py: Computes findings and outputs charts.
- src/
  - src/ui/: Source files for the local dashboard including index.html, style.css, and app.js.
- main.py: Root executable Python script to launch the local web server and view the dashboard.
- requirements.txt: List of Python library dependencies.
To replicate the study or explore the interactive findings, execute the steps sequentially from the root directory:
Applies deduplication, normalizes categorical casing variations, performs median-based imputation for missing wait times, and removes Gaussian outliers.

`python scripts/data_cleaning.py`

Calculates statistics for overall performance, peak surges, and weather resilience, then updates visual charts inside the output folder.

`python scripts/data_analysis.py`

Starts a multi-threaded local web server and launches the GUI dashboard in your default browser.

`python main.py`

The data analysis pipeline reveals several intermodal findings:
- The Intermodal Weather Resiliency Gap: Train transit networks show exceptional resilience to heavy rainfall (experiencing average wait time increases of only 4% to 20%). In contrast, road-based transit (EDSA Carousel) suffers an 88.4% wait-time surge due to exposure to general traffic congestion and localized road flooding.
- Peak-Hour Saturation: Despite high weather resilience, the rail transit systems experience significant overcrowding during commuter peaks, with delays increasing by 115% to 120%. By comparison, the EDSA Carousel has an 85.1% increase.
- Data Casing Correction: Analyzing station data highlighted that string mutations (e.g. `mAGALLANES` and `MAGALLANES`) will skew averaging calculations if not normalized. Standardizing station casing on ingestion merges these duplicate bins, providing an accurate spatial analysis of the worst-performing stations.
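The effect of casing normalization on grouping can be illustrated with a small sketch that also applies deduplication and median imputation. The column names (`station`, `actual_wait_min`) and values are assumptions, outlier filtering is omitted, and the real `data_cleaning.py` may implement these steps differently.

```python
import pandas as pd

# Hypothetical columns and made-up values; the raw schema may differ.
raw = pd.DataFrame({
    "station": ["MAGALLANES", "mAGALLANES", "Magallanes", "AYALA", "AYALA"],
    "actual_wait_min": [10.0, 14.0, None, 6.0, 6.0],
})

# Before normalization, each casing variant forms its own group.
groups_before = raw["station"].nunique()

# 1. Drop exact duplicate rows.
cleaned = raw.drop_duplicates().copy()

# 2. Standardize casing (and stray whitespace) so variants share one label.
cleaned["station"] = cleaned["station"].str.strip().str.upper()

# 3. Fill missing wait times with the median of the observed waits.
median_wait = cleaned["actual_wait_min"].median()
cleaned["actual_wait_min"] = cleaned["actual_wait_min"].fillna(median_wait)

groups_after = cleaned["station"].nunique()
station_means = cleaned.groupby("station")["actual_wait_min"].mean()
print(groups_before, groups_after)
print(station_means)
```

In this toy data the three spellings of the same station collapse into a single group, so its mean wait is computed over all of its rows rather than split across separate bins.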