miiiereDev/manila-transit-analytics

Manila Transit Analytics

Understanding the Intersection of Time, Weather, and Public Transport Reliability in Metro Manila.


1. Research Overview

This repository houses the research data, pipeline scripts, and interactive tools for the study investigating public transport reliability in Metro Manila. The research evaluates how temporal commuter surges (peak hours) and environmental disruptions (heavy rain) impact transit wait times and delays across different commuting systems.

Key Objectives

  • Intermodal Comparison: Directly comparing road-based transit (EDSA Carousel busway) against rail-based transit (MRT-3, LRT-1, and LRT-2) to assess which systems show greater climate and surge resilience.
  • Spatial Bottleneck Analysis: Mapping and identifying the worst-performing station stops across the public transport network.
  • Replication and Open Science: Providing a documented, end-to-end reproducible pipeline from raw data generation to statistical results and interactive dashboard visualizers.

2. Core Research Questions & Formulation

The analysis addresses four primary Research Questions:

RQ1: Overall Line Performance

Evaluating the baseline delay between scheduled intervals and actual wait times across transit lines.

  • Metric - Absolute Delay: $$\text{Delay} = \text{Actual Wait Time} - \text{Scheduled Interval}$$
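The absolute delay metric can be sketched with pandas as follows. This is a minimal illustration, not the repository's own script: the column names (`line`, `scheduled_interval_min`, `actual_wait_min`) and the sample values are assumptions made for the example.

```python
import pandas as pd

# Hypothetical sample records; column names and values are assumptions,
# not the repository's actual schema or data.
records = pd.DataFrame({
    "line": ["MRT-3", "MRT-3", "EDSA Carousel", "EDSA Carousel"],
    "scheduled_interval_min": [5.0, 5.0, 10.0, 10.0],
    "actual_wait_min": [8.0, 6.0, 19.0, 13.0],
})

# Delay = Actual Wait Time - Scheduled Interval
records["delay_min"] = records["actual_wait_min"] - records["scheduled_interval_min"]

# Mean delay per line, worst line first
mean_delay_by_line = (
    records.groupby("line")["delay_min"].mean().sort_values(ascending=False)
)
print(mean_delay_by_line)
```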

RQ2: Peak Hour Impact

Quantifying the average wait time increase during rush hours compared to off-peak periods.

  • Metric - Peak Surge Ratio: $$\text{Delay Ratio} = \frac{\text{Actual Wait Time}}{\text{Scheduled Interval}}$$
    • Peak Hours defined as: Peak Morning and Peak Evening.
    • Off-Peak Hours defined as: Mid-Day and Late Night.
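As a rough sketch of how the ratio could be compared between the two period groups, the snippet below labels each record as peak or off-peak using the definitions above and averages the delay ratio per group. The column names and sample values are hypothetical.

```python
import pandas as pd

# Period labels taken from the definitions above
PEAK_PERIODS = {"Peak Morning", "Peak Evening"}

# Hypothetical sample records; column names and values are assumptions
trips = pd.DataFrame({
    "time_period": ["Peak Morning", "Mid-Day", "Peak Evening", "Late Night"],
    "scheduled_interval_min": [5.0, 5.0, 5.0, 10.0],
    "actual_wait_min": [12.0, 6.0, 10.0, 11.0],
})

# Delay Ratio = Actual Wait Time / Scheduled Interval
trips["delay_ratio"] = trips["actual_wait_min"] / trips["scheduled_interval_min"]

# Tag each record as "peak" or "off_peak"
trips["period_group"] = (
    trips["time_period"].isin(PEAK_PERIODS).map({True: "peak", False: "off_peak"})
)

mean_ratio = trips.groupby("period_group")["delay_ratio"].mean()
print(mean_ratio)
```

A ratio of 1.0 means the actual wait matched the schedule; values above 1.0 indicate waits longer than scheduled.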

RQ3: Weather Vulnerability

Measuring how precipitation (Heavy Rain vs. Clear/Cloudy) affects wait times across both transit modes.

  • Metric - Percentage Increase: $$\text{Percentage Increase} = \left( \frac{\text{Mean Wait (Disrupted)} - \text{Mean Wait (Baseline)}}{\text{Mean Wait (Baseline)}} \right) \times 100\%$$
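A small worked example of the formula, written as a helper function. The function name and the input values are illustrative assumptions, not results from the study.

```python
def percentage_increase(mean_wait_disrupted: float, mean_wait_baseline: float) -> float:
    """Percentage increase of the disrupted mean wait over the baseline mean wait."""
    if mean_wait_baseline <= 0:
        raise ValueError("Baseline mean wait must be positive.")
    return (mean_wait_disrupted - mean_wait_baseline) / mean_wait_baseline * 100


# Hypothetical means: 8 minutes in Clear/Cloudy conditions, 10 minutes in Heavy Rain
example_increase = percentage_increase(10.0, 8.0)
print(f"{example_increase:.1f}%")  # (10 - 8) / 8 * 100 = 25.0%
```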

RQ4: Worst Stations

Identifying the top five station bottlenecks by absolute delay across the network.
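One way such a ranking might be computed is sketched below. Station names are normalized before grouping so that casing variants fall into the same bin. Apart from Magallanes, the station labels, the column names, and all delay values are placeholders invented for the example.

```python
import pandas as pd

# Hypothetical observations; values and the "Station A".."Station E" labels are placeholders
observations = pd.DataFrame({
    "station": ["MAGALLANES", "mAGALLANES", "Station A", "Station B",
                "Station C", "Station D", "Station E"],
    "delay_min": [10.0, 14.0, 3.0, 9.0, 7.0, 5.0, 1.0],
})

# Normalize casing and stray whitespace so variants of one station share a group
observations["station"] = observations["station"].str.strip().str.upper()

# Mean absolute delay per station, keeping the five largest
worst_five = observations.groupby("station")["delay_min"].mean().nlargest(5)
print(worst_five)
```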


3. Technology Stack

The project integrates data processing, analytical modeling, and visual front-end components:

Data Processing & Analytics

  • Python: Core programming language.
  • Pandas: Used for data loading, deduplication, median imputation, outlier filtering, and database aggregations.
  • NumPy: Standard numerical calculations and sampling from Gamma and Gaussian distributions for delay simulations.
  • Jupyter: Interactive replication notebook environments.
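To illustrate the kind of Gamma/Gaussian sampling mentioned for NumPy, here is a minimal sketch of a simulated wait-time series. How the repository actually combines the two distributions is not documented here, so the model below (a right-skewed Gamma delay plus Gaussian noise around a fixed schedule) and every parameter value are assumptions.

```python
import numpy as np

rng = np.random.default_rng(seed=42)  # fixed seed for reproducibility

n_samples = 1000
scheduled_interval_min = 5.0  # assumed schedule, for illustration only

# Right-skewed, non-negative delay component; mean = shape * scale = 3.0 minutes
gamma_delay = rng.gamma(shape=2.0, scale=1.5, size=n_samples)

# Small symmetric measurement noise
gaussian_noise = rng.normal(loc=0.0, scale=0.5, size=n_samples)

# Simulated actual wait, clipped so it never goes negative
actual_wait = np.clip(scheduled_interval_min + gamma_delay + gaussian_noise, 0.0, None)

print(round(float(actual_wait.mean()), 2))
```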

Data Visualization

  • Matplotlib: Generating high-fidelity baseline plots.
  • Seaborn: Generating publication-ready grouped statistical plots with custom styling.

Interactive Dashboard

  • HTML5: Semantic document layout.
  • Vanilla CSS3: Dark-themed layout incorporating glassmorphism (backdrop-filter: blur()), custom HSL-tailored grid borders, hover transitions, and glow-orbs.
  • JavaScript (ES6): Handles interactive tab-swapping and integrates a custom client-side CSV parser to dynamically read the cleaned data.
  • Python Threaded Server: Uses Python's standard http.server.SimpleHTTPRequestHandler combined with threading to host the dashboard locally. It disables browser caching by overriding response headers to ensure live visualization updates.
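The threaded, cache-disabling server described above could look roughly like the following sketch. It uses only the standard library. The class name `NoCacheHandler`, the helper `start_server`, and the exact header values are assumptions and may differ from what `main.py` does.

```python
import threading
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer


class NoCacheHandler(SimpleHTTPRequestHandler):
    """Serves files from the working directory and tells browsers not to cache them."""

    def end_headers(self):
        # Add cache-disabling headers to every response before the header block is closed
        self.send_header("Cache-Control", "no-store, no-cache, must-revalidate")
        self.send_header("Pragma", "no-cache")
        self.send_header("Expires", "0")
        super().end_headers()


def start_server(port: int = 0) -> ThreadingHTTPServer:
    """Start the server on a background thread; port 0 lets the OS pick a free port."""
    server = ThreadingHTTPServer(("127.0.0.1", port), NoCacheHandler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    return server
```

Calling `server = start_server(8000)` would serve the current directory at http://127.0.0.1:8000/ (the port is an arbitrary choice here), `webbrowser.open(...)` from the standard library can then open that address, and `server.shutdown()` stops the server.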

4. Repository Structure

Below is an overview of the directory organization. All file links below point to local source files in this repository:


5. Usage Pipeline & Execution

To replicate the study or explore the interactive findings, execute the steps sequentially from the root directory:

Step 1: Clean Raw Data

Applies deduplication, normalizes categorical casing variations, performs median-based imputation for missing wait times, and removes Gaussian outliers.

python scripts/data_cleaning.py
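The four cleaning steps can be sketched as below. This is an illustration under stated assumptions rather than the contents of `scripts/data_cleaning.py`: the column names are hypothetical, and "Gaussian outliers" is interpreted here as values more than three standard deviations from the mean, which may not match the script's actual rule.

```python
import pandas as pd


def clean_wait_times(df: pd.DataFrame, z_threshold: float = 3.0) -> pd.DataFrame:
    """Deduplicate, normalize station casing, impute missing waits, drop outliers."""
    cleaned = df.drop_duplicates().copy()

    # Normalize categorical casing so "mAGALLANES" and "MAGALLANES" match
    cleaned["station"] = cleaned["station"].str.strip().str.upper()

    # Median-based imputation for missing wait times
    median_wait = cleaned["actual_wait_min"].median()
    cleaned["actual_wait_min"] = cleaned["actual_wait_min"].fillna(median_wait)

    # Assumed outlier rule: drop rows whose z-score exceeds the threshold
    std = cleaned["actual_wait_min"].std()
    if std == 0 or pd.isna(std):
        return cleaned
    z_scores = (cleaned["actual_wait_min"] - cleaned["actual_wait_min"].mean()) / std
    return cleaned[z_scores.abs() <= z_threshold]


# Hypothetical raw data: one exact duplicate row, one missing wait, one extreme value
raw = pd.DataFrame({
    "record_id": list(range(20)) + [19],
    "station": ["MAGALLANES", "mAGALLANES"] * 10 + ["mAGALLANES"],
    "actual_wait_min": [120.0, 7.0] + [6.0, 7.0] * 8 + [None, 7.0] + [7.0],
})

cleaned = clean_wait_times(raw)
print(len(raw), "->", len(cleaned))
```

Note that with a sample standard deviation, a single extreme value can only exceed a z-score of 3 when there are roughly a dozen or more rows, so this rule has no effect on very small inputs.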

Step 2: Run Exploratory Analysis

Calculates statistics for overall performance, peak surges, and weather resilience, then updates visual charts inside the output folder.

python scripts/data_analysis.py

Step 3: Launch Companion Dashboard

Starts a multi-threaded local web server and launches the GUI dashboard in your default browser.

python main.py

6. Research Findings & Insights

The data analysis pipeline reveals several intermodal findings:

  1. The Intermodal Weather Resiliency Gap: The rail lines (MRT-3, LRT-1, and LRT-2) are comparatively resilient to heavy rainfall, with average wait times increasing by only 4% to 20%. In contrast, road-based transit (EDSA Carousel) suffers an 88.4% wait-time surge due to exposure to general traffic congestion and localized road flooding.
  2. Peak-Hour Saturation: Despite their weather resilience, the rail systems experience significant overcrowding during commuter peaks, with delays increasing by 115% to 120%. By comparison, delays on the EDSA Carousel increase by 85.1%.
  3. Data Casing Correction: Analysis of the station data showed that inconsistent casing of station names (e.g. mAGALLANES and MAGALLANES) skews averages if left unnormalized, because the same station is split across separate groups. Standardizing station casing on ingestion merges these duplicate bins and gives an accurate spatial analysis of the worst-performing stations.

About

This repository contains research data and analysis scripts for Manila's transit systems. All data and outputs are public for research purposes.
