
amangalampalli/agentic-traffic

---
title: Agentic Traffic
emoji: 🏢
colorFrom: green
colorTo: purple
sdk: docker
pinned: false
short_description: Agentic AI to control traffic lights
app_port: 7860
---

traffic-llm

CityFlow-based traffic-control project with intersection-level multi-agent DQN training and district-aware policy variants.

Full model weights and files can be found here: https://huggingface.co/Aditya2162/agentic-traffic

OpenEnv UI

For the deployed OpenEnv web interface:

  • Click Reset before using Step.
  • Leave Use Llm unchecked for the fast, stable DQN-only path.
  • Use District Actions = {} for a valid no-op step payload.
  • Only enable Use Llm when you explicitly want district-level LLM guidance on top of the DQN executor.

Training

The default local-policy trainer uses parameter-shared dueling Double DQN with prioritized replay and n-step returns:

python3 -m training.train_local_policy train

By default, this command trains on data/generated using the splits in data/splits, writes checkpoints to artifacts/dqn_shared, enables TensorBoard logging, runs parallel CPU rollout workers, shows tqdm progress bars, and runs validation and saves a checkpoint every 40 updates.

For a broader but still manageable validation pass:

python3 -m training.train_local_policy train --max-val-cities 3 --val-scenarios-per-city 7

This evaluates 3 validation cities across all 7 scenario types: 21 learned-policy validation episodes per evaluation pass, or 63 episodes in total when the random and fixed baselines are also enabled.
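The episode counts above follow from a simple product. A minimal sketch, assuming every enabled baseline is evaluated on the same city × scenario grid as the learned policy (the helper name validation_episodes is hypothetical, not part of the project):

```python
def validation_episodes(num_cities: int, scenarios_per_city: int, num_baselines: int = 0) -> int:
    """Episodes per evaluation pass: one city x scenario grid for the learned
    policy, plus one identical grid per enabled baseline (assumption)."""
    grid = num_cities * scenarios_per_city
    return grid * (1 + num_baselines)


print(validation_episodes(3, 7))                   # learned policy only
print(validation_episodes(3, 7, num_baselines=2))  # plus random and fixed baselines
```

With 3 cities and 7 scenarios this gives 21 and 63, matching the figures quoted for `--max-val-cities 3 --val-scenarios-per-city 7`.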

Phase-3-style full training with the same 40-update eval/checkpoint cadence:

python3 -m training.train_local_policy train \
  --max-train-cities 70 \
  --max-val-cities 3 \
  --val-scenarios-per-city 7 \
  --policy-arch single_head_with_district_feature \
  --reward-variant wait_queue_throughput

Useful ablations:

python3 -m training.train_local_policy train --policy-arch multi_head --reward-variant current
python3 -m training.train_local_policy train --policy-arch single_head --reward-variant current
python3 -m training.train_local_policy train --policy-arch single_head_with_district_feature --reward-variant wait_queue_throughput

For a fast phase-1 overfit run on one fixed world:

python3 -m training.train_local_policy train \
  --total-updates 25 \
  --train-city-id city_0072 \
  --train-scenario-name normal \
  --overfit-val-on-train-scenario \
  --fast-overfit \
  --policy-arch single_head_with_district_feature \
  --reward-variant wait_queue_throughput

To create or refresh dataset splits:

python3 -m training.train_local_policy make-splits

To evaluate the best checkpoint:

python3 -m training.train_local_policy evaluate \
  --checkpoint artifacts/dqn_shared/best_validation.pt \
  --split val

To evaluate a heuristic baseline directly:

python3 -m training.train_local_policy evaluate --baseline queue_greedy --split val

TensorBoard

TensorBoard logs are written to artifacts/dqn_shared/tensorboard by default.

tensorboard --logdir artifacts/dqn_shared/tensorboard

District LLM

The district LLM stack lives under district_llm/. It treats the learned DQN local controller as the low-level executor and automatically derives district-scale SFT labels from DQN rollout windows. By default, district-model fine-tuning uses only the DQN-derived rows.
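To illustrate what "district-scale labels from rollout windows" can mean, here is a minimal sketch that slices a rollout into windows of `decision_interval` steps, aggregates per-intersection DQN actions and queues by district, and serializes one row per (window, district) as JSON Lines. Everything here is hypothetical: the step record layout, the row fields, and the `favor_switching`/`favor_holding` label are illustrative assumptions, not the schema that district_llm.generate_dataset actually writes.

```python
import json
from collections import defaultdict


def window_rows(steps, district_of, decision_interval=10):
    """Yield one hypothetical SFT row per (window, district).

    steps: per-step dicts with "actions" ({intersection_id: 0 or 1}) and
           "queues" ({intersection_id: queue length}) from a DQN rollout.
    district_of: {intersection_id: district_id}.
    """
    for start in range(0, len(steps), decision_interval):
        window = steps[start:start + decision_interval]
        decisions = defaultdict(int)
        switches = defaultdict(int)
        queue_sum = defaultdict(float)
        for step in window:
            for iid, action in step["actions"].items():
                district = district_of[iid]
                decisions[district] += 1
                switches[district] += action  # 1 = switch, 0 = hold
                queue_sum[district] += step["queues"][iid]
        for district in sorted(decisions):
            n = decisions[district]
            switch_rate = switches[district] / n
            yield {
                "window_start": start,
                "district_id": district,
                "mean_queue": queue_sum[district] / n,
                "switch_rate": switch_rate,
                # Hypothetical label summarizing what the DQN executor did.
                "label": "favor_switching" if switch_rate > 0.5 else "favor_holding",
            }


def to_jsonl(rows):
    """Serialize rows as JSON Lines (one JSON object per line)."""
    return "\n".join(json.dumps(row) for row in rows)
```

Because the labels are read off the DQN's own behavior, no human annotation is needed; a real pipeline would likely also include richer state summaries in each row's prompt.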

Generate district-LLM data from a learned checkpoint:

python3 -m district_llm.generate_dataset \
  --controller rl_checkpoint \
  --checkpoint artifacts/dqn_shared/best_validation.pt \
  --episodes 100 \
  --decision-interval 10 \
  --use-checkpoint-env-config \
  --output data/district_llm_train.jsonl

Generate from fixed or heuristic baselines:

python3 -m district_llm.generate_dataset --controller fixed --episodes 50 --decision-interval 10 --output data/district_llm_fixed.jsonl
python3 -m district_llm.generate_dataset --controller queue_greedy --episodes 50 --decision-interval 10 --output data/district_llm_heuristic.jsonl
python3 -m district_llm.generate_dataset --teacher-spec fixed --teacher-spec random --episodes 50 --decision-interval 10 --output data/district_llm_multi_teacher.jsonl

Train a first-pass district model with Unsloth/QLoRA:

python3 -m training.train_district_llm \
  --dataset data/district_llm_train.jsonl \
  --output-dir artifacts/district_llm_qwen \
  --model-name Qwen/Qwen2.5-7B-Instruct \
  --load-in-4bit \
  --lora-rank 16 \
  --max-seq-length 1024 \
  --max-steps 1000

Run single-sample inference:

python3 -m district_llm.inference \
  --model artifacts/district_llm_qwen \
  --city-id city_0006 \
  --scenario-name accident \
  --district-id d_00

Run the OpenEnv-compatible district wrapper on top of the current DQN stack:

uvicorn openenv_app.app:app --reload

Algorithm

  • Training algorithm: parameter-shared dueling Double DQN.
  • Replay: prioritized replay over per-intersection transitions gathered from full CityFlow worlds.
  • Return target: n-step bootstrap target with target-network updates.
  • Execution: all controllable intersections act simultaneously every RL decision interval.
  • Action space: 0 = hold current phase, 1 = switch to next green phase.
  • Safety: min_green_time is enforced in the environment and exposed through action masking.
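The return target and action masking described above can be sketched for a single transition in plain Python. This is a minimal sketch under stated assumptions, not the project's training code: the function names are hypothetical, the online network selects the next action among allowed actions while the target network evaluates it (the standard Double DQN split), and the mask is assumed to disallow action 1 (switch) until min_green_time has elapsed.

```python
from typing import Sequence


def masked_argmax(q_values: Sequence[float], mask: Sequence[bool]) -> int:
    """Index of the best allowed action; mask[a] is False when a is disallowed."""
    allowed = [a for a, ok in enumerate(mask) if ok]
    if not allowed:
        raise ValueError("at least one action must be allowed")
    return max(allowed, key=lambda a: q_values[a])


def n_step_double_dqn_target(rewards, gamma, done, q_online_next, q_target_next, next_mask):
    """n-step Double DQN target for one transition.

    rewards: r_t, ..., r_{t+n-1} (shorter if the episode ended early)
    done: True if the episode ended within these steps (no bootstrap)
    q_online_next / q_target_next: Q(s_{t+n}, .) from the online / target network
    next_mask: allowed actions at s_{t+n} (0 = hold, 1 = switch)
    """
    # Discounted sum of the n observed rewards.
    g = 0.0
    for k, r in enumerate(rewards):
        g += (gamma ** k) * r
    if done:
        return g
    a_star = masked_argmax(q_online_next, next_mask)  # online net selects
    return g + (gamma ** len(rewards)) * q_target_next[a_star]  # target net evaluates
```

Masking inside the target matters: if switching is illegal at s_{t+n}, bootstrapping from the switch action's value would overestimate the return the agent can actually obtain.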

Policy architecture modes:

  • multi_head: shared trunk with district-type-specific Q heads.
  • single_head: one shared Q head for all intersections, with district type removed from the observation.
  • single_head_with_district_feature: one shared Q head for all intersections, with district type left in the observation as an explicit feature.
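A minimal sketch of how the three modes differ, combined with the dueling aggregation Q(s, a) = V(s) + A(s, a) - mean A(s, ·). Assumptions: the district-type names are hypothetical placeholders (the real ones come from metadata.json), heads are plain callables standing in for network layers, and in multi_head mode the district type is assumed to act through head selection rather than through the observation.

```python
# Hypothetical district-type vocabulary; the real types come from metadata.json.
DISTRICT_TYPES = ["residential", "commercial", "industrial"]


def one_hot(index, size):
    return [1.0 if i == index else 0.0 for i in range(size)]


def build_observation(local_features, district_type, arch):
    """Assemble one intersection's observation for a given architecture mode."""
    if arch == "single_head_with_district_feature":
        # District type kept as an explicit one-hot feature.
        return list(local_features) + one_hot(DISTRICT_TYPES.index(district_type), len(DISTRICT_TYPES))
    # single_head: district type removed. multi_head: assumed to route by type
    # through the head choice in q_values instead of through the observation.
    return list(local_features)


def dueling_q(value, advantages):
    """Dueling aggregation: Q(s, a) = V(s) + A(s, a) - mean over a' of A(s, a')."""
    mean_adv = sum(advantages) / len(advantages)
    return [value + a - mean_adv for a in advantages]


def q_values(trunk_out, district_type, arch, heads):
    """heads maps a head name to a callable returning (value, advantages)."""
    head = heads[district_type] if arch == "multi_head" else heads["shared"]
    value, advantages = head(trunk_out)
    return dueling_q(value, advantages)
```

Subtracting the mean advantage makes V and A identifiable; without it, any constant could shift between the two streams without changing Q.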

Reward variants:

  • current: backward-compatible waiting and queue penalty.
  • normalized_wait_queue: normalized queue and waiting reduction reward.
  • wait_queue_throughput: normalized queue/wait reduction plus throughput bonus and imbalance penalty.
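The wait_queue_throughput variant can be sketched as a weighted sum of its four described components. The normalization scheme, the imbalance definition, the function names, and all weights below are hypothetical choices for illustration, not the values used in env/.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RewardWeights:
    # Hypothetical coefficients for illustration, not the project's values.
    queue: float = 1.0
    wait: float = 1.0
    throughput: float = 0.5
    imbalance: float = 0.25


DEFAULT_WEIGHTS = RewardWeights()


def relative_reduction(before, after):
    """Fractional reduction, normalized by the previous value (floored at 1)."""
    return (before - after) / max(before, 1.0)


def wait_queue_throughput_reward(prev_queue, queue, prev_wait, wait,
                                 vehicles_passed, capacity, lane_queues,
                                 weights=DEFAULT_WEIGHTS):
    queue_term = relative_reduction(prev_queue, queue)
    wait_term = relative_reduction(prev_wait, wait)
    throughput_term = vehicles_passed / max(capacity, 1)
    # Imbalance: spread between the longest and shortest lane queue.
    imbalance = (max(lane_queues) - min(lane_queues)) / max(sum(lane_queues), 1.0) if lane_queues else 0.0
    return (weights.queue * queue_term
            + weights.wait * wait_term
            + weights.throughput * throughput_term
            - weights.imbalance * imbalance)
```

Normalizing each term keeps the reward on a similar scale across intersections of very different sizes, which matters when one parameter-shared network is trained on all of them.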

Smoke Test

To sanity-check one generated scenario with the real CityFlow environment:

python3 scripts/smoke_test_env.py --city-id city_0001 --scenario-name normal --policy random

Project layout

  • agents/: heuristic local policies and simple baselines.
  • env/: CityFlow environment, topology parsing, observation building, and reward logic.
  • training/: dataset utilities, replay-based DQN training, evaluation helpers, TensorBoard logging, and CLIs.
  • data/: generated synthetic cities, split files, and dataset generation utilities.
  • scripts/: utility scripts, including the CityFlow smoke test.
  • third_party/: vendored dependencies, including CityFlow source.

Notes

  • The generated dataset is assumed to already exist under data/generated.
  • District membership comes from district_map.json.
  • District types come from metadata.json.
  • Runtime training and evaluation require the cityflow Python module to be installed in the active environment.
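A minimal sketch of reading district membership and types, plus a check for the cityflow module. The JSON schemas shown in the docstring are assumptions (the actual layout of district_map.json and metadata.json is not documented here), and the helper names are hypothetical; only `importlib.util.find_spec` is a standard-library API.

```python
import importlib.util
import json
from collections import defaultdict
from pathlib import Path


def group_districts(district_map, metadata):
    """Group intersections by district and attach each district's type.

    Assumed (hypothetical) schemas:
      district_map: {"<intersection_id>": "<district_id>", ...}
      metadata:     {"district_types": {"<district_id>": "<type>", ...}, ...}
    """
    types = metadata.get("district_types", {})
    members = defaultdict(list)
    for intersection_id, district_id in district_map.items():
        members[district_id].append(intersection_id)
    return {
        district_id: {"type": types.get(district_id, "unknown"), "intersections": sorted(ids)}
        for district_id, ids in members.items()
    }


def load_districts(city_dir):
    """Read district_map.json and metadata.json from one generated city directory."""
    city_dir = Path(city_dir)
    district_map = json.loads((city_dir / "district_map.json").read_text())
    metadata = json.loads((city_dir / "metadata.json").read_text())
    return group_districts(district_map, metadata)


def cityflow_available():
    """True if the cityflow module can be imported in the active environment."""
    return importlib.util.find_spec("cityflow") is not None
```

Calling `cityflow_available()` before launching training gives an early, readable failure instead of an ImportError deep inside a rollout worker.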

About

Using RL to build simulated world models to optimize traffic flow
