traffic-llm

title	Agentic Traffic
emoji	🏢
colorFrom	green
colorTo	purple
sdk	docker
pinned	false
short_description	Agentic AI to control traffic lights
app_port	7860

CityFlow-based traffic-control project with intersection-level multi-agent DQN training and district-aware policy variants.

Full model weights and files can be found here: https://huggingface.co/Aditya2162/agentic-traffic

OpenEnv UI

For the deployed OpenEnv web interface:

Click Reset before using Step.
Leave Use Llm unchecked for the fast, stable DQN-only path.
Set District Actions to {} for a valid no-op step payload.
Enable Use Llm only when you explicitly want district-level LLM guidance on top of the DQN executor.
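
For scripted use, the same no-op step can be built as JSON. This is a minimal sketch only: the field names use_llm and district_actions are assumptions inferred from the UI labels above, not a confirmed request schema, so check them against the deployed interface before relying on them.

```python
import json
from typing import Optional


def build_step_payload(use_llm: bool = False,
                       district_actions: Optional[dict] = None) -> str:
    """Serialize a step payload; an empty dict means no district overrides.

    NOTE: "use_llm" and "district_actions" are assumed field names.
    """
    payload = {
        "use_llm": use_llm,
        "district_actions": district_actions or {},
    }
    return json.dumps(payload, sort_keys=True)


# DQN-only no-op step: LLM guidance off, no district actions supplied.
noop = build_step_payload()
print(noop)
```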

Training

The default local-policy trainer now uses parameter-shared dueling Double DQN with prioritized replay and n-step returns:

python3 -m training.train_local_policy train

That command trains against data/generated using the splits in data/splits and writes checkpoints to artifacts/dqn_shared. By default it enables TensorBoard logging, uses parallel CPU rollout workers, shows tqdm progress bars, and runs validation and saves a checkpoint every 40 updates.

For a broader but still manageable validation pass:

python3 -m training.train_local_policy train --max-val-cities 3 --val-scenarios-per-city 7

That evaluates 3 validation cities across all 7 scenario types. This gives 21 learned-policy validation episodes per eval, or 63 total episodes if random and fixed baselines are also enabled.
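
The episode counts above follow from a simple product. The helper below is only an illustration of that arithmetic; the function name and its num_baselines parameter are hypothetical and are not part of the trainer.

```python
def validation_episodes(max_val_cities: int,
                        scenarios_per_city: int,
                        num_baselines: int = 0) -> int:
    """Episodes per evaluation: cities x scenarios x evaluated policies."""
    policies = 1 + num_baselines  # the learned policy plus any baselines
    return max_val_cities * scenarios_per_city * policies


learned_only = validation_episodes(3, 7)
with_baselines = validation_episodes(3, 7, num_baselines=2)  # random + fixed
print(learned_only, with_baselines)  # 21 63
```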

Phase-3-style full training with the same 40-update eval/checkpoint cadence:

python3 -m training.train_local_policy train \
  --max-train-cities 70 \
  --max-val-cities 3 \
  --val-scenarios-per-city 7 \
  --policy-arch single_head_with_district_feature \
  --reward-variant wait_queue_throughput

Useful ablations:

python3 -m training.train_local_policy train --policy-arch multi_head --reward-variant current
python3 -m training.train_local_policy train --policy-arch single_head --reward-variant current
python3 -m training.train_local_policy train --policy-arch single_head_with_district_feature --reward-variant wait_queue_throughput

For a fast phase-1 overfit run on one fixed world:

python3 -m training.train_local_policy train \
  --total-updates 25 \
  --train-city-id city_0072 \
  --train-scenario-name normal \
  --overfit-val-on-train-scenario \
  --fast-overfit \
  --policy-arch single_head_with_district_feature \
  --reward-variant wait_queue_throughput

To create or refresh dataset splits:

python3 -m training.train_local_policy make-splits

To evaluate the best checkpoint:

python3 -m training.train_local_policy evaluate \
  --checkpoint artifacts/dqn_shared/best_validation.pt \
  --split val

To evaluate a heuristic baseline directly:

python3 -m training.train_local_policy evaluate --baseline queue_greedy --split val

TensorBoard

TensorBoard logs are written to artifacts/dqn_shared/tensorboard by default.

tensorboard --logdir artifacts/dqn_shared/tensorboard

District LLM

The district LLM stack lives under district_llm/. It treats the learned DQN local controller as the low-level executor, derives district-scale SFT labels automatically from DQN rollout windows, and defaults district-model fine-tuning to DQN-derived rows only.

Generate district-LLM data from a learned checkpoint:

python3 -m district_llm.generate_dataset \
  --controller rl_checkpoint \
  --checkpoint artifacts/dqn_shared/best_validation.pt \
  --episodes 100 \
  --decision-interval 10 \
  --use-checkpoint-env-config \
  --output data/district_llm_train.jsonl
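
Before fine-tuning, it can help to confirm that the generated file parses and to see which fields its rows carry. The sketch below assumes nothing about the row schema beyond one JSON value per line; summarize_jsonl is a hypothetical helper, not part of district_llm.

```python
import json
from collections import Counter
from pathlib import Path


def summarize_jsonl(path):
    """Return (row_count, {key: rows_containing_key}) for a JSONL file."""
    rows = 0
    key_counts = Counter()
    with Path(path).open(encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            record = json.loads(line)  # raises on a malformed row
            rows += 1
            if isinstance(record, dict):
                key_counts.update(record.keys())
    return rows, dict(key_counts)
```

For example, summarize_jsonl("data/district_llm_train.jsonl") reports how many rows were written and how often each top-level key appears.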

Generate from fixed or heuristic baselines:

python3 -m district_llm.generate_dataset --controller fixed --episodes 50 --decision-interval 10 --output data/district_llm_fixed.jsonl
python3 -m district_llm.generate_dataset --controller queue_greedy --episodes 50 --decision-interval 10 --output data/district_llm_heuristic.jsonl
python3 -m district_llm.generate_dataset --teacher-spec fixed --teacher-spec random --episodes 50 --decision-interval 10 --output data/district_llm_multi_teacher.jsonl

Train a first-pass district model with Unsloth/QLoRA:

python3 -m training.train_district_llm \
  --dataset data/district_llm_train.jsonl \
  --output-dir artifacts/district_llm_qwen \
  --model-name Qwen/Qwen2.5-7B-Instruct \
  --load-in-4bit \
  --lora-rank 16 \
  --max-seq-length 1024 \
  --max-steps 1000

Run single-sample inference:

python3 -m district_llm.inference \
  --model artifacts/district_llm_qwen \
  --city-id city_0006 \
  --scenario-name accident \
  --district-id d_00

Run the OpenEnv-compatible district wrapper on top of the current DQN stack:

uvicorn openenv_app.app:app --reload

Algorithm

Training algorithm: parameter-shared dueling Double DQN.
Replay: prioritized replay over per-intersection transitions gathered from full CityFlow worlds.
Return target: n-step bootstrap target with target-network updates.
Execution: all controllable intersections act simultaneously every RL decision interval.
Action space: 0 = hold current phase, 1 = switch to next green phase.
Safety: min_green_time is enforced in the environment and exposed through action masking.
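
The pieces listed above can be sketched in plain Python. This is a textbook-style illustration, not the repository's implementation: the function names, the alpha exponent, and the choice to apply the action mask when selecting the bootstrap action are all assumptions.

```python
from typing import List, Sequence

HOLD, SWITCH = 0, 1  # action ids as listed above


def dueling_q(value: float, advantages: Sequence[float]) -> List[float]:
    """Combine the dueling streams: Q(s, a) = V(s) + A(s, a) - mean(A)."""
    mean_adv = sum(advantages) / len(advantages)
    return [value + adv - mean_adv for adv in advantages]


def masked_argmax(q_values: Sequence[float], mask: Sequence[bool]) -> int:
    """Greedy action among those the mask allows, e.g. SWITCH is masked
    out while min_green_time has not elapsed."""
    allowed = [a for a, ok in enumerate(mask) if ok]
    return max(allowed, key=lambda a: q_values[a])


def n_step_double_dqn_target(rewards: Sequence[float], gamma: float,
                             online_q_next: Sequence[float],
                             target_q_next: Sequence[float],
                             next_mask: Sequence[bool],
                             done: bool) -> float:
    """Discounted n-step return plus a Double DQN bootstrap: the online
    network picks the next action, the target network evaluates it."""
    n_step_return = sum((gamma ** k) * r for k, r in enumerate(rewards))
    if done:
        return n_step_return  # no bootstrap past a terminal state
    best_next = masked_argmax(online_q_next, next_mask)
    return n_step_return + (gamma ** len(rewards)) * target_q_next[best_next]


def replay_probabilities(priorities: Sequence[float],
                         alpha: float = 0.6) -> List[float]:
    """Prioritized replay sampling: P(i) = p_i^alpha / sum_j p_j^alpha."""
    scaled = [p ** alpha for p in priorities]
    total = sum(scaled)
    return [s / total for s in scaled]


q = dueling_q(1.0, [0.5, -0.5])
action_during_min_green = masked_argmax([0.2, 0.9], [True, False])
target = n_step_double_dqn_target(
    rewards=[-1.0, -0.5, -0.25], gamma=0.5,
    online_q_next=[0.2, 0.9], target_q_next=[2.0, 4.0],
    next_mask=[True, True], done=False,
)
print(q, action_during_min_green, target)  # [1.5, 0.5] 0 -0.8125
```

In the second call the higher-valued SWITCH action is masked out, so the greedy choice falls back to HOLD, which is how a minimum green time can be respected without changing the Q-values themselves.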

Policy architecture modes:

multi_head: shared trunk with district-type-specific Q heads.
single_head: one shared Q head for all intersections, with district type removed from the observation.
single_head_with_district_feature: one shared Q head for all intersections, with district type left in the observation as an explicit feature.
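
One way to picture the difference between the three modes is how the district type is routed for a single intersection. The sketch below is illustrative only: the district type names, the one-hot encoding, and the route_observation helper are hypothetical, and whether the multi_head trunk also sees the district type is not stated here.

```python
from typing import List, Sequence, Tuple

# Hypothetical district types; the real ones come from metadata.json.
DISTRICT_TYPES = ("residential", "commercial", "industrial")


def one_hot(district_type: str) -> List[float]:
    return [1.0 if t == district_type else 0.0 for t in DISTRICT_TYPES]


def route_observation(mode: str, traffic_features: Sequence[float],
                      district_type: str) -> Tuple[List[float], int]:
    """Return (network_input, q_head_index) for one intersection."""
    if mode == "multi_head":
        # District type selects which Q head reads the shared trunk.
        return list(traffic_features), DISTRICT_TYPES.index(district_type)
    if mode == "single_head":
        # District type is dropped entirely; every intersection uses head 0.
        return list(traffic_features), 0
    if mode == "single_head_with_district_feature":
        # District type stays in the input as an explicit feature.
        return list(traffic_features) + one_hot(district_type), 0
    raise ValueError(f"unknown policy arch: {mode}")


features = [0.1, 0.2]
for mode in ("multi_head", "single_head", "single_head_with_district_feature"):
    print(mode, route_observation(mode, features, "commercial"))
```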

Reward variants:

current: backward-compatible waiting and queue penalty.
normalized_wait_queue: normalized queue and waiting reduction reward.
wait_queue_throughput: normalized queue/wait reduction plus throughput bonus and imbalance penalty.
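
As a rough picture of how the wait_queue_throughput terms might combine, consider the sketch below. Every scale, weight, and the max-minus-min imbalance measure is an assumption made for illustration; the actual reward logic lives under env/ and may differ.

```python
from typing import Sequence


def wait_queue_throughput_reward(
    prev_queue: float, queue: float,
    prev_wait: float, wait: float,
    vehicles_passed: int,
    approach_queues: Sequence[float],
    queue_scale: float = 10.0,       # hypothetical normalizer
    wait_scale: float = 100.0,       # hypothetical normalizer
    throughput_weight: float = 0.1,  # hypothetical weight
    imbalance_weight: float = 0.1,   # hypothetical weight
) -> float:
    """Reward reductions in queue and waiting, add a throughput bonus,
    and penalize uneven queues across approaches."""
    queue_term = (prev_queue - queue) / queue_scale
    wait_term = (prev_wait - wait) / wait_scale
    throughput_term = throughput_weight * vehicles_passed
    if approach_queues:
        imbalance = (max(approach_queues) - min(approach_queues)) / queue_scale
    else:
        imbalance = 0.0
    return queue_term + wait_term + throughput_term - imbalance_weight * imbalance


reward = wait_queue_throughput_reward(
    prev_queue=12, queue=8, prev_wait=200, wait=150,
    vehicles_passed=5, approach_queues=[4, 2, 1, 1],
)
print(reward)
```

Rewarding the change in queue and wait, rather than their absolute levels, keeps the signal comparable between light and heavy scenarios, which is one common motivation for a normalized variant.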

Smoke Test

To sanity-check one generated scenario with the real CityFlow environment:

python3 scripts/smoke_test_env.py --city-id city_0001 --scenario-name normal --policy random

Project layout

agents/: heuristic local policies and simple baselines.
env/: CityFlow environment, topology parsing, observation building, and reward logic.
training/: dataset utilities, replay-based DQN training, evaluation helpers, TensorBoard logging, and CLIs.
data/: generated synthetic cities, split files, and dataset generation utilities.
scripts/: utility scripts, including the CityFlow smoke test.
third_party/: vendored dependencies, including CityFlow source.

Notes

The generated dataset is assumed to already exist under data/generated.
District membership comes from district_map.json.
District types come from metadata.json.
Runtime training and evaluation require the cityflow Python module to be installed in the active environment.
