[ICML'26] AutoMoT: A Unified Vision-Language-Action Model with Asynchronous Mixture-of-Transformers for End-to-End Autonomous Driving
AutoMoT is an asynchronous vision-language-action (VLA) agent for end-to-end autonomous driving, accepted at ICML 2026.
Current release: closed-loop inference on Bench2Drive (220 routes); model checkpoints and the NuSync dataset are open-sourced. Training code is coming soon; see the TODO list.
- Bench2Drive closed-loop inference (220 routes, CARLA 0.9.15)
- Model checkpoint release (HuggingFace)
- NuSync dataset release (HuggingFace)
- Training code release
- Action Refiner code release
- Method Overview
- Repository Structure
- Environment Setup
- Model Weights
- Running Evaluation
- Benchmark Results
- TODO List
- Citation
AutoMoT uses an Asynchronous Mixture-of-Transformers design: a slow Understanding Expert (4B) performs low-frequency reasoning, while a fast Action Expert (1.6B) runs at high frequency to decode 3-second decisions and spatial-temporal waypoints via KV-cache bridging.
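The two update rates can be pictured with a small single-threaded toy. This is only an illustrative sketch, not the repository's inference engine: `SharedCache`, `slow_understanding_step`, `fast_action_step`, and the `slow_every` ratio are hypothetical names and values, and the strings stand in for real key/value tensors.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SharedCache:
    """Stand-in for the KV cache that bridges the two experts."""
    payload: Optional[str] = None  # placeholder for cached keys/values
    written_at: int = -1           # tick at which the slow expert last wrote


def slow_understanding_step(tick: int, observation: str) -> str:
    # Hypothetical placeholder for the low-frequency reasoning pass.
    return f"context(t={tick}, obs={observation})"


def fast_action_step(tick: int, cache: SharedCache) -> str:
    # Hypothetical placeholder for the high-frequency decoding pass. It only
    # reads the cache and never recomputes the slow expert's context.
    return f"waypoints(t={tick}, using cache from t={cache.written_at})"


def run(num_ticks: int, slow_every: int) -> List[str]:
    """Run the fast step every tick and the slow step every `slow_every` ticks."""
    cache = SharedCache()
    outputs = []
    for tick in range(num_ticks):
        if tick % slow_every == 0:
            # Refresh the shared cache at the lower rate.
            cache.payload = slow_understanding_step(tick, f"frame{tick}")
            cache.written_at = tick
        outputs.append(fast_action_step(tick, cache))
    return outputs


if __name__ == "__main__":
    for line in run(num_ticks=6, slow_every=3):
        print(line)
```

In this toy, the fast step at tick 4 still consumes the cache written at tick 3, which is the essence of decoupling the two rates; how the real model schedules the experts and lays out its cache is not specified here.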
Bench2Drive_opensource/
├── Automot/ # AutoMoT model and agent utilities
│ ├── mot/
│ │ ├── modeling/
│ │ │ ├── automot/ # Core model: AutoMoT, configs, connectors
│ │ │ ├── bev_encoder/ # BEV encoder backbone
│ │ │ ├── cache_utils/ # KV-cache utilities
│ │ │ └── qwen3/ # Qwen3 text backbone
│ │ ├── data/reasoning/ # Special token handling
│ │ └── evaluation/ # Inference engine (slow/fast KV-cache)
│ ├── team_code/ # UKF, LiDAR preprocessing, prompt builders
│ └── checkpoints/ # Model weights (downloaded separately)
│ ├── model.safetensors # All weights: AutoMoT
│ ├── config.json # Qwen3-VL model config
│ ├── tokenizer*.json # Tokenizer files
│ ├── preprocessor_config.json # Vision preprocessor
│ └── bev_config.json # BEV encoder GlobalConfig
├── leaderboard/ # Bench2Drive evaluation harness
│ ├── team_code/
│ │ ├── mot_b2d_agent.py # Main CARLA agent entry point
│ │ ├── automot_utils.py # Model loading + prompt utilities
│ │ └── bev_data_utils.py # LiDAR → BEV histogram features
│ ├── data/bench2drive220/ # 220 route XML files
│ └── scripts/
│ └── run_evaluation_route.sh # Route-by-route evaluation
├── eval_json/ # Route JSON files for evaluation
│ ├── b2d_all_routes.json # All 220 routes
│ ├── b2d_all_routes_split1.json # Routes 1–110 (for multi-GPU)
│ ├── b2d_all_routes_split2.json # Routes 111–220 (for multi-GPU)
│ └── b2d_all_routes_merged.json # Route ID index (used by run script)
├── scenario_runner/ # CARLA scenario execution
mkdir carla && cd carla
wget https://carla-releases.s3.us-east-005.backblazeb2.com/Linux/CARLA_0.9.15.tar.gz
tar -xvf CARLA_0.9.15.tar.gz
cd Import && wget https://carla-releases.s3.us-east-005.backblazeb2.com/Linux/AdditionalMaps_0.9.15.tar.gz
cd .. && bash ImportAssets.sh
export CARLA_ROOT=/path/to/carla # set to the directory containing CarlaUE4.sh
conda create -n automot python=3.10
conda activate automot
pip install torch==2.7.1+cu128 torchvision==0.22.1+cu128 torchaudio==2.7.1+cu128 \
--index-url https://download.pytorch.org/whl/cu128
# Install all requirements
pip install -r requirements.txt
# CARLA Python API (Python 3.10, available on PyPI)
pip install carla==0.9.15
# flash-attn (requires torch to be installed first)
pip install flash-attn==2.8.3 --no-build-isolation
export CARLA_ROOT=/path/to/carla
export PYTHONPATH=$CARLA_ROOT/PythonAPI/carla:$PYTHONPATH
All weights are hosted at Oscar-Huang/AutoMoT.
| File | Local destination | Description | Size |
|---|---|---|---|
| model.safetensors | Automot/checkpoints/model.safetensors | All model weights | ~13 GB |
| config.json | Automot/checkpoints/ | Qwen3-VL model config | < 1 MB |
| tokenizer*.json | Automot/checkpoints/ | Tokenizer files | < 1 MB |
| preprocessor_config.json | Automot/checkpoints/ | Vision preprocessor | < 1 MB |
| bev_config.json | Automot/checkpoints/ | BEV encoder config | < 1 MB |
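Once the weights have been fetched, the file list in the table can be checked with a short script. The sketch below is not part of the repository; the function name `missing_checkpoint_files` is hypothetical, and it only tests that each listed name or pattern matches at least one file, not that the contents are valid.

```python
from pathlib import Path
from typing import List

# Names and glob patterns copied from the table above.
EXPECTED_PATTERNS = [
    "model.safetensors",
    "config.json",
    "tokenizer*.json",
    "preprocessor_config.json",
    "bev_config.json",
]


def missing_checkpoint_files(checkpoint_dir: str) -> List[str]:
    """Return the patterns that match no file in checkpoint_dir."""
    root = Path(checkpoint_dir)
    if not root.is_dir():
        # Nothing has been downloaded yet, so everything is missing.
        return list(EXPECTED_PATTERNS)
    return [p for p in EXPECTED_PATTERNS if not any(root.glob(p))]


if __name__ == "__main__":
    missing = missing_checkpoint_files("Automot/checkpoints")
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All expected checkpoint files are present.")
```

Run it from the repository root so that the relative path `Automot/checkpoints` resolves correctly.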
huggingface-cli download Oscar-Huang/AutoMoT \
--local-dir Automot/checkpoints \
--repo-type model
cd leaderboard/scripts
bash run_evaluation_route.sh
This script:
- Runs all 220 routes sequentially, skipping already completed ones
- Saves per-route JSON to leaderboard/scripts/v_2json_open/
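The skip-if-done behaviour can be sketched in a few lines. This is an illustration under assumptions, not the logic of `run_evaluation_route.sh` itself: the function `pending_routes`, the `<route_id>.json` file naming, and the numeric route IDs are all hypothetical.

```python
from pathlib import Path
from typing import Iterable, List


def pending_routes(route_ids: Iterable[str], result_dir: str) -> List[str]:
    """Return route IDs, in their original order, that have no result file yet.

    Assumes (hypothetically) one file named <route_id>.json per finished route.
    """
    out_dir = Path(result_dir)
    return [rid for rid in route_ids if not (out_dir / f"{rid}.json").is_file()]


if __name__ == "__main__":
    # Hypothetical route IDs; the real ones come from the route index file.
    all_routes = [str(i) for i in range(1, 221)]
    todo = pending_routes(all_routes, "leaderboard/scripts/v_2json_open")
    print(f"{len(todo)} of {len(all_routes)} routes still to run")
```

A file-existence check like this treats a partially written result as finished, so a real resume mechanism may need to inspect the file contents as well.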
Bench2Drive 220-route closed-loop evaluation (DS↑ / SR↑):
AutoMoT achieves DS=87.34 / SR=70.00
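For readers aggregating their own runs, the two metrics can be computed as follows, assuming DS is the mean of the per-route driving scores and SR is the percentage of routes counted as successful. The record keys `driving_score` and `success` are hypothetical and do not reflect the actual layout of the per-route JSON files; the numbers in the example are toy values, not results.

```python
from typing import Dict, List, Tuple


def aggregate(results: List[Dict]) -> Tuple[float, float]:
    """Return (mean driving score, success rate in percent)."""
    if not results:
        raise ValueError("no route results to aggregate")
    ds = sum(r["driving_score"] for r in results) / len(results)
    sr = 100.0 * sum(1 for r in results if r["success"]) / len(results)
    return ds, sr


if __name__ == "__main__":
    # Toy records with hypothetical keys, for illustration only.
    toy = [
        {"driving_score": 100.0, "success": True},
        {"driving_score": 60.0, "success": False},
        {"driving_score": 80.0, "success": True},
        {"driving_score": 40.0, "success": False},
    ]
    ds, sr = aggregate(toy)
    print(f"DS={ds:.2f} / SR={sr:.2f}")
```

Because both metrics are averages over routes, results from the two multi-GPU splits should be merged into one list before aggregating rather than averaged per split.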
@article{huang2026automot,
title = {AutoMoT: A Unified Vision-Language-Action Model with Asynchronous Mixture-of-Transformers for End-to-End Autonomous Driving},
author = {Wenhui Huang and Songyan Zhang and Qihang Huang and Zhidong Wang and Zhiqi Mao and Collister Chua and Zhan Chen and Long Chen and Chen Lv},
journal = {arXiv preprint arXiv:2603.14851},
year = {2026},
url = {https://arxiv.org/abs/2603.14851}
}
@inproceedings{jia2024bench,
title = {Bench2Drive: Towards Multi-Ability Benchmarking of Closed-Loop End-To-End Autonomous Driving},
author = {Xiaosong Jia and Zhenjie Yang and Qifeng Li and Zhiyuan Zhang and Junchi Yan},
booktitle = {NeurIPS 2024 Datasets and Benchmarks Track},
year = {2024}
}
We thank TransFuser++, SimLingo, and BAGEL for their open-source contributions, which this work builds upon.
