[ICML'26] AutoMoT: A Unified Vision-Language-Action Model with Asynchronous Mixture-of-Transformers for End-to-End Autonomous Driving


AutoMoT is an asynchronous vision-language-action (VLA) agent for end-to-end autonomous driving, accepted at ICML 2026.

Current release: closed-loop inference on Bench2Drive (220 routes). The model checkpoints and the NuSync dataset are open-sourced. Training code is coming soon; see the TODO list.

TODO List

Bench2Drive closed-loop inference (220 routes, CARLA 0.9.15)
Model checkpoint release (HuggingFace)
NuSync dataset release (HuggingFace)
Training code release
Action Refiner code release

Method Overview

AutoMoT uses an Asynchronous Mixture-of-Transformers design: a slow Understanding Expert (4B) performs low-frequency reasoning, while a fast Action Expert (1.6B) runs at high frequency to decode 3-second decisions and spatial-temporal waypoints via KV-cache bridging.
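The slow/fast schedule described above can be illustrated with a toy loop. This is a minimal sketch under stated assumptions, not the released inference engine: `ToySlowExpert`, `ToyFastExpert`, `run_async`, and the `slow_every` parameter are hypothetical names, and the "KV cache" is a plain list of strings standing in for real key/value tensors.

```python
from dataclasses import dataclass


@dataclass
class ToySlowExpert:
    """Hypothetical stand-in for the Understanding Expert: called rarely."""
    calls: int = 0

    def reason(self, observation: str) -> list[str]:
        # Returns a toy "KV cache" (a list of strings, not real tensors).
        self.calls += 1
        return [f"context({observation})"]


@dataclass
class ToyFastExpert:
    """Hypothetical stand-in for the Action Expert: called every step."""
    calls: int = 0

    def act(self, observation: str, kv_cache: list[str]) -> str:
        # Decodes an action from the fresh observation plus the cached context.
        self.calls += 1
        return f"waypoints[{observation} | {kv_cache[0]}]"


def run_async(observations, slow_every=5):
    """Refresh the cache every `slow_every` steps; act on every step."""
    slow, fast = ToySlowExpert(), ToyFastExpert()
    kv_cache = None
    actions = []
    for step, obs in enumerate(observations):
        if step % slow_every == 0:
            kv_cache = slow.reason(obs)  # low-frequency reasoning
        actions.append(fast.act(obs, kv_cache))  # high-frequency decoding
    return actions, slow.calls, fast.calls


actions, slow_calls, fast_calls = run_async(
    [f"frame{t}" for t in range(10)], slow_every=5
)
print(slow_calls, fast_calls)  # 2 10
print(actions[7])  # waypoints[frame7 | context(frame5)]
```

In this toy run the fast expert at step 7 still consumes the context computed at step 5, which is the sense in which the two experts are bridged by a cache rather than run in lockstep.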

Repository Structure

Bench2Drive_opensource/
├── Automot/                          # AutoMoT model and agent utilities
│   ├── mot/
│   │   ├── modeling/
│   │   │   ├── automot/              # Core model: AutoMoT, configs, connectors
│   │   │   ├── bev_encoder/          # BEV encoder backbone 
│   │   │   ├── cache_utils/          # KV-cache utilities
│   │   │   └── qwen3/                # Qwen3 text backbone
│   │   ├── data/reasoning/           # Special token handling
│   │   └── evaluation/               # Inference engine (slow/fast KV-cache)
│   ├── team_code/                    # UKF, LiDAR preprocessing, prompt builders
│   └── checkpoints/                  # Model weights (downloaded separately)
│       ├── model.safetensors         # All weights: AutoMoT
│       ├── config.json               # Qwen3-VL model config
│       ├── tokenizer*.json           # Tokenizer files
│       ├── preprocessor_config.json  # Vision preprocessor
│       └── bev_config.json           # BEV encoder GlobalConfig
├── leaderboard/                      # Bench2Drive evaluation harness
│   ├── team_code/
│   │   ├── mot_b2d_agent.py          # Main CARLA agent entry point
│   │   ├── automot_utils.py          # Model loading + prompt utilities
│   │   └── bev_data_utils.py         # LiDAR → BEV histogram features
│   ├── data/bench2drive220/          # 220 route XML files
│   └── scripts/
│       └── run_evaluation_route.sh   # Route-by-route evaluation
├── eval_json/                        # Route JSON files for evaluation
│   ├── b2d_all_routes.json           # All 220 routes
│   ├── b2d_all_routes_split1.json    # Routes 1–110 (for multi-GPU)
│   ├── b2d_all_routes_split2.json    # Routes 111–220 (for multi-GPU)
│   └── b2d_all_routes_merged.json    # Route ID index (used by run script)
└── scenario_runner/                  # CARLA scenario execution

Environment Setup

1. CARLA 0.9.15

mkdir carla && cd carla
wget https://carla-releases.s3.us-east-005.backblazeb2.com/Linux/CARLA_0.9.15.tar.gz
tar -xvf CARLA_0.9.15.tar.gz
cd Import && wget https://carla-releases.s3.us-east-005.backblazeb2.com/Linux/AdditionalMaps_0.9.15.tar.gz
cd .. && bash ImportAssets.sh
export CARLA_ROOT=/path/to/carla  # set to the directory containing CarlaUE4.sh

2. Create the `automot` environment

conda create -n automot python=3.10
conda activate automot

3. PyTorch

pip install torch==2.7.1+cu128 torchvision==0.22.1+cu128 torchaudio==2.7.1+cu128 \
    --index-url https://download.pytorch.org/whl/cu128

4. Python dependencies

# Install all requirements
pip install -r requirements.txt

# CARLA Python API (Python 3.10, available on PyPI)
pip install carla==0.9.15

# flash-attn (requires torch to be installed first)
pip install flash-attn==2.8.3 --no-build-isolation

5. Environment variables

export CARLA_ROOT=/path/to/carla
export PYTHONPATH=$CARLA_ROOT/PythonAPI/carla:$PYTHONPATH
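Before launching an evaluation, it can help to confirm that these variables point where expected. The following is a small sanity-check sketch, not part of the repository: `check_carla_env` is a hypothetical helper, and it only assumes what the steps above state (that `CARLA_ROOT` contains `CarlaUE4.sh` and that `$CARLA_ROOT/PythonAPI/carla` is on `PYTHONPATH`).

```python
import os
from pathlib import Path


def check_carla_env(env=None):
    """Return a list of human-readable problems; an empty list means OK."""
    env = os.environ if env is None else env
    problems = []

    root = env.get("CARLA_ROOT")
    if not root:
        return ["CARLA_ROOT is not set"]

    root_path = Path(root)
    if not (root_path / "CarlaUE4.sh").is_file():
        problems.append(f"CarlaUE4.sh not found in {root_path}")

    # Compare normalized paths so trailing slashes do not cause false alarms.
    api_dir = os.path.normpath(str(root_path / "PythonAPI" / "carla"))
    entries = {
        os.path.normpath(p)
        for p in env.get("PYTHONPATH", "").split(os.pathsep)
        if p
    }
    if api_dir not in entries:
        problems.append(f"{api_dir} is not on PYTHONPATH")
    return problems


if __name__ == "__main__":
    for problem in check_carla_env():
        print("WARNING:", problem)
```

Passing a plain dictionary as `env` makes it possible to dry-run the check without touching the real environment.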

Model Weights

All weights are hosted on Hugging Face in the model repository Oscar-Huang/AutoMoT.

| File | Local destination | Description | Size |
| --- | --- | --- | --- |
| `model.safetensors` | `Automot/checkpoints/model.safetensors` | All model weights | ~13 GB |
| `config.json` | `Automot/checkpoints/` | Qwen3-VL model config | < 1 MB |
| `tokenizer*.json` | `Automot/checkpoints/` | Tokenizer files | < 1 MB |
| `preprocessor_config.json` | `Automot/checkpoints/` | Vision preprocessor | < 1 MB |
| `bev_config.json` | `Automot/checkpoints/` | BEV encoder config | < 1 MB |

huggingface-cli download Oscar-Huang/AutoMoT \
    --local-dir Automot/checkpoints \
    --repo-type model
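After the download finishes, a quick check that the expected files landed in `Automot/checkpoints` can save a failed run later. The sketch below is illustrative only: `missing_checkpoint_files` is a hypothetical helper, and the file names are simply those listed in the table above.

```python
from pathlib import Path

# File names taken from the weights table; tokenizer files are matched by glob.
REQUIRED_FILES = [
    "model.safetensors",
    "config.json",
    "preprocessor_config.json",
    "bev_config.json",
]
REQUIRED_GLOBS = ["tokenizer*.json"]


def missing_checkpoint_files(checkpoint_dir):
    """Return the required names or patterns with no match in checkpoint_dir."""
    root = Path(checkpoint_dir)
    missing = [name for name in REQUIRED_FILES if not (root / name).is_file()]
    missing += [
        pattern for pattern in REQUIRED_GLOBS if not any(root.glob(pattern))
    ]
    return missing


if __name__ == "__main__":
    missing = missing_checkpoint_files("Automot/checkpoints")
    print("missing:", missing if missing else "none")
```

The check looks only at file presence; it does not verify sizes or hashes.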

Running Evaluation

Route-by-route evaluation

cd leaderboard/scripts
bash run_evaluation_route.sh

This script:

Runs all 220 routes sequentially, skipping already completed ones
Saves per-route JSON to leaderboard/scripts/v_2json_open/
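The skip-if-completed behaviour can be sketched as follows. This is a hedged illustration rather than the logic of `run_evaluation_route.sh`: the function `pending_routes`, the example route IDs, and the assumption that each finished route leaves a file named `<route_id>.json` in the output directory are all hypothetical.

```python
import tempfile
from pathlib import Path


def pending_routes(route_ids, result_dir):
    """Return the route IDs that do not yet have a result file.

    Assumes (hypothetically) one `<route_id>.json` file per finished route.
    """
    done = {p.stem for p in Path(result_dir).glob("*.json")}
    return [rid for rid in route_ids if str(rid) not in done]


# Small self-contained demo using a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "101.json").write_text("{}")  # pretend route 101 is finished
    print(pending_routes([100, 101, 102], tmp))  # [100, 102]
```

Re-running an interrupted evaluation with this kind of filter only spends simulator time on routes that have no saved result.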

Benchmark Results

Bench2Drive 220-route closed-loop evaluation, reporting Driving Score (DS↑) and Success Rate (SR↑, in %); higher is better for both:

AutoMoT achieves DS = 87.34 and SR = 70.00.
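To make the two numbers concrete, the sketch below aggregates per-route records into DS and SR, assuming DS is the mean of per-route driving scores and SR is the percentage of routes counted as successful. The record fields `driving_score` and `success` and the function `aggregate` are hypothetical and do not reflect the actual per-route JSON schema; the four toy routes are made-up values for illustration, not results.

```python
def aggregate(results):
    """Return (DS, SR) for a list of per-route records.

    DS: mean per-route driving score.
    SR: percentage of routes flagged as successful.
    """
    n = len(results)
    ds = sum(r["driving_score"] for r in results) / n
    sr = 100.0 * sum(1 for r in results if r["success"]) / n
    return round(ds, 2), round(sr, 2)


# Toy records (illustrative values only).
toy = [
    {"driving_score": 100.0, "success": True},
    {"driving_score": 80.0, "success": True},
    {"driving_score": 60.0, "success": False},
    {"driving_score": 40.0, "success": True},
]
print(aggregate(toy))  # (70.0, 75.0)

# Under this definition, SR = 70.00 over 220 routes means 154 successful routes.
print(70 * 220 // 100)  # 154
```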

Citation

@article{huang2026automot,
  title   = {AutoMoT: A Unified Vision-Language-Action Model with Asynchronous Mixture-of-Transformers for End-to-End Autonomous Driving},
  author  = {Wenhui Huang and Songyan Zhang and Qihang Huang and Zhidong Wang and Zhiqi Mao and Collister Chua and Zhan Chen and Long Chen and Chen Lv},
  journal = {arXiv preprint arXiv:2603.14851},
  year    = {2026},
  url     = {https://arxiv.org/abs/2603.14851}
}

@inproceedings{jia2024bench,
  title     = {Bench2Drive: Towards Multi-Ability Benchmarking of Closed-Loop End-To-End Autonomous Driving},
  author    = {Xiaosong Jia and Zhenjie Yang and Qifeng Li and Zhiyuan Zhang and Junchi Yan},
  booktitle = {NeurIPS 2024 Datasets and Benchmarks Track},
  year      = {2024}
}

Acknowledgements

We thank TransFuser++, SimLingo, and BAGEL for their open-source contributions, which this work builds upon.
