
OscarHuangWind/AutoMoT


[ICML'26] AutoMoT: A Unified Vision-Language-Action Model with Asynchronous Mixture-of-Transformers for End-to-End Autonomous Driving


AutoMoT is an asynchronous vision-language-action (VLA) agent for end-to-end autonomous driving, accepted at ICML 2026.

Current release: closed-loop inference on Bench2Drive (220 routes). The model checkpoints and the NuSync dataset are open-sourced. Training code is coming soon; see the TODO List.


TODO List

  • Bench2Drive closed-loop inference (220 routes, CARLA 0.9.15)
  • Model checkpoint release (HuggingFace)
  • NuSync dataset release (HuggingFace)
  • Training code release
  • Action Refiner code release

Table of Contents

  1. Method Overview
  2. Repository Structure
  3. Environment Setup
  4. Model Weights
  5. Running Evaluation
  6. Benchmark Results
  7. TODO List
  8. Citation

Method Overview

AutoMoT uses an Asynchronous Mixture-of-Transformers design. A slow Understanding Expert (4B) performs low-frequency reasoning, while a fast Action Expert (1.6B) runs at high frequency to decode 3-second decisions and spatio-temporal waypoints. The two experts are connected via KV-cache bridging.
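The slow/fast split can be pictured with a small scheduling sketch. This is only an illustration and not the repository's inference engine: `SharedCache`, `slow_understanding_step`, `fast_action_step`, and the `slow_every` ratio are hypothetical names and values, the "cache" is a plain string standing in for real key/value tensors, and the loop is sequential rather than truly concurrent.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SharedCache:
    """Hypothetical stand-in for the KV cache that bridges the two experts."""
    context: Optional[str] = None
    written_at: int = -1


def slow_understanding_step(frame: int) -> str:
    # Placeholder for the expensive, low-frequency reasoning pass.
    return f"context@{frame}"


def fast_action_step(frame: int, cache: SharedCache) -> str:
    # Placeholder for the cheap, high-frequency decoding pass.
    # It only reads the most recently cached context.
    return f"waypoints@{frame}|{cache.context}"


def run(num_frames: int, slow_every: int) -> List[str]:
    """Refresh the cache every `slow_every` frames; decode an action every frame."""
    cache = SharedCache()
    outputs = []
    for frame in range(num_frames):
        if frame % slow_every == 0:
            cache.context = slow_understanding_step(frame)
            cache.written_at = frame
        outputs.append(fast_action_step(frame, cache))
    return outputs
```

In this sketch, frames between two slow updates reuse the same cached context, which is the property that lets the fast path run more often than the slow one.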


Repository Structure

Bench2Drive_opensource/
├── Automot/                          # AutoMoT model and agent utilities
│   ├── mot/
│   │   ├── modeling/
│   │   │   ├── automot/              # Core model: AutoMoT, configs, connectors
│   │   │   ├── bev_encoder/          # BEV encoder backbone 
│   │   │   ├── cache_utils/          # KV-cache utilities
│   │   │   └── qwen3/                # Qwen3 text backbone
│   │   ├── data/reasoning/           # Special token handling
│   │   └── evaluation/               # Inference engine (slow/fast KV-cache)
│   ├── team_code/                    # UKF, LiDAR preprocessing, prompt builders
│   └── checkpoints/                  # Model weights (downloaded separately)
│       ├── model.safetensors         # All weights: AutoMoT
│       ├── config.json               # Qwen3-VL model config
│       ├── tokenizer*.json           # Tokenizer files
│       ├── preprocessor_config.json  # Vision preprocessor
│       └── bev_config.json           # BEV encoder GlobalConfig
├── leaderboard/                      # Bench2Drive evaluation harness
│   ├── team_code/
│   │   ├── mot_b2d_agent.py          # Main CARLA agent entry point
│   │   ├── automot_utils.py          # Model loading + prompt utilities
│   │   └── bev_data_utils.py         # LiDAR → BEV histogram features
│   ├── data/bench2drive220/          # 220 route XML files
│   └── scripts/
│       └── run_evaluation_route.sh   # Route-by-route evaluation
├── eval_json/                        # Route JSON files for evaluation
│   ├── b2d_all_routes.json           # All 220 routes
│   ├── b2d_all_routes_split1.json    # Routes 1–110 (for multi-GPU)
│   ├── b2d_all_routes_split2.json    # Routes 111–220 (for multi-GPU)
│   └── b2d_all_routes_merged.json    # Route ID index (used by run script)
└── scenario_runner/                  # CARLA scenario execution

Environment Setup

1. CARLA 0.9.15

mkdir carla && cd carla
wget https://carla-releases.s3.us-east-005.backblazeb2.com/Linux/CARLA_0.9.15.tar.gz
tar -xvf CARLA_0.9.15.tar.gz
cd Import && wget https://carla-releases.s3.us-east-005.backblazeb2.com/Linux/AdditionalMaps_0.9.15.tar.gz
cd .. && bash ImportAssets.sh
export CARLA_ROOT=/path/to/carla  # set to the directory containing CarlaUE4.sh

2. Create the automot environment

conda create -n automot python=3.10
conda activate automot

3. PyTorch

pip install torch==2.7.1+cu128 torchvision==0.22.1+cu128 torchaudio==2.7.1+cu128 \
    --index-url https://download.pytorch.org/whl/cu128

4. Python dependencies

# Install all requirements
pip install -r requirements.txt

# CARLA Python API (Python 3.10, available on PyPI)
pip install carla==0.9.15

# flash-attn (requires torch to be installed first)
pip install flash-attn==2.8.3 --no-build-isolation

5. Environment variables

export CARLA_ROOT=/path/to/carla
export PYTHONPATH=$CARLA_ROOT/PythonAPI/carla:$PYTHONPATH
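Before launching an evaluation, it can help to confirm that these variables are consistent. The helper below is a minimal sketch, not part of the repository: `check_carla_env` is a hypothetical name, and it only checks what this section states, namely that `CARLA_ROOT` points to the directory containing `CarlaUE4.sh` and that `$CARLA_ROOT/PythonAPI/carla` is on `PYTHONPATH`.

```python
import os
from typing import List, Mapping


def check_carla_env(environ: Mapping[str, str]) -> List[str]:
    """Return a list of problems found in the CARLA-related environment variables."""
    carla_root = environ.get("CARLA_ROOT", "")
    if not carla_root:
        return ["CARLA_ROOT is not set"]

    problems = []
    # CARLA_ROOT should be the directory that contains CarlaUE4.sh.
    if not os.path.isfile(os.path.join(carla_root, "CarlaUE4.sh")):
        problems.append(f"CarlaUE4.sh not found in {carla_root}")

    # The CARLA Python API directory should appear on PYTHONPATH.
    expected = os.path.normpath(os.path.join(carla_root, "PythonAPI", "carla"))
    entries = {
        os.path.normpath(entry)
        for entry in environ.get("PYTHONPATH", "").split(os.pathsep)
        if entry
    }
    if expected not in entries:
        problems.append(f"{expected} is not on PYTHONPATH")
    return problems
```

Calling `check_carla_env(os.environ)` in the activated `automot` environment returns an empty list when both variables are set as described above.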

Model Weights


All weights are hosted at Oscar-Huang/AutoMoT.

| File | Local destination | Description | Size |
|---|---|---|---|
| model.safetensors | Automot/checkpoints/model.safetensors | All model weights | ~13 GB |
| config.json | Automot/checkpoints/ | Qwen3-VL model config | < 1 MB |
| tokenizer*.json | Automot/checkpoints/ | Tokenizer files | < 1 MB |
| preprocessor_config.json | Automot/checkpoints/ | Vision preprocessor | < 1 MB |
| bev_config.json | Automot/checkpoints/ | BEV encoder config | < 1 MB |

huggingface-cli download Oscar-Huang/AutoMoT \
    --local-dir Automot/checkpoints \
    --repo-type model
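After the download finishes, a quick check that the expected files landed in `Automot/checkpoints` can save a failed run later. The following is a sketch under the assumption that the file list in this section is complete. `missing_checkpoint_files` is a hypothetical helper, not a function shipped with the repository.

```python
from pathlib import Path
from typing import List

# File name patterns taken from the weights list in this section.
EXPECTED_PATTERNS = [
    "model.safetensors",
    "config.json",
    "tokenizer*.json",
    "preprocessor_config.json",
    "bev_config.json",
]


def missing_checkpoint_files(checkpoint_dir: str) -> List[str]:
    """Return the patterns that match no file inside checkpoint_dir."""
    root = Path(checkpoint_dir)
    return [pattern for pattern in EXPECTED_PATTERNS if not any(root.glob(pattern))]
```

Running `missing_checkpoint_files("Automot/checkpoints")` from the repository root returns an empty list when every pattern is matched by at least one file.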

Running Evaluation

Route-by-route evaluation

cd leaderboard/scripts
bash run_evaluation_route.sh

This script:

  • Runs all 220 routes sequentially, skipping already completed ones
  • Saves per-route JSON to leaderboard/scripts/v_2json_open/
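The skip-if-completed behavior can be sketched as follows. This is not the logic of `run_evaluation_route.sh` itself, only an illustration of the idea. It assumes, hypothetically, that each finished route leaves a file named `<route_id>.json` in the result directory; the actual naming scheme used by the script may differ.

```python
from pathlib import Path
from typing import Iterable, List


def pending_routes(route_ids: Iterable[str], result_dir: str) -> List[str]:
    """Return the route IDs that have no result JSON yet, preserving input order."""
    # Assumption: one "<route_id>.json" file per completed route.
    done = {path.stem for path in Path(result_dir).glob("*.json")}
    return [route_id for route_id in route_ids if route_id not in done]
```

Because completed routes are detected from files on disk, an interrupted run can be restarted and will pick up where it stopped.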

Benchmark Results

Bench2Drive 220-route closed-loop evaluation (DS↑ / SR↑):


AutoMoT achieves a Driving Score (DS) of 87.34 and a Success Rate (SR) of 70.00.
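For orientation, the sketch below shows how two aggregate numbers of this kind could be computed from per-route results. It assumes that DS is the mean of per-route driving scores and that SR is the percentage of routes marked successful. The record fields `driving_score` and `success` are hypothetical and do not reflect the actual JSON schema written by the evaluation harness, so the official Bench2Drive tooling should be used for reported numbers.

```python
from typing import Dict, List, Tuple


def aggregate(records: List[Dict[str, object]]) -> Tuple[float, float]:
    """Return (mean driving score, success rate in percent) over all routes."""
    if not records:
        raise ValueError("no route records to aggregate")
    total = len(records)
    # Mean of the per-route driving scores.
    ds = sum(float(r["driving_score"]) for r in records) / total
    # Share of routes flagged as successful, expressed in percent.
    sr = 100.0 * sum(1 for r in records if r["success"]) / total
    return ds, sr
```

Under these assumptions, a success rate of 70.00 over 220 routes corresponds to 154 successful routes.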


Citation

@article{huang2026automot,
  title   = {AutoMoT: A Unified Vision-Language-Action Model with Asynchronous Mixture-of-Transformers for End-to-End Autonomous Driving},
  author  = {Wenhui Huang and Songyan Zhang and Qihang Huang and Zhidong Wang and Zhiqi Mao and Collister Chua and Zhan Chen and Long Chen and Chen Lv},
  journal = {arXiv preprint arXiv:2603.14851},
  year    = {2026},
  url     = {https://arxiv.org/abs/2603.14851}
}

@inproceedings{jia2024bench,
  title     = {Bench2Drive: Towards Multi-Ability Benchmarking of Closed-Loop End-To-End Autonomous Driving},
  author    = {Xiaosong Jia and Zhenjie Yang and Qifeng Li and Zhiyuan Zhang and Junchi Yan},
  booktitle = {NeurIPS 2024 Datasets and Benchmarks Track},
  year      = {2024}
}

Acknowledgements

We thank TransFuser++, SimLingo, and BAGEL for their open-source contributions, which this work builds upon.
