Skip to content

JerryPeng0201/NEBULA

Folders and files

NameName
Last commit message
Last commit date

Latest commit

ย 

History

90 Commits
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 

Repository files navigation

NEBULA: A Unified Ecosystem for Vision-Language-Action Agent Evaluation

Paper Homepage HuggingFace Leaderboard

Alt text

NEBULA is a unified ecosystem for evaluating Vision-Language-Action (VLA) agents in single-arm robotic manipulation. It is designed to move beyond coarse end-task success metrics by offering fine-grained capability tests for skill diagnosis and systematic stress tests for robustness evaluation. NEBULA promotes reproducible research through standardized APIs, structured multi-modal data, and a carefully curated task suite that spans perception, control, language, spatial reasoning, and dynamic adaptation. By focusing on what agents can do, and when they failโ€”NEBULA offers a principled foundation for developing robust and general-purpose embodied agents.

This is the official repository of NEBULA, featuring:

  • ๐Ÿ—‚๏ธ A unified data platform with standardized APIs for multi-modal robotic manipulation
  • ๐Ÿ“ A dual-axis evaluation framework combining capability testing and stress-based analysis
  • ๐Ÿค– Comprehensive data collection protocols, including both scripted and human demonstrations
  • ๐Ÿงญ Reproducible baselines across vision, language, and control tasks

๐Ÿ“ฐ News

  • [Oct 15, 2025] Our paper has been uploaded to arXiv

  • [Oct 15, 2025] NEBULA-ฮฑ v1.0 has released

โณ Coming Soon

The following features have been completed and are currently being prepared for public release:

  • Dataset Adapters for External Sources: Plug-and-play converters for LeRobot, Mobile-ALOHA, RobotNet, and more.

  • NEBULA Native Dataset on HuggingFace: Fully structured NEBULA-format episodes hosted via HuggingFace Datasets.

  • Finetuned Checkpoints of Baseline Models: Finetuned weights for all official baselines to support plug-in evaluation and transfer.

๐Ÿšช Portals โ€“ Quick Access

Access key components of the NEBULA ecosystem below:

  • ๐Ÿ—‚๏ธ Unified Data Platform
    Explore the structure, annotation format, and access APIs of our multi-modal dataset, including episode hierarchy, sensor modalities, and supported robots.

  • ๐Ÿ“ Dual-Axis Evaluation Framework
    Learn about our capability tests, stress tests, evaluation metrics, and the one-line command to launch fully automated benchmarking.

  • ๐Ÿค– Motion Planning Module
    Understand how motion planning is used for data generation in structured or static scenes, and the constraints involved in planned trajectories.

  • ๐ŸŽฎ Teleoperation Module
    See how human demonstrations are collected via teleoperation to capture natural, adaptive behaviors in dynamic and complex scenes.

  • ๐ŸŽฏ Baseline Adapter
    Access pre-configured baseline models with standardized training / fine-tuning, evaluation, and visualization scripts.

๐Ÿ“ฆ Installation

Follow the steps below to set up NEBULA on your local machine. The installation process supports both local development and benchmark evaluation.

Requirements

  • Python 3.10 or higher

  • pip>=21.0

  • CUDA==12.4

Step 1๏ธโƒฃ: Clone the Repository

git clone https://github.com/JerryPeng0201/NEBULA-Alpha.git

cd NEBULA-Alpha

Step 2๏ธโƒฃ: Install Dependencies

conda create -n nebula python=3.10

conda activate nebula

pip install -e .

pip install -r requirements.txt

# We recommend using a PyTorch version compatible with CUDA 12.4.
pip install torch==2.6.0 torchvision==0.21.0 torchaudio==2.6.0 --index-url https://download.pytorch.org/whl/cu124

๐Ÿ—‚๏ธ Using the Unified Data Platform

NEBULAโ€™s data is organized in a structured and scalable format, with each episode stored as a combination of .json (metadata and annotations), .h5 (sensor and trajectory signals), and .mp4 (visual observations). For detailed schema and access examples, please refer to the Unified Data Platform README.

Importantly, NEBULA does not require users to collect data strictly in its native format. A key design goal of the platform is generalizability. We wish to support seamless integration of diverse datasets. To this end, we provide a suite of conversion adapters for many widely-used datasets, allowing users to reformat external data into NEBULA-compatible structure with minimal effort.

โœ… Currently supported sources include:

LeRobot, ManiSkill, RobotNet, Mobile-ALOHA, DROID, Dobbe, FMB, MimicPlay, RoboSet, and Plex RoboSuite.

If you wish to collect your own data in the NEBULA format, we provide two built-in data generation modules:

This modular and extensible design makes NEBULA not only a unified platform, but also an interoperability hub for the broader embodied AI research community.

๐Ÿ“ Evaluation Usage Guide

NEBULA provides a dual-axis evaluation framework designed to assess both what agents can do (capability tests) and how robustly they perform (stress tests). The entire evaluation pipeline is fully automated, supporting one-line execution, JSON-based logging, and visual result generation.

To understand the structure, metrics, and definitions behind each evaluation track, please refer to the Dual-Axis Evaluation Framework.

๐Ÿงญ Baseline Agents

We currently provide six baseline agents along with precomputed results and scripts for evaluation and visualization. More will be added in future releases.

๐Ÿ“ Baseline directory:

Each folder contains:

  • Training / fine-tuning scripts
  • Evaluation configs (config.yaml)

๐Ÿ“‹ Progress:

Baseline Fine-tuning Inference & Visualization Checkpoint
GR00T-1.5 โœ… โœ… ๐Ÿšง
SpatialVLA โœ… โœ… ๐Ÿšง
RDT-1B โœ… โœ… ๐Ÿšง
Diffusion Policy โœ… โœ… ๐Ÿšง
MT-ACT ๐Ÿšง ๐Ÿšง ๐Ÿšง
ACT ๐Ÿšง ๐Ÿšง ๐Ÿšง

TODO: Fine-tuned checkpoints and more model adapters are going to be released soon.

๐Ÿ“Š Comparing Multiple Models

To generate comparison plots across different models using evaluation results:

# Auto-discover and compare all models
python nebula/visualization/generate_graph_multiple.py --all

# Compare selected models explicitly
python nebula/visualization/generate_graph_multiple.py \
    --capability-jsons baselines/gr00t/results_capability_*.json baselines/RDT/results_capability_*.json \
    --stress-jsons baselines/gr00t/results_stress_*.json baselines/RDT/results_stress_*.json

# Capability-only radar chart
python nebula/visualization/generate_graph_multiple.py --all --type capability

# Stress-only bar plot
python nebula/visualization/generate_graph_multiple.py --all --type stress

These plots include:

  • Radar charts (capability skill distribution)
  • Bar plots (stress breakdown)
  • Composite metrics (e.g., stability, latency, Hz)

๐ŸŽฅ Simulation Video Examples

Visual examples of NEBULA tasks across various skills and difficulty levels:

Whether youโ€™re benchmarking a novel model or integrating a real-world agent, NEBULA provides a standardized, transparent, and extensible evaluation pipeline.

๐Ÿ›ฃ๏ธ Future Work

We are actively developing & maintaining the NEBULA ecosystem, with the following improvements planned:

  • Multi-Robot Support: Extend beyond Franka Emika Panda to include other platforms such as UR5, xArm, Fetch, and real-world robot APIs.

  • Expanded Scenes and Task Diversity: Introduce more environments, object sets, and task variations across perception, control, language, and spatial reasoning.

  • More Baseline Agents: Benchmark additional vision-language-action models across both academic and industrial frameworks.

  • Richer Multi-Modal Data: Integrate 3D sensor streams (e.g., point clouds, depth) and dynamic environmental factors (e.g., lighting, shadows, camera motion).

  • Multi-Simulator Compatibility: Add support for additional simulators such as MuJoCo, Isaac Gym, RoboSuite, and Webots, enabling broader research integration.

  • Human-in-the-Loop Interaction Module: Enable real-time interaction with simulated robots and support user-defined task creation, dynamic goal injection, and adaptive feedback during execution.

  • World Model-Augmented Data Generation: Provide a fully automated pipeline for generating high-quality training data using world models. This includes environment simulation, agent rollout, annotation, and format conversionโ€”all without manual intervention.

๐Ÿค Contributing

We welcome contributions to NEBULA and are excited to collaborate with the broader research community!

There are many ways to contribute:

  • ๐Ÿง  Submit bug reports or feature requests via GitHub Issues

  • ๐Ÿ”ง Open pull requests for code improvements, adapters, or new baseline integrations

  • ๐Ÿ“ฆ Add support for new datasets or robots via modular adapters

  • ๐Ÿ“š Help improve documentation and examples

  • ๐Ÿงญ Integrate your own Vision-Language-Action (VLA) models using our model adapter template

If you are interested in collaborating, conducting joint research, or integrating NEBULA into your own project, please feel free to reach out:

  • Dr. Yu Yin (Assistant Professor, Case Western Reserve University): yxy1421 at case.edu

  • Jierui Peng (Project Lead, Case Western Reserve University): jxp1146 at case.edu

We are actively building partnerships across academia and industry to expand the NEBULA ecosystem. Join us!

๐Ÿ™ Acknowledgements

NEBULA is built on top of the excellent work from Sapien and ManiSkill3. We have leveraged their simulation infrastructure, logic, and assets throughout the development of our platform. We express our deep gratitude and sincere respect to their development teams for making such powerful tools openly available to the community.

Salute! ๐Ÿซก

๐Ÿ‘ค Authors

NEBULA is developed and maintained by:

  • Jierui Peng

  • Yanyan Zhang

  • Yicheng Duan

  • Tuo Liang

With guidance from:

  • Prof. Yu Yin (Assistant Professor @ Case Western Reserve University) [Corresponding Author]

  • Prof. Vipin Chaudhary (Department Chair @ Case Western Reserve University)

๐Ÿ“š Citation

@misc{peng2025nebulaevaluatevisionlanguageactionagents,
      title={NEBULA: Do We Evaluate Vision-Language-Action Agents Correctly?}, 
      author={Jierui Peng, Yanyan Zhang, Yicheng Duan, Tuo Liang, Vipin Chaudhary, Yu Yin},
      year={2025},
      eprint={2510.16263},
      archivePrefix={arXiv},
      primaryClass={cs.RO},
      url={https://arxiv.org/abs/2510.16263}, 
}

About

This is the official repository for NEBULA-Alpha

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages