💡 FAQ #42

@jiangbonadia

To help more developers effectively put models into practical use, we have recently open-sourced the LingBot-VLA real-world post-training code. It covers core modules including training configuration, open-loop evaluation, and deployment & inference. The release aims to lower the industry entry barrier and drive LingBot-VLA from reproducible research toward practical industrial adoption.

By open-sourcing LingBot-VLA, we hope to substantially lower the technical barrier so that more developers can understand, deploy, and run the model, and to advance the large-scale practical deployment of embodied intelligence together with the community. All source code, model weights, and technical documentation are now available:

Project Website: https://technology.robbyant.com/lingbot-vla

Technical Report: https://arxiv.org/pdf/2601.18692

Code: https://github.com/Robbyant/lingbot-vla

Model: https://huggingface.co/collections/robbyant/lingbot-vla

The FAQ will be continuously updated. Discussions from the community are highly welcome!

FAQ

Q1: Does the message "Load MoGe Module Failed!!" printed after launching training affect training?

A1: This message means the depth module failed to load. It has no effect if you are only training the standard LingBot-VLA. To train LingBot-VLA-Depth, the module must load successfully.

Q2: Will LingBot-VLA pre-training data be open-sourced?

A2: There is currently no plan to release the pre-training dataset.

Q3: How many data samples are required for custom task post-training?

A3: The volume depends on task complexity. Around 100 samples suffice for simple tasks, while complex tasks require several hundred to nearly 1,000 samples.

Q4: Is it normal to receive numerous warnings and error logs as shown in the screenshot when starting training?

A4: Yes, this is normal and generally does not interfere with training.
These logs come from torch._dynamo (TorchDynamo, the graph-capture component of torch.compile). The runtime assertion and debug messages are printed only at the start of training, while the model graph is being compiled and symbolic shapes are being checked.

When Dynamo meets a shape or condition that it cannot infer statically during tracing and graph capture, it prints messages such as:
Runtime assertion failed for expression uX >= 0
No stacktrace found for following nodes

These messages usually just indicate a graph break or a fallback to eager execution; they are not actual runtime errors.

Q5: How much VRAM is required for model inference?

A5: Approximately 8775 MiB.

Q6: The file lingbotvla_cli.yaml cannot be found.

A6: The file is stored under the post-training output path:
train.output_dir/checkpoints/*/hf_ckpt/lingbotvla_cli.yaml
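
As a minimal sketch, the path pattern above can be resolved with a glob. It assumes that train.output_dir is the output directory set in your training configuration and that the `*` matches one directory per saved checkpoint. The helper name find_cli_configs is hypothetical and not part of the LingBot-VLA code.

```python
from pathlib import Path


def find_cli_configs(output_dir):
    """Return all lingbotvla_cli.yaml files under
    <output_dir>/checkpoints/*/hf_ckpt/, sorted by path.

    Hypothetical helper: output_dir is assumed to be the value of
    train.output_dir from your training configuration.
    """
    pattern = "checkpoints/*/hf_ckpt/lingbotvla_cli.yaml"
    return sorted(Path(output_dir).glob(pattern))


if __name__ == "__main__":
    # Replace "outputs" with your own train.output_dir value.
    for config_path in find_cli_configs("outputs"):
        print(config_path)
```

If the list comes back empty, check that a checkpoint has actually been saved under that output directory.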

Q7: The L1 loss converges to a relatively high value at around 0.0x during post-training.

A7: L1 loss naturally converges to a larger value than MSE loss. An L1 loss of around 0.0x is already enough to fit the action trajectories in the dataset; the loss does not need to be an order of magnitude smaller. You can confirm this with open-loop evaluation.

For real-world robot experiments, we recommend training with MSE loss. Refer to the configuration file below:
https://github.com/Robbyant/lingbot-vla/blob/main/configs/vla/real_load20000h.yaml
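
To illustrate why the two losses settle at different magnitudes, here is a small worked example in plain Python. It assumes normalized actions with per-dimension errors below 1 in magnitude, in which case squaring an error makes it smaller. The action values are made up for illustration and are not taken from LingBot-VLA training.

```python
def l1_loss(pred, target):
    """Mean absolute error over all action dimensions."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


def mse_loss(pred, target):
    """Mean squared error over all action dimensions."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


# Illustrative values only: every dimension is off by 0.05.
target = [0.10, -0.20, 0.30, 0.00]
pred = [0.15, -0.25, 0.35, 0.05]

if __name__ == "__main__":
    print("L1 :", l1_loss(pred, target))   # roughly 0.05
    print("MSE:", mse_loss(pred, target))  # roughly 0.0025
```

The same prediction error shows up as roughly 0.05 under L1 and roughly 0.0025 under MSE, so an L1 value of 0.0x and a much smaller MSE value can describe the same quality of fit.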

Q8: What action space is used for model training?

A8: We recommend relative qpose for post-training.

Q9: Insufficient VRAM to support batch size = 32 per GPU (micro_batch_size=32).

A9: Use gradient accumulation. We recommend a global batch size of 256, with gradient_accumulation_steps set as follows:
gradient_accumulation_steps = 256 / (number of GPUs × largest per-GPU batch size that fits in memory)
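
The formula can be sketched as a small helper. The function name and the divisibility check are assumptions added for illustration; only the formula and the recommended global batch size of 256 come from the answer above.

```python
def gradient_accumulation_steps(num_gpus, micro_batch_size, global_batch_size=256):
    """Accumulation steps needed to reach global_batch_size.

    num_gpus:          number of GPUs used for training
    micro_batch_size:  largest per-GPU batch size that fits in VRAM
    """
    samples_per_step = num_gpus * micro_batch_size
    if samples_per_step <= 0:
        raise ValueError("num_gpus and micro_batch_size must be positive")
    if global_batch_size % samples_per_step != 0:
        # Pick a micro batch size that divides the global batch evenly.
        raise ValueError(
            f"{global_batch_size} is not divisible by "
            f"{num_gpus} GPUs x micro batch {micro_batch_size}"
        )
    return global_batch_size // samples_per_step


if __name__ == "__main__":
    print(gradient_accumulation_steps(num_gpus=8, micro_batch_size=32))  # 1
    print(gradient_accumulation_steps(num_gpus=8, micro_batch_size=8))   # 4
    print(gradient_accumulation_steps(num_gpus=4, micro_batch_size=16))  # 4
```

For example, with 8 GPUs that can each hold only 8 samples, 4 accumulation steps restore the global batch size of 256.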

Q10: Can the model be deployed on robot hardware configurations beyond the 9 original setups used for pre-training?

A10: Yes. You only need to adapt the control interface of the target robot hardware.

