Question about RoboTwin state/action ordering in lingbot_robotwin_policy_rep.py #49

Hi LingBot-VLA team,

Thank you for releasing the RoboTwin SFT checkpoint and the corresponding `lingbot_robotwin_policy_rep.py` inference code. I am integrating the March RoboTwin SFT inference path into RLinf and would like to confirm my understanding of the RoboTwin state/action ordering conventions, to avoid introducing a silent mismatch.

This is not a bug report. It is mainly a clarification question about the representation used by the released RoboTwin SFT model.

In `deploy/lingbot_robotwin_policy_rep.py` at commit `fbf10a91df806cb8a5807f8fe1052dc7c6d1d3ef`, I noticed the following reorder logic:

```python
# raw RoboTwin env state -> robotwin_rep state
indices = list(range(6)) + list(range(7, 13)) + [6] + [13]
observation["observation.state"] = observation["observation.state"][indices]

# padded state before model inference
state_indices = list(range(12)) + list(range(73, 75)) + list(range(12, 14)) + list(range(14, 73))
observation["state"] = observation["state"][state_indices]

# model action -> RoboTwin env action
action_indices = list(range(6)) + [14] + list(range(6, 12)) + [15]
actions = actions[:, :, action_indices]
observation["action"] = actions.squeeze(0)[:, :14]
```
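A quick, model-free check of the three index lists confirms that both state reorders are pure permutations and that the action selection picks 14 distinct model dimensions. The lists are copied verbatim from the excerpt above, and nothing else from the repository is assumed:

```python
# Index lists copied verbatim from deploy/lingbot_robotwin_policy_rep.py (commit fbf10a9).
indices = list(range(6)) + list(range(7, 13)) + [6] + [13]
state_indices = list(range(12)) + list(range(73, 75)) + list(range(12, 14)) + list(range(14, 73))
action_indices = list(range(6)) + [14] + list(range(6, 12)) + [15]

# Raw-state reorder: must be a permutation of the 14 env state dims.
assert sorted(indices) == list(range(14))

# Padded-state reorder: must be a permutation of all 75 slots (max_state_dim=75).
assert sorted(state_indices) == list(range(75))

# Action selection: 14 distinct model dims, none beyond index 15.
assert len(action_indices) == 14 and len(set(action_indices)) == 14
assert max(action_indices) == 15

print(len(indices), len(state_indices), len(action_indices))
```

If any of these assertions failed, the reorder would be dropping or duplicating dimensions instead of only permuting them.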

My current interpretation is:

1. The raw RoboTwin env state is ordered as:

```text
left_arm_6, left_gripper, right_arm_6, right_gripper
```

2. The `robotwin_rep` normalizer expects the 14 valid state dimensions as:

```text
left_arm_6, right_arm_6, left_gripper, right_gripper
```

3. `prepare_state` pads this 14-d state to `max_state_dim=75` by appending zeros. After padding, the two gripper dimensions therefore remain at indices `12` and `13`, and indices `14` through `74` (including `73` and `74`) are zero padding.

4. The later reorder

```python
state_indices = list(range(12)) + list(range(73, 75)) + list(range(12, 14)) + list(range(14, 73))
```

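appears to move two zero-padding slots (padded indices `73` and `74`) into model-state positions `12` and `13`, and to shift the two gripper dimensions to model-state positions `14` and `15`. The remaining zero slots (padded indices `14` to `72`) then fill model-state positions `16` to `74`.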

5. The model action output appears to follow a similar model-side convention where the two gripper action dimensions are at indices `14` and `15`. Therefore, the final RoboTwin env action is selected as:

```text
0..5, 14, 6..11, 15
```

producing the env action order:

```text
left_arm_6, left_gripper, right_arm_6, right_gripper
```
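To make points 1 to 5 concrete, here is a small sketch that pushes string labels through the same index lists with NumPy. It assumes that `prepare_state` simply appends zeros (labelled `"PAD"` here) and that the model action shares the model-state layout with shape `(1, T, 75)`; both are my own assumptions for illustration, not taken from the LingBot-VLA code:

```python
import numpy as np

MAX_STATE_DIM = 75  # max_state_dim, as described in point 3

# Point 1: raw RoboTwin env state layout, represented as string labels.
raw_state = np.array(
    [f"L_arm{i}" for i in range(6)] + ["L_grip"]
    + [f"R_arm{i}" for i in range(6)] + ["R_grip"]
)

# Point 2: raw env state -> robotwin_rep state (same indices as the deploy script).
indices = list(range(6)) + list(range(7, 13)) + [6] + [13]
rep_state = raw_state[indices]

# Point 3: emulate prepare_state padding (assumption: zeros appended at the end).
padding = np.array(["PAD"] * (MAX_STATE_DIM - len(rep_state)))
padded = np.concatenate([rep_state, padding])

# Point 4: padded state -> model-state order.
state_indices = list(range(12)) + list(range(73, 75)) + list(range(12, 14)) + list(range(14, 73))
model_state = padded[state_indices]

# Point 5: model action -> env action. Assumption: the action has a batch axis of 1,
# a time axis of length T, and the same per-step layout as model_state.
T = 2
actions = np.broadcast_to(model_state, (1, T, MAX_STATE_DIM))
action_indices = list(range(6)) + [14] + list(range(6, 12)) + [15]
env_action = actions[:, :, action_indices].squeeze(0)[:, :14]

# Grippers land at model positions 14/15; the round trip restores the raw env order.
assert list(model_state[12:16]) == ["PAD", "PAD", "L_grip", "R_grip"]
assert env_action.shape == (T, 14)
assert (env_action == raw_state).all()
print(model_state[10:16])
print(env_action[0])
```

Under these assumptions, model-state positions `12` and `13` hold padding, `14` and `15` hold the grippers, and every row of the final env action reproduces the raw env order exactly. The sketch also shows that the trailing `[:, :14]` slice is a no-op once `action_indices` has already selected 14 dimensions.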

Could you please confirm whether this interpretation is correct?

In particular, I would like to clarify:

- Why are model-state positions `12` and `13` filled with padded zero slots, while the two gripper state dimensions are shifted to positions `14` and `15`?
- Is this state/action ordering specific to the released `robbyant/lingbot-vla-4b-posttrain-robotwin` checkpoint, or should it be considered the general RoboTwin convention for LingBot-VLA going forward?
- Is `robotwin_all_new.json` expected to be used together with `data_type="robotwin_rep"` for this SFT checkpoint?

Thanks again for the great work and for making the RoboTwin SFT model available.
