
Question about RoboTwin state/action ordering in lingbot_robotwin_policy_rep.py #49


@lwbscu

Hi LingBot-VLA team,

Thank you for releasing the RoboTwin SFT checkpoint and the corresponding lingbot_robotwin_policy_rep.py inference code. I am integrating the March RoboTwin SFT inference path into RLinf and would like to confirm my understanding of the RoboTwin state/action ordering conventions, to avoid introducing a silent mismatch.

This is not a bug report. It is mainly a clarification question about the representation used by the released RoboTwin SFT model.

In deploy/lingbot_robotwin_policy_rep.py at commit fbf10a91df806cb8a5807f8fe1052dc7c6d1d3ef, I noticed the following reorder logic:

```python
# raw RoboTwin env state -> robotwin_rep state
indices = list(range(6)) + list(range(7, 13)) + [6] + [13]
observation["observation.state"] = observation["observation.state"][indices]

# padded state before model inference
state_indices = list(range(12)) + list(range(73, 75)) + list(range(12, 14)) + list(range(14, 73))
observation["state"] = observation["state"][state_indices]

# model action -> RoboTwin env action
action_indices = list(range(6)) + [14] + list(range(6, 12)) + [15]
actions = actions[:, :, action_indices]
observation["action"] = actions.squeeze(0)[:, :14]
```
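To make the effect of the two state reorders concrete, here is a minimal sketch (not the repository's code) that pushes dimension labels instead of values through them. The label names, `take`, and `trace_state_layout` are hypothetical; it also assumes that `prepare_state` simply appends zero slots up to `max_state_dim=75`, which is my reading of the code rather than something confirmed.

```python
# Hypothetical labels for the raw RoboTwin env state, in the order assumed in
# this issue: left_arm_6, left_gripper, right_arm_6, right_gripper.
RAW_NAMES = (
    [f"left_arm_{i}" for i in range(6)] + ["left_gripper"]
    + [f"right_arm_{i}" for i in range(6)] + ["right_gripper"]
)
MAX_STATE_DIM = 75  # max_state_dim quoted in the issue

# Index lists copied from lingbot_robotwin_policy_rep.py.
indices = list(range(6)) + list(range(7, 13)) + [6] + [13]
state_indices = (
    list(range(12)) + list(range(73, 75)) + list(range(12, 14)) + list(range(14, 73))
)


def take(seq, idx):
    """Pure-Python equivalent of fancy indexing x[idx] on a 1-D array."""
    return [seq[i] for i in idx]


def trace_state_layout():
    """Return the label found at each of the 75 model-state positions."""
    rep = take(RAW_NAMES, indices)  # robotwin_rep order, 14 entries
    # Assumption: prepare_state appends zero slots up to MAX_STATE_DIM.
    padded = rep + ["pad"] * (MAX_STATE_DIM - len(rep))
    return take(padded, state_indices)


layout = trace_state_layout()
for pos in range(10, 17):
    print(pos, layout[pos])
```

Under that padding assumption, positions 12 and 13 come out as `pad`, positions 14 and 15 as `left_gripper` and `right_gripper`, and position 16 is padding again. Since `state_indices` is a permutation of `0..74`, no dimension is dropped or duplicated.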

My current interpretation is:

  1. The raw RoboTwin env state is ordered as:
     left_arm_6, left_gripper, right_arm_6, right_gripper
  2. The robotwin_rep normalizer expects the 14 valid state dimensions as:
     left_arm_6, right_arm_6, left_gripper, right_gripper
  3. prepare_state pads this 14-d state to max_state_dim=75 by appending zeros. Therefore, after padding, the two gripper dimensions are still at indices 12 and 13, while indices 73 and 74 are padded zero slots.
  4. The later reorder

     ```python
     state_indices = list(range(12)) + list(range(73, 75)) + list(range(12, 14)) + list(range(14, 73))
     ```

     appears to insert two zero padding slots into model-state positions 12 and 13, and shift the two gripper dimensions to model-state positions 14 and 15.
  5. The model action output appears to follow a similar model-side convention where the two gripper action dimensions are at indices 14 and 15. Therefore, the final RoboTwin env action is selected as:
     0..5, 14, 6..11, 15

     producing the env action order:

     left_arm_6, left_gripper, right_arm_6, right_gripper
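If this interpretation holds, the state path (env to model) and the action path (model to env) should be exact inverses on the 14 valid dimensions. A minimal round-trip check is sketched below. It again assumes that `prepare_state` zero-pads to 75, and it uses a hypothetical "echo" model whose action chunk just repeats the model-side state; `env_state_to_model_state` and `model_action_to_env_action` are illustrative names, not functions from the repository.

```python
import numpy as np

MAX_STATE_DIM = 75  # max_state_dim quoted in the issue

# Index lists copied from lingbot_robotwin_policy_rep.py.
indices = list(range(6)) + list(range(7, 13)) + [6] + [13]
state_indices = (
    list(range(12)) + list(range(73, 75)) + list(range(12, 14)) + list(range(14, 73))
)
action_indices = list(range(6)) + [14] + list(range(6, 12)) + [15]


def env_state_to_model_state(raw_state):
    """Hypothetical helper: raw env state (14,) -> model-side state (75,)."""
    rep = raw_state[indices]
    # Assumption: prepare_state zero-pads the 14-d state to MAX_STATE_DIM.
    padded = np.zeros(MAX_STATE_DIM, dtype=raw_state.dtype)
    padded[: rep.shape[0]] = rep
    return padded[state_indices]


def model_action_to_env_action(actions):
    """Hypothetical helper: model actions (1, horizon, dim) -> env actions (horizon, 14)."""
    actions = actions[:, :, action_indices]
    # After indexing, the last axis is already 14 wide, so [:, :14] keeps everything.
    return actions.squeeze(0)[:, :14]


raw = np.arange(1, 15, dtype=np.float32)  # distinct nonzero values so zero slots stand out
model_state = env_state_to_model_state(raw)

# Hypothetical "echo" model: a 3-step action chunk that repeats the model-side state.
echo_actions = np.tile(model_state, (1, 3, 1))  # shape (1, 3, 75)
env_action = model_action_to_env_action(echo_actions)

print(model_state[12:16])
print(np.array_equal(env_action[0], raw))
```

With these values, model-state positions 12 and 13 are zero, positions 14 and 15 hold 7 and 14 (raw indices 6 and 13, i.e. the two grippers), and the final comparison prints `True`. This only checks the internal consistency of the three index lists, not what the released checkpoint actually learned.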

Could you please confirm whether this interpretation is correct?

In particular, I would like to clarify:

  • Why are model-state positions 12 and 13 filled with padded zero slots, while the two gripper state dimensions are shifted to positions 14 and 15?
  • Is this state/action ordering specific to the released robbyant/lingbot-vla-4b-posttrain-robotwin checkpoint, or should it be considered the general RoboTwin convention for LingBot-VLA going forward?
  • Is robotwin_all_new.json expected to be used together with data_type="robotwin_rep" for this SFT checkpoint?

Thanks again for the great work and for making the RoboTwin SFT model available.
