Hi LingBot-VLA team,
Thank you for releasing the RoboTwin SFT checkpoint and the corresponding lingbot_robotwin_policy_rep.py inference code. I am integrating the March RoboTwin SFT inference path into RLinf and would like to confirm my understanding of the RoboTwin state/action ordering conventions, to avoid introducing a silent mismatch.
This is not a bug report. It is mainly a clarification question about the representation used by the released RoboTwin SFT model.
In deploy/lingbot_robotwin_policy_rep.py at commit fbf10a91df806cb8a5807f8fe1052dc7c6d1d3ef, I noticed the following reorder logic:
# raw RoboTwin env state -> robotwin_rep state
indices = list(range(6)) + list(range(7, 13)) + [6] + [13]
observation["observation.state"] = observation["observation.state"][indices]
# padded state before model inference
state_indices = list(range(12)) + list(range(73, 75)) + list(range(12, 14)) + list(range(14, 73))
observation["state"] = observation["state"][state_indices]
# model action -> RoboTwin env action
action_indices = list(range(6)) + [14] + list(range(6, 12)) + [15]
actions = actions[:, :, action_indices]
observation["action"] = actions.squeeze(0)[:, :14]
My current interpretation is:
- The raw RoboTwin env state is ordered as:
left_arm_6, left_gripper, right_arm_6, right_gripper
- The robotwin_rep normalizer expects the 14 valid state dimensions as:
left_arm_6, right_arm_6, left_gripper, right_gripper
- prepare_state pads this 14-d state to max_state_dim=75 by appending zeros. Therefore, after padding, the two gripper dimensions are still at indices 12 and 13, while indices 73 and 74 are padded zero slots.
- The later reorder
state_indices = list(range(12)) + list(range(73, 75)) + list(range(12, 14)) + list(range(14, 73))
appears to insert two zero padding slots into model-state positions 12 and 13, and shift the two gripper dimensions to model-state positions 14 and 15.
- The model action output appears to follow a similar model-side convention where the two gripper action dimensions are at indices 14 and 15. Therefore, the final RoboTwin env action is selected as:
action_indices = list(range(6)) + [14] + list(range(6, 12)) + [15]
producing the env action order:
left_arm_6, left_gripper, right_arm_6, right_gripper
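To make the interpretation above easy to check, here is a minimal sketch that traces labels through the three index lists quoted from the file. It uses plain Python lists in place of the real tensors. The label names (`L0`..`L5`, `Lg`, `R0`..`R5`, `Rg`, `pad`), the helper functions, and the assumption that the model action has at least 16 dimensions laid out like the first 16 model-state positions are mine, not taken from the repository.

```python
# Index lists copied from deploy/lingbot_robotwin_policy_rep.py.
indices = list(range(6)) + list(range(7, 13)) + [6] + [13]
state_indices = (list(range(12)) + list(range(73, 75))
                 + list(range(12, 14)) + list(range(14, 73)))
action_indices = list(range(6)) + [14] + list(range(6, 12)) + [15]

MAX_STATE_DIM = 75  # max_state_dim used by prepare_state

# Hypothetical labels for the raw env state, following my interpretation:
# left_arm_6, left_gripper, right_arm_6, right_gripper.
ENV_ORDER = ([f"L{i}" for i in range(6)] + ["Lg"]
             + [f"R{i}" for i in range(6)] + ["Rg"])


def take(values, idx):
    """List equivalent of tensor fancy indexing: values[idx]."""
    return [values[i] for i in idx]


def trace_state(env_labels):
    """Return (robotwin_rep state, model-side state) as label lists."""
    rep = take(env_labels, indices)                     # 14-d robotwin_rep order
    padded = rep + ["pad"] * (MAX_STATE_DIM - len(rep))  # zero-padded to 75
    return rep, take(padded, state_indices)


def trace_action(model_labels):
    """Map a model-side action layout back to the env action order."""
    return take(model_labels, action_indices)[:14]


rep_state, model_state = trace_state(ENV_ORDER)
# Assumption: the model action shares the layout of model_state[:16].
env_action = trace_action(model_state[:16])

print(rep_state)
print(model_state[:16])
print(env_action)
```

If my reading is right, positions 12 and 13 of the model-side state hold padding, positions 14 and 15 hold the left and right gripper, and the recovered env action matches the raw env ordering.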
Could you please confirm whether this interpretation is correct?
In particular, I would like to clarify:
- Why are model-state positions 12 and 13 filled with padded zero slots, while the two gripper state dimensions are shifted to positions 14 and 15?
- Is this state/action ordering specific to the released robbyant/lingbot-vla-4b-posttrain-robotwin checkpoint, or should it be considered the general RoboTwin convention for LingBot-VLA going forward?
- Is robotwin_all_new.json expected to be used together with data_type="robotwin_rep" for this SFT checkpoint?
Thanks again for the great work and for making the RoboTwin SFT model available.