Reinforcement learning training environment for the Universal Robots UR5e arm using MuJoCo, with a Robotiq 2F-85 gripper and graspable objects.
- Robot: UR5e + Robotiq 2F-85 gripper
- Task: Reach / pick-and-place / cooperative handover
- Algorithm: SAC (Stable-Baselines3)
- Simulator: MuJoCo 3.x
| Env | Task | Arms |
|---|---|---|
| `URReachEnv` | Move end-effector to target | 1 |
| `URPickPlaceEnv` | Pick box and place at target | 1 |
| `URDualArmEnv` | Symmetric multi-arm pick / lift / place | 4+ (even) |
| `SharedArmPickPlaceEnv` | Reusable local policy extracted from symmetric arms | 1 (local view) |
Four or more UR5e arms with Robotiq 2F-85 grippers arranged symmetrically. Each arm gets its own table, object, and drop zone, and the layout can scale to 8 arms in the same pattern.
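The symmetric scaling can be pictured with a small helper. This is an illustrative sketch, not the repo's actual scene builder: arm bases are spaced evenly on a circle, each with its own object spawn and drop zone on the same radial line (all radii are made-up example values).

```python
import math

# Hypothetical layout generator: N arm bases evenly spaced on a circle,
# each arm facing the center, with its object and drop zone placed on
# the same radial line. Radii are illustrative, not the repo's values.
def symmetric_layout(n_arms, base_radius=1.2, object_radius=1.0, drop_radius=1.35):
    layout = []
    for i in range(n_arms):
        theta = 2.0 * math.pi * i / n_arms
        c, s = math.cos(theta), math.sin(theta)
        layout.append({
            "base": (base_radius * c, base_radius * s),
            "object": (object_radius * c, object_radius * s),
            "drop": (drop_radius * c, drop_radius * s),
            "yaw": theta + math.pi,  # rotate each arm to face the center
        })
    return layout
```

The same function covers 4 or 8 arms, which is what lets the layout scale "in the same pattern".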
The current multi-arm reward is contact-gated: the policy is rewarded for approaching the object, but grasp/lift progress only counts once the object actually contacts both gripper sides and starts moving upward.
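The gating described above can be sketched in a few lines. This is a minimal illustration of the idea, not the repo's actual reward code; all input names and coefficients are made up for the example.

```python
import math

# Illustrative contact-gated reward. Inputs are assumed to be extracted
# from the MuJoCo state each step (names and weights are hypothetical).
def contact_gated_reward(ee_to_obj_dist, left_pad_contact, right_pad_contact,
                         obj_height, obj_vz, lift_target=0.10):
    # Approach shaping is always active: closer end-effector -> more reward.
    reward = 1.0 - math.tanh(5.0 * ee_to_obj_dist)
    # Grasp/lift terms are gated: they contribute only once the object
    # touches BOTH gripper pads and is actually moving upward.
    grasped = left_pad_contact and right_pad_contact
    if grasped and obj_vz > 0.0:
        reward += 0.5                                       # grasp bonus
        reward += 2.0 * min(obj_height / lift_target, 1.0)  # lift progress
    return reward
```

The gate prevents the policy from collecting lift reward by merely hovering near the object or pinching air next to it.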
```
python3 scripts/train/train_dual_arm_live.py --arms 4 --n-envs 4
python3 scripts/train/train_dual_arm_live.py --arms 8 --n-envs 2
```

Install dependencies:

```
pip install mujoco gymnasium stable-baselines3
```

Clone MuJoCo Menagerie to `/home/asimov/mujoco_menagerie`.
- `envs/`: MuJoCo Gymnasium environments
- `scripts/train/`: training entrypoints
- `mujoco_ur_rl_ros2/`: ROS2 policy node package
- `launch/`: ROS2 launch files
- `assets/`: README images and visual references
```
# Single arm reach
python3 scripts/train/train.py

# Single arm pick-and-place
python3 scripts/train/train_pick_place.py

# Multi-arm training, headless by default
python3 scripts/train/train_dual_arm_live.py --arms 4 --n-envs 4

# Larger symmetric layout
python3 scripts/train/train_dual_arm_live.py --arms 8 --n-envs 2

# Same trainer, but with the MuJoCo viewer
python3 scripts/train/train_dual_arm_live.py --arms 4 --n-envs 2 --viewer

# Use GPU when PyTorch CUDA is available
python3 scripts/train/train_dual_arm_live.py --arms 4 --n-envs 4 --device cuda

# Train one reusable 7-action arm policy in an 8-arm scene
python3 scripts/train/train_shared_arm.py --arms 8 --n-envs 4 --device cuda

# Faster shared-policy mode: every arm in each scene contributes a sample each step
python3 scripts/train/train_shared_arm.py --arms 8 --n-envs 2 --all-arms-samples --device cuda

# Resume the latest shared-arm run from its newest checkpoint
python3 scripts/train/train_shared_arm.py \
  --arms 8 --n-envs 2 --all-arms-samples --device cuda \
  --resume-model models/shared_arm/shared_arm_8arm_all_samples_resume_20260410_1501/checkpoints/shared_arm_79952_steps.zip \
  --run-name shared_arm_resume_$(date +%Y%m%d_%H%M%S)

# Apply that one shared arm policy to every arm in a 4-arm or 8-arm scene
python3 scripts/train/play_shared_arm.py --model models/shared_arm/<run_name>/best_model.zip --arms 8 --viewer --deterministic
```

Best multi-arm models and checkpoints are saved under `models/multi_arm/`, and run logs under `logs/multi_arm/`.
Shared one-arm policy runs use `models/shared_arm/` and `logs/shared_arm/`.
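The `--all-arms-samples` mode can be pictured as follows. This is a hypothetical sketch, not the repo's trainer: one shared policy acts for every arm, and each arm pushes its own local transition into a single replay buffer every simulator step, so an 8-arm scene yields 8 samples per step.

```python
# Illustrative sketch of the --all-arms-samples idea: a single shared
# policy controls every arm, and each arm's local (obs, action, reward,
# next_obs) transition lands in one shared replay buffer per step.
class SharedReplayBuffer:
    def __init__(self, capacity=100_000):
        self.data, self.capacity = [], capacity

    def add(self, transition):
        if len(self.data) >= self.capacity:
            self.data.pop(0)   # drop the oldest sample when full
        self.data.append(transition)

def collect_step(policy, arm_observations, env_step, buffer):
    # One simulator step: the same policy acts for every arm, and every
    # arm contributes one sample, multiplying data throughput by N arms.
    actions = [policy(obs) for obs in arm_observations]
    next_obs, rewards = env_step(actions)
    for obs, act, rew, nxt in zip(arm_observations, actions, rewards, next_obs):
        buffer.add((obs, act, rew, nxt))
    return next_obs
```

With dummy scalar observations, one step of an 8-arm scene fills the buffer with 8 transitions, which is why this mode trains noticeably faster per wall-clock step.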
The best trained RL model `.zip` files are committed to the repository. The default ROS/Gazebo nodes automatically read `models/shared_arm/shared_arm_8arm_all_samples_resume_20260410_1501/best_model.zip` and `models/best_model.zip`, so they work out of the box after cloning.
```
# Find the newest run directory
cat logs/multi_arm/latest_run.txt

# Watch the live console log
tail -f logs/multi_arm/<run_name>.out

# Check the latest heartbeat
cat logs/multi_arm/<run_name>/latest_status.json
```

Useful signals:

- `timesteps`: training is advancing
- `ep_rew_mean`: average training reward per finished episode
- `eval_mean_reward`: evaluation reward from the latest eval pass
- `fps`: simulation throughput
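The heartbeat file can also be consumed programmatically. A minimal sketch, assuming `latest_status.json` is a flat JSON object containing the signal names listed above (the trainer's exact schema may include more fields; unknown keys come back as `None`):

```python
import json
from pathlib import Path

# Pull just the monitoring signals out of the heartbeat file.
# Field names are the signals listed above; the full schema is assumed.
def read_heartbeat(path):
    status = json.loads(Path(path).read_text())
    keys = ("timesteps", "ep_rew_mean", "eval_mean_reward", "fps")
    return {k: status.get(k) for k in keys}
```

This is handy for scripting alerts, e.g. flagging a run whose `timesteps` value has stopped advancing between two reads.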
This repo can also act as a ROS2 Python package named mujoco_ur_rl_ros2. The packaged node loads a trained SAC policy and publishes UR5e joint trajectories.
Build it from a ROS2 workspace:
```
mkdir -p ~/ros2_ws/src
cd ~/ros2_ws/src
git clone https://github.com/darshmenon/mujoco-ur-arm-rl.git
cd ~/ros2_ws
colcon build --packages-select mujoco_ur_rl_ros2
source install/setup.bash
```

The ROS2 node still expects the Python environment to have `numpy` and `stable-baselines3` available.
Run the node directly:
```
ros2 run mujoco_ur_rl_ros2 ur_policy_node --ros-args \
  -p model_path:=/home/asimov/mujoco-ur-arm-rl/models/best_model.zip \
  -p target_x:=0.4 -p target_y:=0.0 -p target_z:=0.4
```

Or launch it:
```
ros2 launch mujoco_ur_rl_ros2 ur_policy.launch.py \
  model_path:=/home/asimov/mujoco-ur-arm-rl/models/best_model.zip
```

Run the shared-arm policy node directly:
```
ros2 run mujoco_ur_rl_ros2 shared_arm_policy_node --ros-args \
  -p model_path:=/path/to/mujoco-ur-arm-rl/models/shared_arm/<run_name>/best_model.zip \
  -p arm_trajectory_topic:=/scaled_joint_trajectory_controller/joint_trajectory \
  -p gripper_trajectory_topic:=/gripper_controller/joint_trajectory \
  -p ee_x:=0.0 -p ee_y:=0.0 -p ee_z:=0.0 \
  -p object_x:=-1.18 -p object_y:=0.0 -p object_z:=0.045 \
  -p drop_x:=-1.25 -p drop_y:=0.0 -p drop_z:=0.02
```

Launch Gazebo plus the shared policy (uses the bundled `ur_gazebo` package by default):
```
ros2 launch mujoco_ur_rl_ros2 gazebo_shared_arm_policy.launch.py \
  model_path:=/path/to/mujoco-ur-arm-rl/models/shared_arm/<run_name>/best_model.zip \
  ee_x:=0.0 ee_y:=0.0 ee_z:=0.0 \
  object_x:=-1.18 object_y:=0.0 object_z:=0.045 \
  drop_x:=-1.25 drop_y:=0.0 drop_z:=0.02
```

Override `gazebo_launch_path` if you want to use a different Gazebo stack:
```
ros2 launch mujoco_ur_rl_ros2 gazebo_shared_arm_policy.launch.py \
  gazebo_launch_path:=/path/to/ur_gazebo/launch/ur.gazebo.launch.py \
  model_path:=/path/to/mujoco-ur-arm-rl/models/shared_arm/<run_name>/best_model.zip
```

Gazebo notes:
- `shared_arm_policy_node` is now exported as a ROS2 executable, so `ros2 run` and `ros2 launch` can both find it after rebuilding the package.
- The shared-arm node defaults to `/scaled_joint_trajectory_controller/joint_trajectory`, which is the common UR controller topic in Gazebo setups.
- Override `arm_joint_names` or `gripper_joint_names` if your Gazebo stack uses different joint names or a multi-joint gripper controller.
- The shared policy does not infer object or drop poses from Gazebo yet; set `ee_*`, `object_*`, `drop_*`, and `phase` to match the scene you launch.
ROS2 files added in this repo:

- `package.xml`, `setup.py`, `setup.cfg`
- `mujoco_ur_rl_ros2/ur_policy_node.py`
- `mujoco_ur_rl_ros2/shared_arm_policy_node.py`
- `launch/ur_policy.launch.py`
- `launch/gazebo_shared_arm_policy.launch.py`
- `ros2/ur_policy_node.py` remains as a compatibility wrapper
```
python3 scripts/train/train_dual_arm_live.py --arms 4 --n-envs 1 --viewer
```
