
Task 6: DreamerV3 CPG on Acrobot (second published-model adapter) #49

Open
Denis-hamon wants to merge 3 commits into main from phase-5y-dreamerv3-acrobot

Task 6 — DreamerV3 CPG on Acrobot (second published-model adapter)

Starts the DreamerV3 line of GPU_ROADMAP. This PR trains DreamerV3 (NM512/dreamerv3-torch, run upstream in a subprocess) on DMC Acrobot-swingup at --action_repeat 1, ports the world-model weights into wmel.adapters.dreamerv3_adapter, and runs the CPG protocol (oracle arm vs. DreamerV3 arm) under the random-shooting TabularWorldModelPlanner, the same harness as the TD-MPC2 Acrobot experiment. This is the first recurrent world model evaluated through the Markovian planner contract.
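For readers unfamiliar with random shooting, here is a generic sketch of the idea, not the wmel TabularWorldModelPlanner itself. It assumes the model is exposed as a Markovian `step(state, action)` function, which is how this sketch reads the "Markovian planner contract"; `random_shooting_action`, `toy_step`, and `toy_reward` are hypothetical names made up for illustration.

```python
import random
from typing import Any, Callable, List


def random_shooting_action(
    step: Callable[[Any, float], Any],
    reward: Callable[[Any], float],
    state: Any,
    n_candidates: int = 64,
    horizon: int = 5,
    seed: int = 0,
) -> float:
    """Sample random action sequences, roll each out, return the best first action."""
    rng = random.Random(seed)
    best_return = float("-inf")
    best_first_action = 0.0
    for _ in range(n_candidates):
        # One candidate plan: `horizon` actions drawn uniformly from [-1, 1].
        plan: List[float] = [rng.uniform(-1.0, 1.0) for _ in range(horizon)]
        s, total = state, 0.0
        for a in plan:
            s = step(s, a)       # Markovian: next state depends only on (s, a)
            total += reward(s)
        if total > best_return:
            best_return, best_first_action = total, plan[0]
    return best_first_action


# Toy 1-D system (hypothetical): the state moves by the action, the goal sits at 5.0.
def toy_step(s: float, a: float) -> float:
    return s + a


def toy_reward(s: float) -> float:
    return -abs(s - 5.0)


action = random_shooting_action(toy_step, toy_reward, state=0.0, horizon=1)
```

With a recurrent model such as DreamerV3, the `state` passed to such a planner would have to carry whatever part of the recurrent latent the adapter chooses to expose; how wmel does that is not shown here.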

Everything but the training run was already on main (PR #48): adapter, weight port, experiment harness, multi-model table generator, and the upstream-equivalence test.

Pre-flight (this box: 2x L40S)

  • ./scripts/setup_dreamerv3.sh — upstream cloned to third_party/
  • dedicated py3.11 training venv (.venv-dreamer): upstream pins (gym 0.22.0, mujoco 2.3.5, dm_control 1.0.9, numpy 1.23.5, torch 2.4.1) + wmel editable; OSMesa present
  • Gate 1 pytest tests/test_dreamerv3_upstream_equivalence.py — PASS (port matches upstream to float32)
  • Gate 2 --smoke end-to-end (wiring + port + both CPG arms)
  • Full run --varied-init --seed 0 (500k steps, GPU 0)
  • Regenerate results/MODEL_TABLE.md (first DreamerV3 row next to TD-MPC2)

Expected outputs

  • results/dmc_acrobot/dreamerv3_cpg.json
  • refreshed results/MODEL_TABLE.md
  • checkpoint NOT committed (release asset)

Draft until the full run lands and the table is refreshed.

The experiment defaults MUJOCO_GL=egl for the main process (CPG arms'
dm_control import), but upstream dreamer.py hard-sets MUJOCO_GL=osmesa.
The egl value leaked into the inherited child env and dm_control aborted
with 'PYOPENGL_PLATFORM is set to egl, should be unset or osmesa', so the
documented `--smoke` command failed out of the box. Pass an explicit
osmesa env to the training subprocess only; the CPG arms keep egl.

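A minimal sketch of the fix described in this commit message, under stated assumptions: the helper name `training_env` is hypothetical, and setting `PYOPENGL_PLATFORM` alongside `MUJOCO_GL` is an assumption based on the quoted error text ("should be unset or osmesa"), not a copy of the actual patch.

```python
import os
import subprocess
import sys
from typing import Dict, Mapping, Optional


def training_env(parent: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return a copy of the parent environment forced to OSMesa rendering."""
    env = dict(os.environ if parent is None else parent)
    env["MUJOCO_GL"] = "osmesa"
    # Assumption: also override PYOPENGL_PLATFORM so a leaked "egl" value
    # cannot reach the child; "osmesa" is one of the two accepted states.
    env["PYOPENGL_PLATFORM"] = "osmesa"
    return env


# The main process keeps its own setting; only the child sees the override.
# A trivial Python child stands in here for the upstream training command.
child = subprocess.run(
    [sys.executable, "-c", "import os; print(os.environ['MUJOCO_GL'])"],
    env=training_env(),
    check=True,
    capture_output=True,
    text=True,
)
child_backend = child.stdout.strip()
```

Passing `env=` to `subprocess.run` replaces the child's environment wholesale, so building it from a copy of `os.environ` keeps `PATH` and similar variables intact while leaving the parent process untouched.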
500k-step DreamerV3 (dmc_proprio) on dmc_acrobot_swingup, random-shooting
planner, varied init, n=10. Result: oracle 0.100 / DreamerV3 0.100,
CPG +0.000, 95% AC CI [-0.298, +0.298], INCONCLUSIVE.

This is identical to the TD-MPC2 random-shooting Acrobot cell (same n,
same success rates, same verdict): the CPG metric gives the same reading
across two published model families. Both learned models match the
oracle's success rate (0.100), which is consistent with the
random-shooting planner, not the model, being the bottleneck on Acrobot
regardless of model family. At n=10 the Markovian-projection truncation
of DreamerV3's recurrent state shows no measurable cost.

n=10 matches the TD-MPC2 random-shooting cell exactly, so no seed pooling
(the n=150 cells are the CEM/Task 8 column). Checkpoint is a release
asset, not committed.
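The reported interval can be checked by hand. The sketch below assumes that "AC" abbreviates the Agresti-Caffo interval for a difference of two proportions (add one success and one failure to each arm, then apply the Wald formula); under that assumption it reproduces the reported bounds. The function name is hypothetical, and the sign convention of CPG is an assumption that does not matter here because both arms have the same success rate.

```python
import math
from typing import Tuple


def agresti_caffo_ci(
    x1: int, n1: int, x2: int, n2: int, z: float = 1.959964
) -> Tuple[float, float]:
    """Agresti-Caffo 95% interval for p1 - p2 (z is the 97.5% normal quantile)."""
    # Adjusted proportions: one extra success and one extra failure per arm.
    p1 = (x1 + 1) / (n1 + 2)
    p2 = (x2 + 1) / (n2 + 2)
    se = math.sqrt(p1 * (1 - p1) / (n1 + 2) + p2 * (1 - p2) / (n2 + 2))
    diff = p1 - p2
    return diff - z * se, diff + z * se


# Success rate 0.100 at n=10 means 1 success out of 10 episodes in each arm.
lo, hi = agresti_caffo_ci(1, 10, 1, 10)
print(f"[{lo:+.3f}, {hi:+.3f}]")  # prints "[-0.298, +0.298]"
```

The interval spans roughly ±0.3 around zero, so at n=10 it cannot distinguish a zero gap from a fairly large one in either direction.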

Denis-hamon marked this pull request as ready for review on June 10, 2026 at 21:03