Task 6: DreamerV3 CPG on Acrobot (second published-model adapter) by Denis-hamon · Pull Request #49 · Denis-hamon/world-model-eval-lab

Denis-hamon · 2026-06-09T19:54:37Z

Task 6 — DreamerV3 CPG on Acrobot (second published-model adapter)

Initiates the GPU_ROADMAP DreamerV3 line. Trains DreamerV3 (NM512/dreamerv3-torch, upstream subprocess) on DMC Acrobot-swingup at --action_repeat 1, ports the world-model weights into wmel.adapters.dreamerv3_adapter, and runs the CPG protocol (oracle arm vs DreamerV3 arm) under the random-shooting TabularWorldModelPlanner — the same harness as the TD-MPC2 Acrobot experiment. This is the first recurrent world model evaluated through the Markovian planner contract.
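For readers unfamiliar with random shooting, here is a minimal sketch of the idea. It is not the repo's TabularWorldModelPlanner, whose interface is not shown in this PR. The names `random_shooting`, `toy_step`, and `toy_reward` are hypothetical, and the toy dynamics are a stand-in for a real world model. The sketch assumes that "Markovian planner contract" means the planner queries the model as a pure function of (state, action), so a recurrent model has to expose everything it needs through that state argument.

```python
import random


def random_shooting(step, reward, state, actions, horizon, n_candidates, rng):
    """Return the first action of the best of n_candidates random action sequences."""
    best_return, best_first = float("-inf"), None
    for _ in range(n_candidates):
        # Sample one open-loop action sequence uniformly at random.
        seq = [rng.choice(actions) for _ in range(horizon)]
        s, total = state, 0.0
        for a in seq:
            s = step(s, a)  # model queried as a pure function of (state, action)
            total += reward(s)
        if total > best_return:
            best_return, best_first = total, seq[0]
    return best_first


# Hypothetical toy dynamics: a point on an integer line with the goal at +5.
def toy_step(s, a):
    return s + a


def toy_reward(s):
    return -abs(s - 5)


if __name__ == "__main__":
    action = random_shooting(
        toy_step, toy_reward, 0, [-1, 0, 1],
        horizon=5, n_candidates=200, rng=random.Random(0),
    )
    print(action)
```

Swapping `toy_step` for an oracle simulator step or a learned model step is what produces the two arms being compared. The planner itself stays fixed.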

Everything but the training run was already on main (PR #48): adapter, weight port, experiment harness, multi-model table generator, and the upstream-equivalence test.

Pre-flight (this box: 2x L40S)

./scripts/setup_dreamerv3.sh — upstream cloned to third_party/
dedicated py3.11 training venv (.venv-dreamer): upstream pins (gym 0.22.0, mujoco 2.3.5, dm_control 1.0.9, numpy 1.23.5, torch 2.4.1) + wmel editable; OSMesa present
Gate 1 pytest tests/test_dreamerv3_upstream_equivalence.py — PASS (port matches upstream to float32)
Gate 2 --smoke end-to-end (wiring + port + both CPG arms)
Full run --varied-init --seed 0 (500k steps, GPU 0)
Regenerate results/MODEL_TABLE.md (first DreamerV3 row next to TD-MPC2)

Expected outputs

results/dmc_acrobot/dreamerv3_cpg.json
refreshed results/MODEL_TABLE.md
checkpoint NOT committed (release asset)

Draft until the full run lands and the table is refreshed.

The experiment defaults to MUJOCO_GL=egl for the main process (needed for the CPG arms' dm_control import), but upstream dreamer.py hard-sets MUJOCO_GL=osmesa. The egl setting leaked into the environment inherited by the training subprocess, and dm_control aborted with 'PYOPENGL_PLATFORM is set to egl, should be unset or osmesa', so the documented `--smoke` command failed out of the box. The fix passes an explicit osmesa env to the training subprocess only; the CPG arms keep egl.
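The pattern behind this fix can be sketched as follows. This is an illustration, not the actual patch: `training_env` and `run_child` are hypothetical names, and setting both `MUJOCO_GL` and `PYOPENGL_PLATFORM` to `osmesa` is an assumption based on the quoted error message (unsetting `PYOPENGL_PLATFORM` should also satisfy it). The point is that the override lives in a copied dict passed via `env=`, so the parent process keeps egl.

```python
import os
import subprocess
import sys


def training_env(base=None):
    """Copy the environment and force the GL backend to osmesa in the copy only."""
    env = dict(os.environ if base is None else base)
    env["MUJOCO_GL"] = "osmesa"
    env["PYOPENGL_PLATFORM"] = "osmesa"  # assumption: the error accepts unset or osmesa
    return env


def run_child(cmd, base=None):
    """Run cmd with the osmesa environment; the parent's os.environ is untouched."""
    return subprocess.run(
        cmd, env=training_env(base), check=True, capture_output=True, text=True
    )


if __name__ == "__main__":
    # Simulate a parent process that has already selected egl.
    parent = {**os.environ, "MUJOCO_GL": "egl", "PYOPENGL_PLATFORM": "egl"}
    probe = [
        sys.executable,
        "-c",
        "import os; print(os.environ['MUJOCO_GL'], os.environ['PYOPENGL_PLATFORM'])",
    ]
    print(run_child(probe, parent).stdout.strip())  # prints "osmesa osmesa"
```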

500k-step DreamerV3 (dmc_proprio) on dmc_acrobot_swingup, random-shooting planner, varied init, n=10.

Result: oracle 0.100 / DreamerV3 0.100, CPG +0.000, 95% AC CI [-0.298, +0.298], INCONCLUSIVE.

This is identical to the TD-MPC2 random-shooting Acrobot cell (same n, same success rates, same verdict), so the CPG metric gives the same reading across two published model families. Both learned models match the oracle arm's success rate, which points to the random-shooting planner, not the model family, as the bottleneck on Acrobot; the Markovian-projection truncation of DreamerV3's recurrent state costs nothing measurable here. With n=10 the interval cannot rule out a gap of up to about 0.3 in either direction, hence the INCONCLUSIVE verdict.

n=10 matches the TD-MPC2 random-shooting cell exactly, so no seed pooling was applied (the n=150 cells are the CEM/Task 8 column). The checkpoint is a release asset and is not committed.
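As a sanity check on the interval, the reported bounds can be reproduced under the assumption that "AC" stands for the Agresti-Caffo interval for a difference of two proportions (add one success and one failure to each arm, then apply a Wald interval). The function name `agresti_caffo_ci` is hypothetical, and the counts 1/10 per arm are inferred from the 0.100 success rates at n=10. This is a sketch, not the repo's implementation.

```python
import math


def agresti_caffo_ci(x1, n1, x2, n2, z=1.959964):
    """Agresti-Caffo 95% interval for p1 - p2 (adds 1 success and 1 failure per arm)."""
    p1 = (x1 + 1) / (n1 + 2)
    p2 = (x2 + 1) / (n2 + 2)
    se = math.sqrt(p1 * (1 - p1) / (n1 + 2) + p2 * (1 - p2) / (n2 + 2))
    diff = p1 - p2
    return diff - z * se, diff + z * se


if __name__ == "__main__":
    # 1 success out of 10 episodes in each arm (0.100 vs 0.100).
    lo, hi = agresti_caffo_ci(1, 10, 1, 10)
    print(f"[{lo:+.3f}, {hi:+.3f}]")  # prints "[-0.298, +0.298]"
```

Worked by hand: each adjusted rate is 2/12 ≈ 0.1667, the standard error is sqrt(2 × 0.1389 / 12) ≈ 0.1521, and 1.96 × 0.1521 ≈ 0.298, which agrees with the reported CI.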

Denis-hamon added 3 commits June 9, 2026 19:50

chore(roadmap): mark Task 6 (DreamerV3 Acrobot CPG) in_flight

e5ae8ed

Denis-hamon marked this pull request as ready for review June 10, 2026 21:03

Denis-hamon wants to merge 3 commits into main from phase-5y-dreamerv3-acrobot
