Task 6: DreamerV3 CPG on Acrobot (second published-model adapter) #49
Open
Denis-hamon wants to merge 3 commits into
The experiment defaults MUJOCO_GL=egl for the main process (CPG arms' dm_control import), but upstream dreamer.py hard-sets MUJOCO_GL=osmesa. The egl value leaked into the inherited child env and dm_control aborted with 'PYOPENGL_PLATFORM is set to egl, should be unset or osmesa', so the documented `--smoke` command failed out of the box. Pass an explicit osmesa env to the training subprocess only; the CPG arms keep egl.
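A minimal sketch of the fix described above, not the code in this PR: the parent process keeps its own `MUJOCO_GL=egl`, and only the child receives an environment copy with the backend overridden. The helper names `osmesa_child_env` and `run_training` are hypothetical, and overriding `PYOPENGL_PLATFORM` alongside `MUJOCO_GL` is an assumption based on the error message quoted in the commit.

```python
import os
import subprocess
import sys


def osmesa_child_env(base=None):
    """Return a copy of the environment with the GL backend forced to osmesa.

    Hypothetical helper: the copy leaves the parent's own variables untouched,
    so the CPG arms in the main process can keep using egl.
    """
    env = dict(os.environ if base is None else base)
    env["MUJOCO_GL"] = "osmesa"
    # Assumption: also override the value that leaked from the egl parent.
    env["PYOPENGL_PLATFORM"] = "osmesa"
    return env


def run_training(cmd, base=None):
    """Run the training command with the osmesa-only environment."""
    return subprocess.run(
        cmd,
        env=osmesa_child_env(base),
        check=True,
        capture_output=True,
        text=True,
    )


if __name__ == "__main__":
    # Stand-in for the upstream training command: just echo the variables.
    probe = "import os; print(os.environ['MUJOCO_GL'], os.environ['PYOPENGL_PLATFORM'])"
    result = run_training([sys.executable, "-c", probe])
    print(result.stdout.strip())
```

Building the child environment explicitly, rather than mutating `os.environ`, is what keeps the override scoped to the training subprocess.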
500k-step DreamerV3 (dmc_proprio) on dmc_acrobot_swingup, random-shooting planner, varied init, n=10. Result: oracle 0.100 / DreamerV3 0.100, CPG +0.000, 95% AC CI [-0.298, +0.298], INCONCLUSIVE. This is identical to the TD-MPC2 random-shooting Acrobot cell (same n, same success rates, same verdict): the CPG metric reproduces the same reading across two published model families. Both learned models match the oracle dynamics, so the random-shooting planner is the bottleneck on Acrobot regardless of model family; the Markovian-projection truncation of DreamerV3's recurrent state costs nothing measurable here. n=10 matches the TD-MPC2 random-shooting cell exactly, so no seed pooling (the n=150 cells are the CEM/Task 8 column). Checkpoint is a release asset, not committed.
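The reported interval can be checked by hand. The sketch below assumes "AC" denotes the Agresti-Caffo interval for a difference of two proportions (add one success and one failure to each arm, then apply the Wald formula); that reading is an inference, though it does reproduce the numbers above for 1/10 successes in each arm. The function name is illustrative and not taken from the repository.

```python
import math


def agresti_caffo_ci(x1, n1, x2, n2, z=1.959964):
    """95% Agresti-Caffo interval for p1 - p2 (assumed meaning of 'AC CI')."""
    # Adjusted proportions: one pseudo-success and one pseudo-failure per arm.
    p1 = (x1 + 1) / (n1 + 2)
    p2 = (x2 + 1) / (n2 + 2)
    se = math.sqrt(p1 * (1 - p1) / (n1 + 2) + p2 * (1 - p2) / (n2 + 2))
    diff = p1 - p2
    return diff - z * se, diff + z * se


# Oracle arm: 1/10 successes; DreamerV3 arm: 1/10 successes.
lo, hi = agresti_caffo_ci(1, 10, 1, 10)
print(f"[{lo:+.3f}, {hi:+.3f}]")  # prints "[-0.298, +0.298]"
```

Because the interval spans roughly ±0.3 at n=10, a gap of zero cannot be separated from a sizeable gap in either direction, which is why the cell is marked INCONCLUSIVE.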
Task 6 — DreamerV3 CPG on Acrobot (second published-model adapter)
Initiates the GPU_ROADMAP DreamerV3 line. Trains DreamerV3 (NM512/dreamerv3-torch, upstream subprocess) on DMC Acrobot-swingup at `--action_repeat 1`, ports the world-model weights into `wmel.adapters.dreamerv3_adapter`, and runs the CPG protocol (oracle arm vs DreamerV3 arm) under the random-shooting `TabularWorldModelPlanner`, the same harness as the TD-MPC2 Acrobot experiment. This is the first recurrent world model evaluated through the Markovian planner contract.

Everything but the training run was already on main (PR #48): adapter, weight port, experiment harness, multi-model table generator, and the upstream-equivalence test.
Pre-flight (this box: 2x L40S)
- `./scripts/setup_dreamerv3.sh`: upstream cloned to third_party/.venv-dreamer): upstream pins (gym 0.22.0, mujoco 2.3.5, dm_control 1.0.9, numpy 1.23.5, torch 2.4.1) + wmel editable; OSMesa present
- `pytest tests/test_dreamerv3_upstream_equivalence.py`: PASS (port matches upstream to float32)
- `--smoke` end-to-end (wiring + port + both CPG arms)
- `--varied-init --seed 0` (500k steps, GPU 0)
- `results/MODEL_TABLE.md` (first DreamerV3 row next to TD-MPC2)

Expected outputs
- `results/dmc_acrobot/dreamerv3_cpg.json`
- `results/MODEL_TABLE.md`

Draft until the full run lands and the table is refreshed.