Denis-hamon · Denis-hamon · Jun 9, 2026 · Jun 9, 2026 · Jun 10, 2026
@@ -145,7 +145,7 @@ pooled-30 cells are MODEL BOTTLENECK; see PR.
 **Branch**: `phase-5y-dreamerv3-acrobot`
 **Effort**: ~10-20 h GPU (500k env steps, dmc_proprio config, single env)
 **Priority**: HIGH (multi-model generality claim; everything but GPU time is already shipped)
-**Status**: queued
+**Status**: in_flight
 
 **Steps**:
 

@@ -212,7 +212,13 @@ def _train_dreamerv3(
         cmd += ["--prefill", "500", "--eval_every", "1000", "--log_every", "500"]
     workdir.mkdir(parents=True, exist_ok=True)
     print(f"[train] {' '.join(cmd)}")
-    subprocess.run(cmd, cwd=str(_DREAMER_PKG), check=True)
+    # Upstream's dreamer.py hard-sets MUJOCO_GL=osmesa, but this process
+    # defaults to MUJOCO_GL=egl (line ~76) for the CPG arms' dm_control import.
+    # That egl setting leaks into the child env, and dm_control then refuses to
+    # start ("PYOPENGL_PLATFORM is set to 'egl', should be unset or 'osmesa'").
+    # Force the osmesa render path the subprocess actually uses.
+    train_env = {**os.environ, "MUJOCO_GL": "osmesa", "PYOPENGL_PLATFORM": "osmesa"}
+    subprocess.run(cmd, cwd=str(_DREAMER_PKG), check=True, env=train_env)
 
 
 def _port_agent_checkpoint(

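The environment override in the hunk above can be checked in isolation. The following is a minimal sketch, not the project's code: the helper name `child_env_with` is hypothetical, and the child here is a throwaway Python one-liner rather than the DreamerV3 trainer. It shows that passing `env=` to `subprocess.run` changes only the child's variables and leaves the parent's `os.environ` untouched, which is why the parent can keep its `egl` default.

```python
import os
import subprocess
import sys


def child_env_with(overrides):
    """Return a copy of the current environment with `overrides` applied.

    Hypothetical helper for illustration; the parent's os.environ is not mutated.
    """
    return {**os.environ, **overrides}


# Remember the parent's value so we can confirm it is unchanged afterwards.
parent_before = os.environ.get("MUJOCO_GL")

env = child_env_with({"MUJOCO_GL": "osmesa", "PYOPENGL_PLATFORM": "osmesa"})

# Stand-in child process: it only reports the value it sees for MUJOCO_GL.
result = subprocess.run(
    [sys.executable, "-c", "import os; print(os.environ['MUJOCO_GL'])"],
    env=env,
    check=True,
    capture_output=True,
    text=True,
)
child_value = result.stdout.strip()
print(child_value)
```

Building the dict from `os.environ` first matters: passing only the two overrides as `env=` would drop `PATH` and everything else the child needs.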
@@ -12,6 +12,7 @@ planner outperforms the same planner on the learned dynamics.
 
 | Environment | Model | Planner | Init | n/arm | Oracle | Learned | CPG | 95% AC CI | Verdict |
 | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
+| dmc_acrobot_swingup | dreamerv3 | random-shooting | varied | 10 | 0.100 | 0.100 | +0.000 | [-0.298, +0.298] | INCONCLUSIVE |
 | dmc_acrobot_swingup | learned MLP | random-shooting | varied | 10 | 0.100 | 0.100 | +0.000 | [-0.298, +0.298] | INCONCLUSIVE |
 | dmc_acrobot_swingup | mlp_on_tdmpc2_data | cem | varied | 150 | 0.033 | 0.020 | +0.013 | [-0.027, +0.053] | PLANNER BOTTLENECK |
 | dmc_acrobot_swingup | mlp_on_tdmpc2_data | random-shooting | varied | 10 | 0.100 | 0.100 | +0.000 | [-0.298, +0.298] | INCONCLUSIVE |
@@ -32,6 +33,7 @@ planner outperforms the same planner on the learned dynamics.
 
 ## Row sources
 
+- `dmc_acrobot_swingup` / `dreamerv3` / `random-shooting` / varied: `results/dmc_acrobot/dreamerv3_cpg.json`
 - `dmc_acrobot_swingup` / `learned MLP` / `random-shooting` / varied: `results/dmc_acrobot/cpg.json`
 - `dmc_acrobot_swingup` / `mlp_on_tdmpc2_data` / `cem` / varied: `results/dmc_acrobot/cem_cpg_sweep.json`
 - `dmc_acrobot_swingup` / `mlp_on_tdmpc2_data` / `random-shooting` / varied: `results/dmc_acrobot/coverage_mlp_on_tdmpc2_cpg.json`
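For readers checking the `95% AC CI` column, here is a small sketch that assumes "AC" means the Agresti-Caffo interval for a difference of two proportions (add one success and one failure to each arm, then apply the Wald formula). That reading is an assumption, not something the table states, but it reproduces the intervals shown. The success counts (1 of 10, 5 of 150, 3 of 150) are inferred from the rounded rates in the table, and the function name is hypothetical.

```python
import math


def agresti_caffo_ci(x1, n1, x2, n2, z=1.959963984540054):
    """Agresti-Caffo interval for p1 - p2 (z defaults to the 95% level).

    x1, x2 are success counts and n1, n2 are trial counts per arm.
    """
    # Adjusted proportions: one pseudo-success and one pseudo-failure per arm.
    p1 = (x1 + 1) / (n1 + 2)
    p2 = (x2 + 1) / (n2 + 2)
    se = math.sqrt(p1 * (1 - p1) / (n1 + 2) + p2 * (1 - p2) / (n2 + 2))
    diff = p1 - p2
    return diff - z * se, diff + z * se


# n/arm = 10, Oracle 0.100 vs Learned 0.100 (assumed 1 success in each arm).
lo10, hi10 = agresti_caffo_ci(1, 10, 1, 10)
print(f"[{lo10:+.3f}, {hi10:+.3f}]")  # [-0.298, +0.298]

# n/arm = 150, Oracle 0.033 vs Learned 0.020 (assumed 5 and 3 successes).
lo150, hi150 = agresti_caffo_ci(5, 150, 3, 150)
print(f"[{lo150:+.3f}, {hi150:+.3f}]")  # [-0.027, +0.053]
```

With 10 episodes per arm the interval spans roughly ±0.30 even when both arms score identically, which is consistent with those rows being marked INCONCLUSIVE.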