2 changes: 1 addition & 1 deletion experiments/GPU_ROADMAP.md
@@ -145,7 +145,7 @@ pooled-30 cells are MODEL BOTTLENECK; see PR.
**Branch**: `phase-5y-dreamerv3-acrobot`
**Effort**: ~10-20 h GPU (500k env steps, dmc_proprio config, single env)
**Priority**: HIGH (multi-model generality claim; everything but GPU time is already shipped)
-**Status**: queued
+**Status**: in_flight

**Steps**:

8 changes: 7 additions & 1 deletion experiments/dmc_acrobot/dreamerv3_cpg.py
@@ -212,7 +212,13 @@ def _train_dreamerv3(
cmd += ["--prefill", "500", "--eval_every", "1000", "--log_every", "500"]
workdir.mkdir(parents=True, exist_ok=True)
print(f"[train] {' '.join(cmd)}")
-subprocess.run(cmd, cwd=str(_DREAMER_PKG), check=True)
+# Upstream's dreamer.py hard-sets MUJOCO_GL=osmesa, but this process
+# defaults MUJOCO_GL=egl (line ~76) for the CPG arms' dm_control import.
+# That egl leaks into the child env and dm_control then refuses to start
+# ("PYOPENGL_PLATFORM is set to 'egl', should be unset or 'osmesa'").
+# Force the osmesa render path the subprocess actually uses.
+train_env = {**os.environ, "MUJOCO_GL": "osmesa", "PYOPENGL_PLATFORM": "osmesa"}
+subprocess.run(cmd, cwd=str(_DREAMER_PKG), check=True, env=train_env)


def _port_agent_checkpoint(
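
The environment override in `_train_dreamerv3` above can be illustrated in isolation. The following is a minimal sketch, not code from this PR: `osmesa_child_env` and `report_child_backend` are hypothetical helper names, and the EGL parent state is simulated with a plain dict rather than by importing `dm_control`. It only shows that a child started with the parent's environment sees `egl`, while a child given an overridden copy sees `osmesa`.

```python
import os
import subprocess
import sys


def osmesa_child_env(base_env=None):
    """Return a copy of the environment that pins the child to OSMesa.

    Hypothetical helper: mirrors the dict-merge used in the diff above.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["MUJOCO_GL"] = "osmesa"
    env["PYOPENGL_PLATFORM"] = "osmesa"
    return env


def report_child_backend(env):
    """Spawn a throwaway Python child and return the two variables it sees."""
    probe = (
        "import os; "
        "print(os.environ.get('MUJOCO_GL'), os.environ.get('PYOPENGL_PLATFORM'))"
    )
    result = subprocess.run(
        [sys.executable, "-c", probe],
        env=env,
        check=True,
        capture_output=True,
        text=True,
    )
    return tuple(result.stdout.split())


# Simulated parent state (assumption): EGL already selected in this process.
parent_env = {**os.environ, "MUJOCO_GL": "egl", "PYOPENGL_PLATFORM": "egl"}

leaked = report_child_backend(parent_env)                   # child inherits egl
fixed = report_child_backend(osmesa_child_env(parent_env))  # child sees osmesa
print("inherited:", leaked)
print("overridden:", fixed)
```

Building a fresh dict instead of mutating `os.environ` keeps the parent's own EGL setting intact, which matters here because the parent still needs it for its own `dm_control` import.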
2 changes: 2 additions & 0 deletions results/MODEL_TABLE.md
@@ -12,6 +12,7 @@ planner outperforms the same planner on the learned dynamics.

| Environment | Model | Planner | Init | n/arm | Oracle | Learned | CPG | 95% AC CI | Verdict |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| dmc_acrobot_swingup | dreamerv3 | random-shooting | varied | 10 | 0.100 | 0.100 | +0.000 | [-0.298, +0.298] | INCONCLUSIVE |
| dmc_acrobot_swingup | learned MLP | random-shooting | varied | 10 | 0.100 | 0.100 | +0.000 | [-0.298, +0.298] | INCONCLUSIVE |
| dmc_acrobot_swingup | mlp_on_tdmpc2_data | cem | varied | 150 | 0.033 | 0.020 | +0.013 | [-0.027, +0.053] | PLANNER BOTTLENECK |
| dmc_acrobot_swingup | mlp_on_tdmpc2_data | random-shooting | varied | 10 | 0.100 | 0.100 | +0.000 | [-0.298, +0.298] | INCONCLUSIVE |
@@ -32,6 +33,7 @@ planner outperforms the same planner on the learned dynamics.

## Row sources

- `dmc_acrobot_swingup` / `dreamerv3` / `random-shooting` / varied: `results/dmc_acrobot/dreamerv3_cpg.json`
- `dmc_acrobot_swingup` / `learned MLP` / `random-shooting` / varied: `results/dmc_acrobot/cpg.json`
- `dmc_acrobot_swingup` / `mlp_on_tdmpc2_data` / `cem` / varied: `results/dmc_acrobot/cem_cpg_sweep.json`
- `dmc_acrobot_swingup` / `mlp_on_tdmpc2_data` / `random-shooting` / varied: `results/dmc_acrobot/coverage_mlp_on_tdmpc2_cpg.json`
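
For readers checking the `95% AC CI` column in the table above, here is a minimal sketch of how such an interval could be computed. It assumes "AC" means the Agresti-Caffo interval for a difference of two proportions, that both arms have the same `n/arm`, and that the success counts are the ones implied by the rounded rates (for example 1/10 for 0.100, and 5/150 and 3/150 for 0.033 and 0.020). None of this is confirmed by the diff, and `agresti_caffo_ci` is a hypothetical name, not a function from the repository.

```python
import math


def agresti_caffo_ci(x1, n1, x2, n2, z=1.959964):
    """Agresti-Caffo interval for p1 - p2.

    Adds one success and one failure to each arm, then applies the
    usual Wald formula to the adjusted proportions.
    """
    p1 = (x1 + 1) / (n1 + 2)
    p2 = (x2 + 1) / (n2 + 2)
    diff = p1 - p2
    se = math.sqrt(p1 * (1 - p1) / (n1 + 2) + p2 * (1 - p2) / (n2 + 2))
    return diff - z * se, diff + z * se


# Assumed counts: oracle 1/10, learned 1/10 (the n/arm = 10 rows).
lo_small, hi_small = agresti_caffo_ci(1, 10, 1, 10)
print(f"[{lo_small:+.3f}, {hi_small:+.3f}]")  # rounds to [-0.298, +0.298]

# Assumed counts: oracle 5/150, learned 3/150 (the cem row).
lo_cem, hi_cem = agresti_caffo_ci(5, 150, 3, 150)
print(f"[{lo_cem:+.3f}, {hi_cem:+.3f}]")  # rounds to [-0.027, +0.053]
```

Under these assumptions the sketch reproduces the intervals shown for the visible rows. It also shows why the `n/arm = 10` rows are INCONCLUSIVE: with ten episodes per arm the interval spans roughly ±0.3 regardless of the model.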