experiments: observation-richness robustness of the CPG verdict (reframed T2.3) #57

Merged
Denis-hamon merged 1 commit into main from feat-obs-robustness on Jun 14, 2026

Conversation

@Denis-hamon

Owner

What

Reframed T2.3. Does prediction != decision survive when the world model no longer sees the clean, privileged low-dimensional state, or is the dissociation an artifact of that tidy observation?

The roadmap's original T2.3 was "one DMC-from-pixels task." A pixel / field-of-view evaluation axis is the signature of the DINO-WM / stable-worldmodel agenda, and the lab keeps a deliberate non-affiliation guardrail against introducing that axis (GPU_ROADMAP.md:92). So T2.3 is reframed to deliver the same scientific payload with nuisance-augmented state instead of pixels: no pixel axis, no JEPA-adjacent baselines, same question answered.

Construction

observation_augmentation.py (pure, stdlib-only) wraps any BenchmarkEnvironment so the observation is true_state ++ nuisance:

  • the simulator stays the oracle — the augmented oracle steps true physics on the state slice and reproduces the nuisance exactly, so it reproduces the augmented env step to the same precision the base oracle reproduces the base env;
  • the planner scores only the true-state slice, so decisions depend on real physics, never on the nuisance;
  • the learned MLP trains on the augmented observation, so its one-step error (the keystone's M1 foil) is moved by the nuisance design while the closed-loop gap should not move.
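
The three properties above can be sketched as follows. This is a minimal illustration, not the contents of observation_augmentation.py: `toy_dynamics`, `toy_nuisance`, `augment`, `augmented_oracle_step`, `score` and the point-mass physics are all hypothetical stand-ins, since the real BenchmarkEnvironment interface is not shown in this PR.

```python
import math
from typing import Callable, List

Vector = List[float]
STATE_DIM = 2  # hypothetical toy state: [position, velocity]


def toy_dynamics(state: Vector, action: float) -> Vector:
    """Hypothetical true physics: a point mass pushed by the action."""
    pos, vel = state
    vel = vel + 0.1 * action
    pos = pos + 0.1 * vel
    return [pos, vel]


def toy_nuisance(state: Vector) -> Vector:
    """Any deterministic function of the true state works here."""
    return [math.tanh(s) for s in state]


def augment(state: Vector, nuisance_fn: Callable[[Vector], Vector]) -> Vector:
    """Observation = true_state ++ nuisance."""
    return list(state) + nuisance_fn(state)


def augmented_oracle_step(
    obs: Vector, action: float, nuisance_fn: Callable[[Vector], Vector]
) -> Vector:
    """Step true physics on the state slice, then recompute the nuisance."""
    next_state = toy_dynamics(obs[:STATE_DIM], action)
    return augment(next_state, nuisance_fn)


def score(obs: Vector) -> float:
    """Planner objective: reads only the true-state slice."""
    pos, _vel = obs[:STATE_DIM]
    return -((pos - 1.0) ** 2)


# The augmented env step and the augmented oracle agree exactly.
state = [0.0, 0.5]
env_next = augment(toy_dynamics(state, 1.0), toy_nuisance)
oracle_next = augmented_oracle_step(augment(state, toy_nuisance), 1.0, toy_nuisance)
assert env_next == oracle_next

# Corrupting the nuisance slice does not change the planner's score.
corrupted = env_next[:STATE_DIM] + [99.0] * (len(env_next) - STATE_DIM)
assert score(corrupted) == score(env_next)
```

Because the nuisance is a deterministic function of the state, the oracle needs no extra information to reproduce it, which is why the simulator can stay the oracle after augmentation.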

Two nuisance kinds, both deterministic state features (hence reconstructable by the oracle), differing only in one-step learnability:

| kind | nuisance | expected one-step MSE |
|---|---|---|
| redundant | smooth low-frequency tanh(state) | deflates (easy) |
| high_freq | sin(K*state), K large | inflates (a finite smooth MLP cannot resolve K) |
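
A rough sketch of the two nuisance kinds, to make the learnability contrast concrete. The feature forms tanh(state) and sin(K*state) come from the table; everything else is an assumption for illustration: the function names, the way `width` is filled by cycling over state dimensions, the value K=50, and the `direction_changes` roughness proxy (which is not a metric used in this PR).

```python
import math
from typing import Callable, List


def redundant(state: List[float], width: int) -> List[float]:
    """Smooth, low-frequency features; width=0 yields no nuisance at all."""
    return [math.tanh(state[i % len(state)]) for i in range(width)]


def high_freq(state: List[float], width: int, k: float = 50.0) -> List[float]:
    """Rapidly oscillating features; k=50 is an assumed 'large K'."""
    return [math.sin(k * state[i % len(state)]) for i in range(width)]


def direction_changes(
    fn: Callable[[float], float], lo: float = -1.0, hi: float = 1.0, n: int = 2001
) -> int:
    """Count slope sign flips of a scalar feature along a 1-D sweep of the state."""
    xs = [lo + (hi - lo) * i / (n - 1) for i in range(n)]
    ys = [fn(x) for x in xs]
    diffs = [b - a for a, b in zip(ys, ys[1:])]
    return sum(1 for d0, d1 in zip(diffs, diffs[1:]) if d0 * d1 < 0)


smooth_flips = direction_changes(lambda x: redundant([x], 1)[0])
rough_flips = direction_changes(lambda x: high_freq([x], 1)[0])
print(smooth_flips, rough_flips)  # tanh is monotone; sin(50x) reverses many times
```

The monotone tanh feature never reverses direction over the sweep, while sin(K*state) reverses dozens of times on the same interval, which is the kind of structure a small smooth regressor struggles to fit in one step.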

This is a genuine one-step difficulty — the quantity M1 measures. A chaotic temporal map would not do: its one-step update is a fittable parabola and its divergence is multi-step only.
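
To illustrate why a chaotic temporal map fails as a one-step foil, here is a small sketch using the logistic map as an assumed example (the PR does not name the map that was originally considered). A quadratic fitted through three samples reproduces the one-step update to rounding error, even though nearby trajectories diverge over many steps.

```python
from typing import Callable, Sequence


def logistic(x: float, r: float = 4.0) -> float:
    """Chaotic for r=4, yet a plain parabola in x for a single step."""
    return r * x * (1.0 - x)


def fit_quadratic(xs: Sequence[float], ys: Sequence[float]) -> Callable[[float], float]:
    """Lagrange interpolating polynomial through three points."""
    x0, x1, x2 = xs
    y0, y1, y2 = ys

    def poly(x: float) -> float:
        return (
            y0 * (x - x1) * (x - x2) / ((x0 - x1) * (x0 - x2))
            + y1 * (x - x0) * (x - x2) / ((x1 - x0) * (x1 - x2))
            + y2 * (x - x0) * (x - x1) / ((x2 - x0) * (x2 - x1))
        )

    return poly


samples = [0.1, 0.5, 0.9]
model = fit_quadratic(samples, [logistic(x) for x in samples])

# One-step error of the fitted model over a grid of held-out states.
grid = [i / 100 for i in range(101)]
one_step_err = max(abs(model(x) - logistic(x)) for x in grid)

# Multi-step divergence of two trajectories that start 1e-9 apart.
a, b = 0.3, 0.3 + 1e-9
max_gap = 0.0
for _ in range(60):
    a, b = logistic(a), logistic(b)
    max_gap = max(max_gap, abs(a - b))

print(one_step_err, max_gap)
```

The one-step error is at rounding level while the multi-step gap grows to order one, so such a distractor is hard only over multiple steps, which a one-step metric does not see.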

Falsifiable claim

CPG and mse_state stay ~flat across nuisance kind/width while mse_total moves (down for redundant, up for high_freq); width=0 is the no-nuisance control reproducing baseline Cartpole CPG. That would show a popular "model quality" proxy — observation-space one-step MSE — is movable by task-irrelevant observation design without touching the closed-loop verdict. If CPG shifts materially, the verdict is observation-form-dependent — itself a finding. Directions are measured per cell, not assumed (the mse_total/mse_state/mse_nuisance split is reported per cell); the input-distraction confound is disclosed and surfaced via the separate mse_state.
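
The claim rests on how the three MSE figures relate. Below is a minimal sketch of the split, assuming each figure is a per-entry mean over its slice (the PR does not spell out the averaging convention, and `mse_split` is a hypothetical helper). The inputs are made-up numbers chosen for exact arithmetic, not experimental results.

```python
from typing import Dict, List, Optional


def mse_split(
    preds: List[List[float]], targets: List[List[float]], state_dim: int
) -> Dict[str, Optional[float]]:
    """Split one-step squared error into state and nuisance parts."""
    sq_state = sq_nuis = 0.0
    n_state = n_nuis = 0
    for pred, target in zip(preds, targets):
        for j, (p, t) in enumerate(zip(pred, target)):
            err = (p - t) ** 2
            if j < state_dim:
                sq_state += err
                n_state += 1
            else:
                sq_nuis += err
                n_nuis += 1
    return {
        "mse_total": (sq_state + sq_nuis) / (n_state + n_nuis),
        "mse_state": sq_state / n_state,
        # width=0 has no nuisance slice, so there is nothing to average.
        "mse_nuisance": sq_nuis / n_nuis if n_nuis else None,
    }


# Same state error in every cell; only the nuisance error differs.
control = mse_split([[1.0, 2.0]], [[1.0, 2.5]], state_dim=2)
easy = mse_split([[1.0, 2.0, 1.0, 0.0]], [[1.0, 2.5, 1.0, 0.0]], state_dim=2)
hard = mse_split([[1.0, 2.0, 0.0, 0.0]], [[1.0, 2.5, 1.0, 0.0]], state_dim=2)

for name, cell in [("control", control), ("easy", easy), ("hard", hard)]:
    print(name, cell)
```

Under this convention mse_total is the dimension-weighted average of mse_state and mse_nuisance, so an easy nuisance pulls it below the control and a hard one pushes it above, with mse_state unchanged in both cases.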

Testing / compute

  • CPU only, no GPU, no checkpoints.
  • Pure algebra unit-tested with a synthetic env (oracle reproduces the augmented env exactly; score ignores nuisance; MSE split isolates state vs nuisance); full suite green.
  • Smoke run confirms both directions empirically: redundant w=4 mse_total 0.029 < baseline 0.060; high_freq w=4 mse_total 0.169 > baseline; mse_state stays in a tight band.
  • Adversarial review: initial NO-GO caught a real error — a chaotic temporal distractor's one-step update is a fittable parabola, so it would have deflated (not inflated) the one-step metric. Replaced with the high-frequency state feature; re-review GO, no remaining must-fixes.

Core wmel untouched (everything under experiments/). No new git tag.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

…amed T2.3)

Does prediction != decision survive when the world model no longer sees the
clean, privileged low-dimensional state? The original T2.3 was "one
DMC-from-pixels task", but a pixel/field-of-view axis is the signature of the
DINO-WM / stable-worldmodel agenda and is held off by the lab's non-affiliation
guardrail (experiments/GPU_ROADMAP.md:92). This delivers the same payload with
nuisance-augmented state instead of pixels: no pixel axis, same question.

observation_augmentation.py (pure, stdlib-only) wraps any BenchmarkEnvironment
so the observation is true_state ++ nuisance. The simulator stays the oracle
(the augmented oracle steps true physics on the state slice and reproduces the
nuisance exactly, so it reproduces the augmented env step to the same precision
the base oracle reproduces the base env); the planner scores ONLY the state
slice, so decisions depend on real physics, not the nuisance; the learned MLP
trains on the augmented observation, so its one-step error (the keystone's M1
foil) is moved up or down by the nuisance design while the closed-loop gap
should not move.

Two nuisance kinds, both deterministic state features (hence reconstructable by
the oracle), differing only in one-step learnability: redundant = smooth
low-frequency tanh features (easy -> expected to deflate one-step MSE);
high_freq = sin(K*state) with K large (a finite smooth MLP cannot resolve the
frequency -> expected to inflate it). This is a genuine one-step difficulty, not
a temporal one -- a chaotic map's one-step update is a fittable parabola and its
divergence is multi-step only, which this metric does not compute.

Falsifiable claim: CPG and mse_state stay ~flat across nuisance kind/width while
mse_total moves; width=0 is the no-nuisance control reproducing baseline CPG.
Directions are measured per cell, not assumed (the mse_total/mse_state/
mse_nuisance split is reported for every cell), and the input-distraction
confound (nuisance dims are also MLP inputs) is disclosed and surfaced via the
separate mse_state.

CPU only, no GPU, no checkpoints. The pure augmentation algebra is unit-tested
with a synthetic env (oracle reproduces the augmented env exactly; score ignores
nuisance; MSE split isolates state vs nuisance); the smoke run confirms both
directions (redundant deflates, high_freq inflates) and that mse_state stays in
a tight band. Core wmel untouched (everything under experiments/).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Denis-hamon merged commit e9310cf into main on Jun 14, 2026
4 checks passed