experiments: observation-richness robustness of the CPG verdict (reframed T2.3) #57
Merged
…amed T2.3)

Does prediction != decision survive when the world model no longer sees the clean, privileged low-dimensional state? The original T2.3 was "one DMC-from-pixels task", but a pixel/field-of-view axis is the signature of the DINO-WM / stable-worldmodel agenda and is held off by the lab's non-affiliation guardrail (experiments/GPU_ROADMAP.md:92). This delivers the same payload with nuisance-augmented state instead of pixels: no pixel axis, same question.

observation_augmentation.py (pure, stdlib-only) wraps any BenchmarkEnvironment so the observation is true_state ++ nuisance. The simulator stays the oracle (the augmented oracle steps true physics on the state slice and reproduces the nuisance exactly, so it reproduces the augmented env step to the same precision the base oracle reproduces the base env); the planner scores ONLY the state slice, so decisions depend on real physics, not the nuisance; the learned MLP trains on the augmented observation, so its one-step error (the keystone's M1 foil) is moved up or down by the nuisance design while the closed-loop gap should not move.

Two nuisance kinds, both deterministic state features (hence reconstructable by the oracle), differing only in one-step learnability: redundant = smooth low-frequency tanh features (easy -> expected to deflate one-step MSE); high_freq = sin(K*state) with K large (a finite smooth MLP cannot resolve the frequency -> expected to inflate it). This is a genuine one-step difficulty, not a temporal one -- a chaotic map's one-step update is a fittable parabola and its divergence is multi-step only, which this metric does not compute.

Falsifiable claim: CPG and mse_state stay ~flat across nuisance kind/width while mse_total moves; width=0 is the no-nuisance control reproducing baseline CPG. Directions are measured per cell, not assumed (the mse_total/mse_state/mse_nuisance split is reported for every cell), and the input-distraction confound (nuisance dims are also MLP inputs) is disclosed and surfaced via the separate mse_state.

CPU only, no GPU, no checkpoints. The pure augmentation algebra is unit-tested with a synthetic env (oracle reproduces the augmented env exactly; score ignores nuisance; MSE split isolates state vs nuisance); the smoke run confirms both directions (redundant deflates, high_freq inflates) and that mse_state stays in a tight band. Core wmel untouched (everything under experiments/).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
What
Reframed T2.3. Does prediction != decision survive when the world model no longer sees the clean, privileged low-dimensional state, or is the dissociation an artifact of that tidy observation?
The roadmap's original T2.3 was "one DMC-from-pixels task." A pixel / field-of-view evaluation axis is the signature of the DINO-WM / stable-worldmodel agenda, and the lab keeps a deliberate non-affiliation guardrail against introducing that axis (GPU_ROADMAP.md:92). So T2.3 is reframed to deliver the same scientific payload with nuisance-augmented state instead of pixels: no pixel axis, no JEPA-adjacent baselines, same question answered.
Construction
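observation_augmentation.py (pure, stdlib-only) wraps any BenchmarkEnvironment so the observation is true_state ++ nuisance. The simulator stays the oracle: the augmented oracle steps true physics on the state slice and reproduces the nuisance exactly. The planner scores only the state slice, so decisions depend on real physics, not the nuisance. The learned MLP trains on the augmented observation, so its one-step error (M1) is moved up or down by the nuisance design while the closed-loop gap should not move.

Two nuisance kinds, both deterministic state features (hence reconstructable by the oracle), differing only in one-step learnability:

redundant: tanh(state)
high_freq: sin(K*state), K large

This is a genuine one-step difficulty, which is the quantity M1 measures. A chaotic temporal map would not do: its one-step update is a fittable parabola and its divergence is multi-step only.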
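The construction above can be sketched in a few lines of stdlib-only Python. This is a minimal illustration, not the contents of observation_augmentation.py: the function names (make_nuisance, augment, state_slice, augmented_oracle_step), the default K of 25.0, and the choice to cycle nuisance features over the state dimensions are all assumptions made for the example.

```python
import math
from typing import Callable, List, Sequence

State = Sequence[float]


def make_nuisance(kind: str, width: int, k: float = 25.0) -> Callable[[State], List[float]]:
    """Build a deterministic map from true state to `width` nuisance features.

    Hypothetical helper: `redundant` uses tanh(state), `high_freq` uses
    sin(k * state). The default k is an arbitrary "large" value.
    """
    if kind not in ("redundant", "high_freq"):
        raise ValueError(f"unknown nuisance kind: {kind!r}")

    def nuisance(state: State) -> List[float]:
        feats = []
        for i in range(width):
            s = state[i % len(state)]  # cycle over state dims (an assumption)
            feats.append(math.tanh(s) if kind == "redundant" else math.sin(k * s))
        return feats

    return nuisance


def augment(state: State, nuisance_fn: Callable[[State], List[float]]) -> List[float]:
    """Observation = true_state ++ nuisance."""
    return list(state) + nuisance_fn(state)


def state_slice(obs: Sequence[float], state_dim: int) -> List[float]:
    """Recover the true state; a planner would score only this slice."""
    return list(obs[:state_dim])


def augmented_oracle_step(obs, action, base_step, nuisance_fn, state_dim) -> List[float]:
    """Step true physics on the state slice, then recompute the nuisance.

    Because the nuisance is a deterministic function of the state, the
    oracle reproduces the augmented next observation exactly.
    """
    next_state = base_step(state_slice(obs, state_dim), action)
    return augment(next_state, nuisance_fn)


if __name__ == "__main__":
    # Toy two-dimensional dynamics standing in for a real environment.
    def toy_step(s, a):
        return [s[0] + 0.1 * s[1], s[1] + a]

    nf = make_nuisance("high_freq", width=3)
    obs = augment([0.1, -0.2], nf)
    print(obs)
    print(augmented_oracle_step(obs, 0.5, toy_step, nf, state_dim=2))
```

With width=0 the nuisance list is empty and the observation equals the true state, which corresponds to the no-nuisance control.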
Falsifiable claim
CPG and mse_state stay ~flat across nuisance kind/width while mse_total moves (down for redundant, up for high_freq); width=0 is the no-nuisance control reproducing baseline Cartpole CPG. That would show that a popular "model quality" proxy, observation-space one-step MSE, is movable by task-irrelevant observation design without touching the closed-loop verdict. If CPG shifts materially, the verdict is observation-form-dependent, which is itself a finding. Directions are measured per cell, not assumed (the mse_total/mse_state/mse_nuisance split is reported per cell); the input-distraction confound is disclosed and surfaced via the separate mse_state.
Testing / compute
redundant w=4 mse_total 0.029 < baseline 0.060; high_freq w=4 mse_total 0.169 > baseline; mse_state stays in a tight band.
Core wmel untouched (everything under experiments/). No new git tag.
Co-Authored-By: Claude Opus 4.8 noreply@anthropic.com
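For reference, the per-cell mse_total / mse_state / mse_nuisance split mentioned under the falsifiable claim could be computed roughly as follows. This is a sketch under stated assumptions: the function name mse_split is hypothetical, and the averaging convention (mean over every scalar element in each slice) is a guess, since the PR does not say how the experiment normalizes the error.

```python
from typing import Dict, Optional, Sequence


def mse_split(
    pred: Sequence[Sequence[float]],
    target: Sequence[Sequence[float]],
    state_dim: int,
) -> Dict[str, Optional[float]]:
    """Split one-step squared error into state and nuisance parts.

    `pred` and `target` are batches of augmented observations laid out as
    true_state ++ nuisance; the first `state_dim` entries are the state.
    Averaging is per scalar element (an assumption of this sketch).
    """
    se_state = se_nuis = 0.0
    n_state = n_nuis = 0
    for p, t in zip(pred, target):
        for j, (pj, tj) in enumerate(zip(p, t)):
            err = (pj - tj) ** 2
            if j < state_dim:
                se_state += err
                n_state += 1
            else:
                se_nuis += err
                n_nuis += 1
    if n_state == 0:
        raise ValueError("need at least one state element to score")
    return {
        "mse_total": (se_state + se_nuis) / (n_state + n_nuis),
        "mse_state": se_state / n_state,
        # With width=0 there is no nuisance slice to score.
        "mse_nuisance": se_nuis / n_nuis if n_nuis else None,
    }


if __name__ == "__main__":
    print(mse_split([[1.0, 2.0, 3.0]], [[1.0, 0.0, 0.0]], state_dim=1))
```

Reporting mse_state separately is what lets a reader tell whether a shift in mse_total comes from the nuisance slice alone or from the state prediction itself.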