
v1.5: paper-scale reruns + original-simulator follow-ups #18


@0bserver07


Why

Of the 58 v1+v1.5 stubs (RESULTS.md):

  • 32 reproduce paper claims (full or qualitative)
  • 25 partial / qualitative — algorithm works, paper number not fully reached because of laptop/numpy-only budget OR synthetic-data substitution per the SPEC's RL-stub rule
  • 1 honest non-replication (hq-learning-pomdp; HQ-vs-flat gap doesn't reproduce on 29-cell maze; mathematical analysis in §Open questions)

The 25 partials all have honest gaps that should close at paper config or at the original simulator. v1.5 reruns them at proper scale (correct architecture width, correct epoch count, correct dataset size, original simulator) on Modal or a single GPU. Goal: turn the partials into yes-reproductions, OR document them as genuine failures-at-scale.

This issue mirrors hinton-problems #46, adapted for Schmidhuber's lineage.

Scope

Three categories:

Category A: Need bigger budget (paper-scale reruns)

These hit the laptop's 5-min budget before reaching paper scale, so they need a GPU or Modal:

  • mnist-deep-mlp — 535k MLP / 15 epochs → paper 12M weights / 800 epochs, target 0.35% test err
  • mcdnn-image-bench — single-column MLP / 22s → paper 35-column ensemble / 60+ ep × 35 cols, target 0.23%; add GTSRB + CASIA Chinese loaders
  • evolino-sines-mackey-glass — pop=40 / 80 gens / NRMSE@84=0.29 → full ESP / paper budget, target NRMSE@84 ≈ 1.9e-3
  • pipe-6-bit-parity — 6-bit at 71.9% in 240s → paper budget, target 16/16 = 100%
  • lstm-search-space-odyssey — 8 variants on adding-problem T=50 → full TIMIT/IAM/JSB battery at paper config (5,400 experiments)
  • noise-free-long-lag — sub-variant (a) at p=50 only → (b)/(c) variants + p=100/500/1000 sweep
  • timing-counting-spikes — MSD only at T=150 → MSD/GTS/PFG triple at T≥300
  • hq-learning-pomdp — 29-cell maze (no replication) → paper's 62-cell maze; mathematical analysis predicts it should reproduce at scale
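The Evolino target above is quoted as NRMSE@84. So that the paper-scale rerun is scored the same way as the target, here is a minimal sketch of one common definition, which is an assumption and not taken from the stub: the squared error at the 84th free-running step, averaged over independent runs and normalized by the variance of the Mackey-Glass signal. The function name and array layout are hypothetical. Check them against the stub's own metric before comparing numbers.

```python
import numpy as np


def nrmse_at_step(targets, predictions, step, signal_variance):
    """NRMSE at one prediction horizon, pooled over independent runs.

    Assumed layout (hypothetical): targets and predictions have shape
    (n_runs, horizon_len), and column `step - 1` holds the value `step`
    steps after the end of the teacher-forced segment.
    signal_variance: variance of the underlying signal (assumed normalizer).
    """
    targets = np.asarray(targets, dtype=float)
    predictions = np.asarray(predictions, dtype=float)
    # Error of every run at the chosen horizon only (e.g. step=84).
    err = targets[:, step - 1] - predictions[:, step - 1]
    # Root of the mean squared error, normalized by the signal variance.
    return float(np.sqrt(np.mean(err ** 2) / signal_variance))
```

Under this definition, 1.9e-3 at step 84 corresponds to an RMS error of about 0.2% of the signal's standard deviation.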

Category B: Original-simulator paths for the 8 v1.5 synthetic substitutes

Wave 11 shipped these as numpy mini-environments per the SPEC's RL-stub rule. v1.5 reruns them in the original environments, which will need infrastructure beyond nix develop + numpy:

  • world-models-carracing — currently numpy 2D track → gym CarRacing-v0
  • world-models-vizdoom-dream — currently numpy 5×5 gridworld → VizDoom DoomTakeCover-v0
  • torcs-vision-evolution — currently numpy oval → TORCS racing simulator
  • timit-blstm-ctc — currently synthetic phoneme corpus → TIMIT phoneme set
  • iam-handwriting — currently synthetic 10-char alphabet → IAM-OnDB / IAM-DB
  • em-segmentation-isbi — currently synthetic Voronoi-EM → ISBI 2012 EM stack
  • clockwork-rnn — currently synthetic sum-of-sines → raw-audio TIMIT word
  • (8th synthetic substitute already covered above)

Category C: Architecture/capacity gap (mid-scope)

These need a width-bump or an extra slot but not full paper budget:

  • neural-em-shapes — best test NMI 0.428 at K=3 / H=24 → paper AMI 0.96 with K+1 background slot + GRU M-step
  • relational-nem-bouncing-balls — distribution shift at K=6 → larger train distribution + K-curriculum
  • neural-data-router — +1 depth above chance → paper "100% length-gen" at vocab=8/8 + LayerNorm + alternating L→R/R→L heads
  • self-referential-weight-matrix — 4-way boolean meta-learning at 99.6% → paper's larger sequence-task setup (continuous-pointer relaxation → discrete REINFORCE addresses)
  • compete-to-compute — small-net regime noisy (LWTA wins 6/10) → paper 3-layer × 512 hidden, longer training, larger sequential split
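The neural-em-shapes row compares a stub NMI (0.428) with a paper AMI (0.96), and these are different scores. AMI subtracts the mutual information expected by chance, so on the same labels it is never higher than NMI computed with the same averaging. The gap is therefore at least as large as the raw numbers suggest. A minimal sketch of reporting both with scikit-learn, assuming flat integer arrays of per-pixel ground-truth group ids and predicted slot ids. Which pixels to include (for example, whether background is masked) must follow the paper's protocol and is not decided here.

```python
from sklearn.metrics import adjusted_mutual_info_score, normalized_mutual_info_score


def clustering_scores(true_groups, pred_slots):
    """Return (NMI, AMI) for flat per-pixel label arrays (hypothetical helper).

    Both scores are invariant to permuting slot ids, so no slot-to-object
    matching is needed. Both use sklearn's default 'arithmetic' averaging.
    """
    nmi = normalized_mutual_info_score(true_groups, pred_slots)
    ami = adjusted_mutual_info_score(true_groups, pred_slots)
    return nmi, ami
```

Reporting AMI next to NMI in the stub README keeps the comparison with the paper's 0.96 like-for-like.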

Acceptance

For each stub:

  • New PR with paper-config rerun results
  • Update RESULTS.md row: partial → yes (or document a genuine failure at scale)
  • Update stub README §Results with paper-scale numbers
  • Cost reported: GPU-hours or $ on Modal
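For the cost line, GPU-hours are wall-clock hours multiplied by the number of GPUs, and dollars are GPU-hours multiplied by the hourly rate. The sketch below records that per run. The class name, fields and rate are hypothetical placeholders, not Modal's API or pricing. Fill in the rate actually billed for the GPU type used.

```python
from dataclasses import dataclass


@dataclass
class RunCost:
    stub: str
    wall_clock_s: float      # measured wall-clock time of the rerun, in seconds
    n_gpus: int              # GPUs allocated for the whole run
    usd_per_gpu_hour: float  # hypothetical rate: use the actual billed price

    @property
    def gpu_hours(self) -> float:
        return self.wall_clock_s / 3600.0 * self.n_gpus

    @property
    def usd(self) -> float:
        return self.gpu_hours * self.usd_per_gpu_hour


def report(runs):
    """One line per run plus a total, suitable for a PR description."""
    lines = [f"{r.stub}: {r.gpu_hours:.2f} GPU-h, ${r.usd:.2f}" for r in runs]
    total_h = sum(r.gpu_hours for r in runs)
    total_usd = sum(r.usd for r in runs)
    lines.append(f"total: {total_h:.2f} GPU-h, ${total_usd:.2f}")
    return "\n".join(lines)
```

For example, `report([RunCost("example-stub", 7200.0, 1, 2.0)])` (placeholder numbers) yields a per-run line and a total of 2.00 GPU-h, $4.00.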

Stub-specific follow-ups (already filed)

  • #3 — nbb-xor (η ablation, multi-subset arch, source verification). Different scope (rate-parameter ablation, not paper-scale).

agent-0bserver07 (Claude Code) on behalf of Yad
