v1.5: paper-scale reruns + original-simulator follow-ups #18

# v1.5 — paper-scale reruns + original-simulator follow-ups

## Why

Of the 58 v1+v1.5 stubs ([RESULTS.md](https://github.com/cybertronai/schmidhuber-problems/blob/main/RESULTS.md)):

- 32 reproduce paper claims (full or qualitative)
- **25 partial / qualitative** — algorithm works, paper number not fully reached because of laptop/numpy-only budget OR synthetic-data substitution per the SPEC's RL-stub rule
- 1 honest non-replication ([`hq-learning-pomdp`](https://github.com/cybertronai/schmidhuber-problems/tree/main/hq-learning-pomdp); HQ-vs-flat gap doesn't reproduce on 29-cell maze; mathematical analysis in §Open questions)

The 25 partials all have honest gaps that should close at paper config or at the original simulator. v1.5 reruns them at proper scale (correct architecture width, correct epoch count, correct dataset size, original simulator) on Modal or a single GPU. Goal: turn the partials into yes-reproductions, OR document them as genuine failures-at-scale.

This issue mirrors [hinton-problems #46](https://github.com/cybertronai/hinton-problems/issues/46), adapted for Schmidhuber's lineage.

## Scope

Three categories:

### Category A: Need bigger budget (paper-scale reruns)

These hit the laptop's 5-min budget at smaller scale than the paper. Need GPU/Modal:

- `mnist-deep-mlp` — 535k MLP / 15 epochs → paper 12M weights / 800 epochs, target 0.35% test err
- `mcdnn-image-bench` — single-column MLP / 22s → paper 35-column ensemble / 60+ ep × 35 cols, target 0.23%; add GTSRB + CASIA Chinese loaders
- `evolino-sines-mackey-glass` — pop=40 / 80 gens / NRMSE@84=0.29 → full ESP / paper budget, target NRMSE@84 ≈ 1.9e-3
- `pipe-6-bit-parity` — 6-bit at 71.9% in 240s → paper budget, target 16/16 = 100%
- `lstm-search-space-odyssey` — 8 variants on adding-problem T=50 → full TIMIT/IAM/JSB battery at paper config (5,400 experiments)
- `noise-free-long-lag` — sub-variant (a) at p=50 only → (b)/(c) variants + p=100/500/1000 sweep
- `timing-counting-spikes` — MSD only at T=150 → MSD/GTS/PFG triple at T≥300
- `hq-learning-pomdp` — 29-cell maze (no replication) → paper's 62-cell maze; mathematical analysis predicts it should reproduce at scale
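Several of the targets above are stated as error metrics, for example NRMSE for `evolino-sines-mackey-glass` (the `@84` presumably denotes the prediction horizon in steps). A minimal sketch of one common NRMSE definition follows, assuming normalization by the standard deviation of the targets. The function name `nrmse` is hypothetical, and the paper's normalization convention should be checked before comparing numbers.

```python
import math


def nrmse(targets, predictions):
    """RMSE normalized by the population standard deviation of the targets.

    Assumption: normalization by target std. The paper may use a different
    convention (e.g. range or variance), so verify before comparing.
    """
    if len(targets) != len(predictions) or not targets:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(targets)
    mean_t = sum(targets) / n
    var_t = sum((t - mean_t) ** 2 for t in targets) / n
    if var_t == 0.0:
        raise ValueError("targets are constant; NRMSE is undefined")
    mse = sum((t - p) ** 2 for t, p in zip(targets, predictions)) / n
    return math.sqrt(mse / var_t)
```

Under this definition, a predictor that always outputs the target mean scores 1.0 and a perfect predictor scores 0.0, which gives a quick sanity check when a rerun reports its number.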

### Category B: Original-simulator paths for the 8 v1.5 synthetic substitutes

Wave 11 shipped these as numpy mini-environments per the SPEC's RL-stub rule. v1.5 reruns at the original env (will need infra beyond `nix develop` + numpy):

- `world-models-carracing` — currently numpy 2D track → gym CarRacing-v0
- `world-models-vizdoom-dream` — currently numpy 5×5 gridworld → VizDoom DoomTakeCover-v0
- `torcs-vision-evolution` — currently numpy oval → TORCS racing simulator
- `timit-blstm-ctc` — currently synthetic phoneme corpus → TIMIT phoneme set
- `iam-handwriting` — currently synthetic 10-char alphabet → IAM-OnDB / IAM-DB
- `em-segmentation-isbi` — currently synthetic Voronoi-EM → ISBI 2012 EM stack
- `clockwork-rnn` — currently synthetic sum-of-sines → raw-audio TIMIT word
- (8th synthetic substitute already covered above)
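Swapping a numpy mini-environment for the original simulator is easier if the training loop depends only on a small interface. The sketch below is one possible shape for that interface, not the repo's actual API: `Env`, `ToyTrack`, and `rollout` are hypothetical names, and the toy environment merely stands in for a numpy substitute. A thin wrapper would be needed to adapt each real simulator's own return values to this interface.

```python
from typing import Callable, List, Protocol, Tuple


class Env(Protocol):
    """Hypothetical minimal interface shared by mini-envs and real simulators."""

    def reset(self) -> List[float]: ...

    def step(self, action: int) -> Tuple[List[float], float, bool]: ...


class ToyTrack:
    """Stand-in for a numpy mini-environment: move right to reach the goal."""

    def __init__(self, length: int = 5) -> None:
        self.length = length
        self.pos = 0

    def reset(self) -> List[float]:
        self.pos = 0
        return [float(self.pos)]

    def step(self, action: int) -> Tuple[List[float], float, bool]:
        # Action 1 moves right; any other action moves left (clamped at 0).
        self.pos = max(0, self.pos + (1 if action == 1 else -1))
        done = self.pos >= self.length
        return [float(self.pos)], (1.0 if done else 0.0), done


def rollout(env: Env, policy: Callable[[List[float]], int], max_steps: int = 100) -> float:
    """Run one episode and return the total reward."""
    obs = env.reset()
    total = 0.0
    for _ in range(max_steps):
        obs, reward, done = env.step(policy(obs))
        total += reward
        if done:
            break
    return total
```

With this split, `rollout` stays unchanged when the substitute is replaced, so any difference in results can be attributed to the environment rather than to the evaluation loop.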

### Category C: Architecture/capacity gap (mid-scope)

These need a width-bump or an extra slot but not full paper budget:

- `neural-em-shapes` — best test NMI 0.428 at K=3 / H=24 → paper AMI 0.96 with K+1 background slot + GRU M-step
- `relational-nem-bouncing-balls` — distribution shift at K=6 → larger train distribution + K-curriculum
- `neural-data-router` — +1 depth above chance → paper "100% length-gen" at vocab=8/8 + LayerNorm + alternating L→R/R→L heads
- `self-referential-weight-matrix` — 4-way boolean meta-learning at 99.6% → paper's larger sequence-task setup (continuous-pointer relaxation → discrete REINFORCE addresses)
- `compete-to-compute` — small-net regime noisy (LWTA wins 6/10) → paper 3-layer × 512 hidden, longer training, larger sequential split
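The `neural-em-shapes` row compares a stub NMI against a paper AMI. The two are related but not identical, since AMI additionally corrects for chance agreement, so a rerun should report the same metric as the paper. As a reference point, here is a minimal pure-Python sketch of NMI with arithmetic-mean normalization. The normalization choice and the function names are assumptions, and the stub may compute NMI differently.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in nats) of a labeling."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def mutual_information(a, b):
    """Mutual information (in nats) between two labelings of the same items."""
    n = len(a)
    joint = Counter(zip(a, b))
    count_a, count_b = Counter(a), Counter(b)
    return sum(
        (nij / n) * math.log(nij * n / (count_a[i] * count_b[j]))
        for (i, j), nij in joint.items()
    )


def nmi(a, b):
    """MI divided by the arithmetic mean of the two entropies (an assumption)."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty labelings of equal length")
    h_a, h_b = entropy(a), entropy(b)
    if h_a == 0.0 and h_b == 0.0:
        return 1.0  # both labelings are a single cluster
    return mutual_information(a, b) / ((h_a + h_b) / 2)
```

NMI is invariant to relabeling, so `nmi([0, 0, 1, 1], [1, 1, 0, 0])` is 1, while two independent labelings such as `[0, 0, 1, 1]` and `[0, 1, 0, 1]` score 0.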

## Acceptance

For each stub:
- New PR with paper-config rerun results
- Update `RESULTS.md` row: `partial` → `yes` (or document genuine failure at scale)
- Update stub README §Results with paper-scale numbers
- Cost reported: GPU-hours or $ on Modal
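For the cost line, a small helper keeps the reported numbers consistent across PRs. This is a sketch only: `cost_report` is a hypothetical name, and the hourly rate is a placeholder the author supplies from their own Modal or cloud bill.

```python
def cost_report(wall_seconds: float, n_gpus: int, usd_per_gpu_hour: float) -> str:
    """Format a run's cost as GPU-hours and dollars.

    usd_per_gpu_hour is a caller-supplied assumption, not a quoted price.
    """
    gpu_hours = wall_seconds * n_gpus / 3600.0
    usd = gpu_hours * usd_per_gpu_hour
    return f"{gpu_hours:.2f} GPU-hours, ${usd:.2f}"
```

For example, a 2-hour run on 2 GPUs at an assumed $1.50 per GPU-hour gives `cost_report(7200, 2, 1.5)`, which returns `"4.00 GPU-hours, $6.00"`.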

## Stub-specific follow-ups (already filed)

- [#3](https://github.com/cybertronai/schmidhuber-problems/issues/3) — nbb-xor (η ablation, multi-subset arch, source verification). Different scope (rate-parameter ablation, not paper-scale).

---

_agent-0bserver07 (Claude Code) on behalf of Yad_
