Notes for whoever has push access to cybertronai/wikitext.
- `main`: stable. Every row of `README.md`'s Record History was scored under the same setup.
- `dev`: staging. Feature PRs (new submissions, new paradigms, harness tweaks) target `dev` and merge as soon as review is green.

`dev` → `main` promotion PRs happen on a slower cadence, only when `dev` is internally consistent (see the re-run rule below).

If a PR changes anything that can move where existing submissions land on
the leaderboard, the prior leaderboard rows in README.md must be re-run
on the new setup before that PR merges to main. Otherwise the
half-old/half-new comparison is meaningless.

| Change | Triggers re-run? |
|---|---|
| `EnergyMeter` semantics, idle-baseline default, scoring formula | Yes |
| Hardware pin (PCIe ↔ SXM4, A100 ↔ H100) | Yes |
| `MAX_TRAIN_SECONDS`, `ACC_MIN`, eval window | Yes |
| `CharModel` API contract (`predict` return shape, `observe` signature, ...) | Yes: old submissions are no longer runnable; argmax-preserving rewrites need re-running to confirm numerical identity |
| Container-image bump with numerical drift | Maybe: re-run if anything visibly drifts |
| New submission, doc/typo, `.scratch/`, internal refactor | No |
| Additive optional field on `result.json` (existing semantics intact) | No, but the new field is null on old entries; mention in PR |

When in doubt, re-run. ~$0.50/submission on Modal A100 is cheaper than a broken leaderboard.
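
The table above can be read as a lookup with a conservative default. The following is a minimal sketch of that reading, not code from the repository: the category keys, the `Rerun` enum, and `needs_rerun` are all hypothetical names chosen for illustration. A change that matches no known category falls back to "yes", mirroring the "when in doubt, re-run" rule.

```python
from enum import Enum


class Rerun(Enum):
    NO = 0
    MAYBE = 1
    YES = 2


# Hypothetical labels for the table rows; these keys are not identifiers from the repo.
RERUN_POLICY = {
    "energy_meter_semantics": Rerun.YES,
    "idle_baseline_default": Rerun.YES,
    "scoring_formula": Rerun.YES,
    "hardware_pin": Rerun.YES,
    "run_limits": Rerun.YES,  # MAX_TRAIN_SECONDS, ACC_MIN, eval window
    "charmodel_api_contract": Rerun.YES,
    "container_image_bump": Rerun.MAYBE,  # re-run if anything visibly drifts
    "new_submission": Rerun.NO,
    "docs_or_typo": Rerun.NO,
    "scratch": Rerun.NO,
    "internal_refactor": Rerun.NO,
    "additive_optional_result_field": Rerun.NO,  # still mention the null field in the PR
}


def needs_rerun(categories):
    """Return the strictest decision across every category a PR touches."""
    worst = Rerun.NO
    for category in categories:
        # Unknown category: when in doubt, re-run.
        decision = RERUN_POLICY.get(category, Rerun.YES)
        if decision.value > worst.value:
            worst = decision
    return worst
```

A PR that touches several categories takes the strictest answer, so a typo fix bundled with a container-image bump still lands on "maybe".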
- Land the setup change on a branch (typically targeting `dev`); don't merge yet.
- Re-run the rows currently in `README.md`'s Record History on the new harness: `python submit.py submissions/<slot> --yes`, fired in parallel (Modal cap: 10 concurrent).
- When the `result.json` files all reflect the new setup, append the re-run rows to `README.md` (old rows stay as history) and add a dated banner above the table noting the schema change.
- Restate the leaderboard table in the promotion PR body, confirming all rows shown are under the new setup. Then merge.

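
The parallel re-run step can be scripted. Below is a minimal sketch, assuming it is run from the repository root and that `submit.py` reports failure through its exit code (an assumption; the notes above do not say). `build_command` and `rerun_all` are hypothetical helpers, not part of the repo. The only facts taken from the notes are the command shape and the cap of 10 concurrent Modal runs.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor

MODAL_CONCURRENCY_CAP = 10  # "Modal cap: 10 concurrent" from the notes


def build_command(slot):
    """Command line for re-running one submission slot."""
    return [sys.executable, "submit.py", f"submissions/{slot}", "--yes"]


def rerun_all(slots, run=None, max_workers=MODAL_CONCURRENCY_CAP):
    """Re-run every slot, never exceeding the Modal cap. Returns {slot: exit code}."""
    if run is None:
        # Default runner: launch the command and report its exit code.
        def run(cmd):
            return subprocess.run(cmd).returncode

    # Clamp to the cap, and keep at least one worker for an empty slot list.
    workers = max(1, min(max_workers, MODAL_CONCURRENCY_CAP, len(slots)))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        codes = list(pool.map(lambda slot: run(build_command(slot)), slots))
    return dict(zip(slots, codes))
```

Calling `rerun_all(["lwta_k2", "lwta_k4"])` would launch both re-runs at once; any slot with a non-zero exit code should be investigated before its row is appended to `README.md`. The `run` parameter exists so the scheduling can be dry-run without spending Modal credits.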
Don't: ship a half-new/half-old table; claim a new leader without re-running
the priors; silently overwrite old result.json files without a banner in
README.md.

| Date | Change | PR | Re-ran upstream? |
|---|---|---|---|
| 2026-05-18 | Hardware pin: SXM4 → PCIe A100-80GB | (n/a) | partial: older SXM4 rows kept as history |
| 2026-05-19 | `EnergyMeter` gains `cpu_energy_J` + `total_energy_J` via CodeCarbon | #4 | yes: `lwta_k2`, `lwta_k4`, `modded_nanogpt` re-run |
| 2026-05-28 | `CharModel.predict()` returns `str` (single committed char) instead of `dict[str, float]`; runner no longer does argmax on the submission's behalf | `bugfix/sampling` | partial: top-3 (`subset_70_mkn`, `gpu_ngram_w31_k11`, `paq_mixer_v3`) re-run; all other rows are flagged outdated until updated and re-run (see `submissions/OUTDATED.md`) |
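
For the 2026-05-28 change, an argmax-preserving rewrite of an outdated submission might look like the sketch below. It assumes only what the changelog row states: the old `predict()` returned a `dict[str, float]` and the new one returns a single `str`. `ArgmaxAdapter` is a hypothetical wrapper, and the argument lists of `predict` and `observe` are unknown here, so they are passed through untouched.

```python
class ArgmaxAdapter:
    """Wrap a legacy model whose predict() returns dict[str, float]."""

    def __init__(self, legacy_model):
        self._legacy = legacy_model

    def predict(self, *args, **kwargs) -> str:
        probs = self._legacy.predict(*args, **kwargs)
        # Commit to the highest-scoring character. On ties, max() keeps the
        # first key in dict order, which may differ from the old runner.
        return max(probs, key=probs.get)

    def observe(self, *args, **kwargs):
        # Forward unchanged; the observe signature is not specified in these notes.
        return self._legacy.observe(*args, **kwargs)
```

Because tie-breaking and floating-point details may not match what the runner used to do, a wrapped submission still needs a re-run to confirm numerical identity before its row counts as current.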