
feat(metrics): stdlib rank-correlation primitives (Spearman, Kendall, bootstrap CI)#54

Merged
Denis-hamon merged 1 commit into main from feat-correlation-primitives on Jun 14, 2026

@Denis-hamon (Owner)

Tier 2.1.a: the no-GPU foundation for the offline→downstream keystone, which asks whether a cheap, planner-free offline metric predicts control usefulness. Pure stdlib, no compute; touches core wmel.metrics only.

What it adds

  • spearman_rho — Pearson on average ranks (tie-safe).
  • kendall_tau — tau-b, tie-corrected, O(n²); robust at very small n.
  • bootstrap_correlation_ci → CorrelationResult: paired percentile bootstrap that resamples cell pairs jointly (like paired_bootstrap_gap_ci), skips degenerate resamples, and is deterministic given a seed. Rank-based by design: there are only a handful of cells, the expected relationship is monotone rather than linear, and the two quantities live on incomparable scales.
  • All of the above is exported from the package; _rankdata and _pearson are private helpers.
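
To make "Pearson on average ranks" concrete, here is a minimal stdlib sketch of the idea. The names rankdata_average, pearson, and spearman are hypothetical stand-ins, not the wmel.metrics code; the real spearman_rho, _rankdata, and _pearson may differ in validation and edge handling.

```python
import math
from typing import Sequence


def rankdata_average(values: Sequence[float]) -> list[float]:
    """1-based ranks; tied values share the mean of the positions they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Plain Pearson correlation; raises if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for a constant input")
    return sxy / math.sqrt(sxx * syy)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho as the Pearson correlation of average ranks (tie-safe)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with n >= 2")
    return pearson(rankdata_average(x), rankdata_average(y))


print(spearman([1, 2, 3, 4], [1, 3, 2, 4]))  # 0.8
```

With ties, [1, 1, 2] vs [1, 2, 2] gets average ranks [1.5, 1.5, 3] and [1, 2.5, 2.5], and rho comes out at 0.5.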

Why now

T2.2 (n=150) and T2.3 (pixels) are GPU-gated, but this primitive layer is the planned no-GPU first step of the keystone and is useful as standalone statistics. The remaining keystone pieces, offline_metrics.py (M1 1-step MSE, M2 k-step divergence, M3 action-counterfactual sensitivity, M4 score-to-go error) and correlate.py, need the CPU MLP-variant sweep plus the GPU box for the TD-MPC2 cells.

Verification

  • 13 tests: closed-form value checks (Spearman 0.8 and Kendall 4/6 on [1,2,3,4] vs [1,3,2,4]; tie-corrected tau-b 0.5 on [1,1,2] vs [1,2,2]; average-rank ties), plus a tight CI under perfect correlation, determinism, and degenerate-input and input-validation guards.
  • Adversarial review: GO, zero must-fix. All three hand-derivable values were re-derived and cross-checked against SciPy; paired resampling, the percentile convention, determinism, and the edge guards were confirmed. Two doc nits were applied: the CI is documented as conditional on non-degenerate resamples, and the n_boot field reports the count of valid resamples.
  • Full suite passing; core stays stdlib-only; no emoji; no new git tag.
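
The two closed-form Kendall values in the test list can be reproduced with a direct O(n²) tau-b, tau_b = (n_c - n_d) / sqrt((n_0 - n_1)(n_0 - n_2)), where n_0 = n(n-1)/2 and n_1, n_2 count the pairs tied in x and in y. The sketch below is a hypothetical kendall_tau_b written for illustration, not the PR's kendall_tau.

```python
import math
from typing import Sequence


def kendall_tau_b(x: Sequence[float], y: Sequence[float]) -> float:
    """Kendall's tau-b via an O(n^2) scan over all pairs, with tie correction."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with n >= 2")
    concordant = discordant = 0
    ties_x = ties_y = 0  # pairs tied in x (resp. y), joint ties counted in both
    for i in range(n):
        for j in range(i + 1, n):
            dx = x[i] - x[j]
            dy = y[i] - y[j]
            if dx == 0:
                ties_x += 1
            if dy == 0:
                ties_y += 1
            if dx == 0 or dy == 0:
                continue  # a tied pair is neither concordant nor discordant
            if (dx > 0) == (dy > 0):
                concordant += 1
            else:
                discordant += 1
    n0 = n * (n - 1) // 2
    denom = math.sqrt((n0 - ties_x) * (n0 - ties_y))
    if denom == 0:
        raise ValueError("tau-b undefined when an input is constant")
    return (concordant - discordant) / denom


print(kendall_tau_b([1, 2, 3, 4], [1, 3, 2, 4]))  # 4/6
print(kendall_tau_b([1, 1, 2], [1, 2, 2]))  # 0.5
```

On [1,2,3,4] vs [1,3,2,4] there are 5 concordant pairs and 1 discordant pair out of 6, giving 4/6. On [1,1,2] vs [1,2,2] one pair is concordant, one is tied in x, and one is tied in y, giving 1 / sqrt(2 · 2) = 0.5.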

🤖 Generated with Claude Code

… bootstrap CI)

The no-GPU foundation for the offline-metric vs downstream-performance keystone
(T2.1.a): correlate a cheap planner-free offline metric against CPG / success
across (model, env, planner) cells and report the strength of the association
with a bootstrap confidence interval. Pure stdlib, no compute.

- src/wmel/metrics.py: spearman_rho (Pearson on average ranks, tie-safe),
  kendall_tau (tau-b, tie-corrected, O(n^2)), bootstrap_correlation_ci
  (paired percentile bootstrap; resamples cell pairs jointly; skips degenerate
  resamples; deterministic given seed), CorrelationResult dataclass, plus
  private _rankdata / _pearson. Exported from the package.
- tests/test_correlation.py: closed-form checks (Spearman 0.8 and Kendall 4/6
  on [1,2,3,4] vs [1,3,2,4]; tie-corrected tau-b 0.5 on [1,1,2] vs [1,2,2];
  average-rank ties), perfect-correlation tight CI, determinism, degenerate
  and input-validation guards.

Rank-based by design (small n, monotone-not-linear, incomparable scales); the
bootstrap CI is documented as conditional on non-degenerate resamples.
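
A paired percentile bootstrap that skips degenerate resamples, so that the interval is conditional on the resamples where the statistic is defined, can be sketched as follows. Everything here is hypothetical: paired_bootstrap_ci, BootstrapCI, the default n_boot/alpha values, and the floor/ceil percentile index rule are assumptions for illustration. The PR's bootstrap_correlation_ci follows paired_bootstrap_gap_ci's percentile convention, which is not shown here. Plain Pearson is the default statistic only to keep the block self-contained; a rank statistic would be plugged in through the stat argument.

```python
import math
import random
from dataclasses import dataclass
from typing import Callable, List, Sequence

Stat = Callable[[Sequence[float], Sequence[float]], float]


@dataclass(frozen=True)
class BootstrapCI:
    estimate: float  # statistic on the full, unresampled data
    low: float       # lower percentile bound
    high: float      # upper percentile bound
    n_boot: int      # number of valid (non-degenerate) resamples actually used


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    # Callers below guarantee neither input is constant.
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def paired_bootstrap_ci(x: Sequence[float], y: Sequence[float],
                        stat: Stat = pearson, n_boot: int = 2000,
                        alpha: float = 0.05, seed: int = 0) -> BootstrapCI:
    if len(x) != len(y) or len(x) < 3:
        raise ValueError("need two equal-length sequences with n >= 3")
    if len(set(x)) < 2 or len(set(y)) < 2:
        raise ValueError("statistic undefined: an input is constant")
    rng = random.Random(seed)  # private RNG: same seed gives the same interval
    n = len(x)
    draws: List[float] = []
    for _ in range(n_boot):
        # Resample cell indices so each (x_i, y_i) pair moves together.
        idx = [rng.randrange(n) for _ in range(n)]
        xb = [x[i] for i in idx]
        yb = [y[i] for i in idx]
        if len(set(xb)) < 2 or len(set(yb)) < 2:
            continue  # degenerate resample: correlation undefined, skip it
        draws.append(stat(xb, yb))
    if not draws:
        raise ValueError("every resample was degenerate")
    draws.sort()
    m = len(draws)
    # Assumed percentile rule: floor/ceil indices into the sorted valid draws.
    low = draws[math.floor((alpha / 2) * (m - 1))]
    high = draws[math.ceil((1 - alpha / 2) * (m - 1))]
    return BootstrapCI(stat(x, y), low, high, m)


ci = paired_bootstrap_ci([1, 2, 3, 4, 5, 6], [2, 1, 4, 3, 6, 5], seed=42)
print(ci)
```

Reporting the number of valid draws (rather than the requested n_boot) makes the conditioning on non-degenerate resamples visible to the caller.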

Adversarial review GO, zero must-fix (all three hand-derivable values
re-derived and cross-checked vs SciPy; paired resampling, percentile
convention matching paired_bootstrap_gap_ci, determinism, and edge guards all
confirmed). Full suite passing, core stdlib-only. No new git tag.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Denis-hamon merged commit e3c7128 into main on Jun 14, 2026
4 checks passed