3-class strength GP + joint_hamming_matern production kernel#29

Open
SebastianAment wants to merge 1 commit into main from joint-hamming-matern

@SebastianAment

Contributor

Migrates the strength GP from the prior 2-class (pooled mortar+Set-2, contaminated Set-3) dataset and hamming categorical source kernel to a corrected 3-class dataset and a new joint-distance source kernel joint_hamming_matern that combines per-feature ARD with a single learnable Hamming penalty inside one Matérn-3/2 radial basis.

Production rationale

(Full ablation deferred to a research-detail follow-up commit.)

  • The previously-deployed model treated Material Source as a continuous numeric coordinate — mathematically mis-specified for a 3-class categorical, but it preserved the joint kernel topology that factorised categorical-product kernels lose. The new joint_hamming_matern recovers the joint topology while keeping correct categorical handling.

  • Strictly dominates the previous hamming production default on every in-distribution metric:

    | metric          | prior | new   |
    |---              |---:   |---:   |
    | LOO RMSE (psi)  | 510   | 495   |
    | bLOO RMSE (psi) | 738   | 717   |
    | bLOO PIT-KS     | 0.039 | 0.030 |

    while passing all pre-registered LOCO acceptance criteria (every held-out class within 110% of Hamming's LOCO RMSE).

  • Seed-deterministic across all metrics — matches the prior production model's reproducibility and beats IndexKernel's seed sensitivity.
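
As a quick check on the numbers in this section, the relative improvements can be recomputed from the reported metrics. The sketch below also shows one way the 110% LOCO criterion could be written down; `passes_loco` and its arguments are hypothetical names, and no per-class LOCO values are assumed here because none are listed in this description.

```python
# Reported in-distribution metrics: (prior `hamming` default, joint_hamming_matern).
METRICS = {
    "LOO RMSE (psi)": (510.0, 495.0),
    "bLOO RMSE (psi)": (738.0, 717.0),
    "bLOO PIT-KS": (0.039, 0.030),
}


def relative_improvement(prior: float, new: float) -> float:
    """Fractional reduction of a lower-is-better metric."""
    return (prior - new) / prior


def passes_loco(new_rmse: dict, hamming_rmse: dict, tolerance: float = 1.10) -> bool:
    """Hypothetical form of the acceptance rule: every held-out class must
    stay within `tolerance` times the Hamming baseline's LOCO RMSE."""
    return all(new_rmse[c] <= tolerance * hamming_rmse[c] for c in hamming_rmse)


for name, (prior, new) in METRICS.items():
    print(f"{name}: {100 * relative_improvement(prior, new):.1f}% lower")
```

On the reported values this gives roughly a 3% reduction for both RMSE metrics and about 23% for PIT-KS.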

Changes

Data

  • data/boxcrete_data.csv: three-class merge with canonical mix-name table (boxcrete/_mix_naming_table.csv, boxcrete/mix_naming.py). Splits 12 corruption-collision mix names into 24 canonical names per actual recipe; corrects mortar/concrete misroutes; drops 3 clay-using mortars (M75/M76/M77) the GP feature set cannot represent. Strength values and Strength (Std) for the 638 common (composition + temp + time) tuples are byte-identical to the prior dataset; only labels and partition differ.
  • test/fixtures/boxcrete_data_pre_v5.csv: legacy fixture preserved under its established filename for regression / no-regression checks against the previous data.
  • boxcrete/utils.py: Material Source bounds bumped to (0, 2); added per-class GWP coefficients for Class 2 (Set-3 Amrize cement / Class F fly ash concrete) — derived by scripts/derive_class_2_gwp.py from the new rows.

Kernel framework (boxcrete/kernels.py)

  • Replaced the source-as-continuous-numeric kernel with a categorical-kernel switch _categorical_source_branch. Production variant: joint_hamming_matern. The codebase also registers research variants (rbf_embedding_d{1,2,3}, joint_chain_matern, joint_hamming_matern_nu{05,25}, joint_embedding_matern_d{1,2,3}, additive_*) as alternative ablation points; their detailed evaluation lives in the follow-up research commit.

  • New JointHammingMaternKernel (production source kernel):

    K(z_i, z_j) = σ² · M_{3/2}(√( Σ_f Δx_f² / ℓ_f² + α · 𝟙[c_i ≠ c_j] ))

    One learnable categorical penalty α plus per-feature ARD; the Hamming penalty applies only to the Material Source dim. This closes the +33 psi bLOO architectural gap that the simpler categorical-product topology had relative to legacy_continuous_ard, while keeping the LOCO advantages of proper categorical handling.

  • Both JointHammingMaternKernel and JointEmbeddingMaternKernel accept a lengthscale_prior kwarg and register it as a named prior on raw_feat_lengthscale (mirroring GPyTorch's standard MaternKernel(lengthscale_prior=...) contract). _categorical_source_branch passes within_group_prior(...) to the joint kernels so Cement/FlyAsh/Slag binder lengthscales and Fine/Coarse Aggregate lengthscales stay tied to within max/min < 1.05 (the same tying behaviour the prior Hamming production fit had).

  • DEFAULT_SOURCE_KERNEL = "joint_hamming_matern" (was "hamming").

  • GATE_TAU = 0.10 (was 0.05). The time gate h(t) = 1 - exp(-t / GATE_TAU) saturates ~3× more slowly, lengthening its monotonic envelope into the pre-1-day extrapolation region. Reduces the fraction of catalog compositions with non-monotonic strength curves from ~33% to ~22% with a slight LOO improvement.
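
The kernel formula and the time gate in this list can be sketched in plain NumPy. This is an illustrative stand-in, not the GPyTorch implementation in `boxcrete/kernels.py`: the function names, argument order, and the random inputs are assumptions, and only the formulas and the values `α=0.536` and `GATE_TAU = 0.10` come from this description.

```python
import numpy as np

GATE_TAU = 0.10  # value stated above; previously 0.05


def matern32(r):
    """Matern-3/2 radial profile M(r) = (1 + sqrt(3) r) exp(-sqrt(3) r)."""
    s = np.sqrt(3.0) * r
    return (1.0 + s) * np.exp(-s)


def joint_hamming_matern(X1, c1, X2, c2, lengthscale, alpha, outputscale=1.0):
    """sigma^2 * M_{3/2}(sqrt(sum_f dx_f^2 / l_f^2 + alpha * [c_i != c_j])).

    X1: (n, d), X2: (m, d) continuous features; c1: (n,), c2: (m,) integer
    class labels; lengthscale: (d,) per-feature ARD lengthscales.
    """
    diff = (X1[:, None, :] - X2[None, :, :]) / lengthscale
    sq_dist = np.sum(diff**2, axis=-1)
    mismatch = (c1[:, None] != c2[None, :]).astype(float)
    # The Hamming penalty is added inside the distance, before the radial basis.
    return outputscale * matern32(np.sqrt(sq_dist + alpha * mismatch))


def time_gate(t, tau=GATE_TAU):
    """h(t) = 1 - exp(-t / tau), as written above."""
    return 1.0 - np.exp(-np.asarray(t, dtype=float) / tau)


# Toy inputs (made up): 6 points, 3 continuous features, 3 classes.
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 3))
c = np.array([0, 0, 1, 1, 2, 2])
K = joint_hamming_matern(X, c, X, c, lengthscale=np.ones(3), alpha=0.536)
print(np.allclose(np.diag(K), 1.0), np.linalg.eigvalsh(K).min() > -1e-10)
```

One way to see why the joint form stays positive semi-definite: `α · 𝟙[c_i ≠ c_j]` is the squared Euclidean distance between one-hot class vectors scaled by `sqrt(α / 2)`, so the whole argument is an ordinary Euclidean distance in an augmented feature space.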

Priors (boxcrete/priors.py)

  • LogNormal ARD prior on non-grouped lengthscales prevents the lengthscale railing observed in the prior production fit (max blind ℓ ≤ 100).
  • Within-group prior ties Cement/FA/Slag and Fine/Coarse Aggregate lengthscales for binder and aggregate parsimony.
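
To make the two prior roles concrete, here is a minimal sketch of the `max/min < 1.05` tying check and of a LogNormal log-density penalising very large lengthscales. The feature order, group membership, and the `mu`/`sigma` values are hypothetical; the actual definitions live in `boxcrete/priors.py` and are not reproduced here.

```python
import math

# Hypothetical feature order and group membership, for illustration only.
FEATURES = ["Cement", "Fly Ash", "Slag", "Fine Aggregate", "Coarse Aggregate", "Water"]
GROUPS = {
    "binder": ["Cement", "Fly Ash", "Slag"],
    "aggregate": ["Fine Aggregate", "Coarse Aggregate"],
}


def group_ratios(lengthscales, features=FEATURES, groups=GROUPS):
    """Return the max/min lengthscale ratio for each tied group."""
    ls = dict(zip(features, (float(v) for v in lengthscales)))
    return {
        name: max(ls[f] for f in members) / min(ls[f] for f in members)
        for name, members in groups.items()
    }


def is_tied(lengthscales, threshold=1.05):
    """True when every group satisfies max/min < threshold."""
    return all(r < threshold for r in group_ratios(lengthscales).values())


def lognormal_logpdf(x, mu, sigma):
    """Log-density of LogNormal(mu, sigma) at x > 0."""
    z = (math.log(x) - mu) / sigma
    return -math.log(x * sigma * math.sqrt(2.0 * math.pi)) - 0.5 * z * z


print(group_ratios([1.00, 1.02, 1.04, 2.00, 2.05, 9.0]))
print(lognormal_logpdf(1.0, 0.0, 1.0) - lognormal_logpdf(100.0, 0.0, 1.0))
```

With these made-up hyperparameters, a lengthscale of 100 is far less probable a priori than a lengthscale of 1, which is the mechanism that discourages railing.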

JS port (docs/gp.mjs + docs/gp_v2_fast.mjs)

  • Added jointHammingMatern radial-basis function supporting ν ∈ {0.5, 1.5, 2.5} and the Hamming categorical penalty α · 𝟙[c_i ≠ c_j] inside the Matern distance argument.
  • Updated the kernel(x1, x2, params) dispatcher and the WASM-accelerated batched fast path to switch on params.matern_specific.source_kernel_kind. Both paths add the Hamming penalty before applying the radial basis, so single-point and batched predictors agree to FP-summation order (verified by test_js_predictor_parity.mjs).
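
The parity property described above can be illustrated in Python, although the port itself is JavaScript. The sketch assumes the standard closed forms of the Matérn profile for the three supported ν values; `kernel_single`, `kernel_batched`, and `max_parity_gap` are hypothetical names and do not mirror the signatures in `docs/gp.mjs`.

```python
import math
import numpy as np


def matern_radial(r, nu):
    """Closed-form Matern profiles for nu in {0.5, 1.5, 2.5}."""
    if nu == 0.5:
        return np.exp(-r)
    if nu == 1.5:
        s = math.sqrt(3.0) * r
        return (1.0 + s) * np.exp(-s)
    if nu == 2.5:
        s = math.sqrt(5.0) * r
        return (1.0 + s + s * s / 3.0) * np.exp(-s)
    raise ValueError(f"unsupported nu: {nu}")


def kernel_single(x1, c1, x2, c2, lengthscale, alpha, nu):
    """Scalar path: accumulate the scaled squared distance, add the
    Hamming penalty, then apply the radial basis."""
    sq = 0.0
    for a, b, ell in zip(x1, x2, lengthscale):
        sq += ((a - b) / ell) ** 2
    if c1 != c2:
        sq += alpha
    return float(matern_radial(math.sqrt(sq), nu))


def kernel_batched(X1, c1, X2, c2, lengthscale, alpha, nu):
    """Vectorised path with the same order of operations."""
    diff = (X1[:, None, :] - X2[None, :, :]) / lengthscale
    sq = (diff**2).sum(axis=-1) + alpha * (c1[:, None] != c2[None, :])
    return matern_radial(np.sqrt(sq), nu)


def max_parity_gap(X, c, lengthscale, alpha, nu):
    """Largest absolute difference between the two paths."""
    Kb = kernel_batched(X, c, X, c, lengthscale, alpha, nu)
    return max(
        abs(Kb[i, j] - kernel_single(X[i], c[i], X[j], c[j], lengthscale, alpha, nu))
        for i in range(len(X))
        for j in range(len(X))
    )
```

Because both paths add the penalty before the radial basis, any remaining difference comes only from floating-point summation order.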

UI (docs/ui.mjs + docs/model/compositions.json)

  • Material Source panel now renders N buttons (one per class) via a loop over the slider_bounds["Material Source"] range, instead of a hardcoded 2-button "Source A / Source B" layout. Class labels (Set 1 / Set 2 / Set 3) come from a MATERIAL_SOURCE_LABELS map; adding a 4th class only requires extending the bounds and the labels map.
  • compositions.json::slider_bounds["Material Source"] updated from max=1 (stale binary labelling) to max=2 (3-class).
  • Material Source ingredient-insight panel contains a concise per-class summary (Set 1 mortar / Set 2 Heidelberg Class C / Set 3 Amrize Class F) with a reference to docs/materials_background.md for the full per-class plant / source / HRWR detail.
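
The button-generation logic is small enough to sketch. The UI itself is JavaScript; the Python below only mirrors the idea, and the `min`/`max` key layout of `slider_bounds`, the 0-based class ids, and the fallback label are assumptions made for illustration.

```python
# Hypothetical stand-ins for the values described above.
MATERIAL_SOURCE_LABELS = {0: "Set 1", 1: "Set 2", 2: "Set 3"}
slider_bounds = {"Material Source": {"min": 0, "max": 2}}


def class_buttons(bounds, labels):
    """One (class_id, label) pair per integer class in [min, max]."""
    lo, hi = int(bounds["min"]), int(bounds["max"])
    # Fall back to a generic label so an unlabelled class is still rendered.
    return [(k, labels.get(k, f"Class {k}")) for k in range(lo, hi + 1)]


buttons = class_buttons(slider_bounds["Material Source"], MATERIAL_SOURCE_LABELS)
print(buttons)
```

Raising `max` to 3 yields a fourth button without touching the loop, which matches the extension path described for a 4th class.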

Production artifacts

Regenerated from the new kernel; all ground-truth-anchored to data/boxcrete_data.csv:

  • docs/model/strength.json: schema_version=3, joint_hamming_matern, α=0.536, ν=1.5, within-group prior installed on the joint kernel (verified by test_lengthscale_identifiability tying the binder / aggregate lengthscales within max/min < 1.05).
  • docs/model/gwp.json — 3 classes (previously had 0 and 1 only).
  • docs/model/compositions.json — correct Material Source labels (61 mortar / 30 Set-2 / 53 Set-3) looked up against the canonical raw data; refreshed strength_predictions, gwp_predictions, pareto_mask.
  • docs/model/test_vectors.json — 37 vectors.
  • docs/model/strength_model.pt.
  • docs/model/mix_analyses.json — migrated 138 LLM-authored mix narratives from the prior Source A/B labels to Set 1 / Set 3.
  • Notebooks (notebooks/{prediction_and_optimization_tutorial, strength_curve_prediction_demo, slump_prediction_demo}.ipynb): re-executed with the new data; 0 references to the removed MRWR (kg/m3) column remain.

Regeneration pipeline (experiments/regenerate_*)

  • regenerate_strength_json.py — extended introspection to handle the new JointHammingMaternKernel layout (raw_feat_lengthscale, raw_alpha, nu, categorical_mode, feature_dims, source_dim).
  • regenerate_gwp_json.py (new) — regenerates gwp.json from DEFAULT_GWP_COEFFICIENTS to include all 3 classes.
  • regenerate_compositions_gwp_predictions.mjs (new) — refreshes compositions.gwp_predictions after any gwp.json change.
  • fix_compositions_material_source.py (new) — recovery script for any future drift between catalog-stored class labels and the canonical raw data; looks up each catalog composition's class by 7-column composition fingerprint.
  • regenerate_all_artifacts.sh — pipeline orchestrator updated to include the new GWP step + the new test_catalog_consistency JS test.
  • docs/generate_mix_analyses.py fallback template — updated from binary labels to 3-class labels.
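
The fingerprint lookup used by the recovery script can be sketched as follows. This is not the code in `fix_compositions_material_source.py`: the seven column names, the rounding precision, and the function names are all hypothetical, and the rows are made-up toy data.

```python
# Hypothetical names for the 7 composition columns; the real ones may differ.
COLUMNS = ["Cement", "Fly Ash", "Slag", "Water", "HRWR",
           "Fine Aggregate", "Coarse Aggregate"]
LABEL = "Material Source"


def fingerprint(row, columns=COLUMNS, ndigits=6):
    """Rounded tuple of composition values, usable as a dict key."""
    return tuple(round(float(row[c]), ndigits) for c in columns)


def build_lookup(raw_rows):
    """Map each composition fingerprint to its class in the raw data."""
    lookup = {}
    for row in raw_rows:
        key, label = fingerprint(row), int(row[LABEL])
        # Refuse to continue if one recipe appears under two classes.
        if lookup.setdefault(key, label) != label:
            raise ValueError(f"ambiguous class for composition {key}")
    return lookup


def relabel(catalog, lookup):
    """Overwrite stale class labels in place; return how many changed."""
    changed = 0
    for entry in catalog:
        label = lookup[fingerprint(entry)]  # KeyError if recipe is unknown
        if int(entry[LABEL]) != label:
            entry[LABEL] = label
            changed += 1
    return changed


# Toy rows (made up): two raw recipes and one catalog entry with a stale label.
base = dict.fromkeys(COLUMNS, 1.0)
raw_rows = [{**base, "Cement": 300.0, LABEL: 0}, {**base, "Cement": 350.0, LABEL: 2}]
catalog = [{**base, "Cement": 350.0, LABEL: 1}]
changed = relabel(catalog, build_lookup(raw_rows))
print(changed, catalog[0][LABEL])
```

Raising on ambiguous or unknown fingerprints, rather than guessing, keeps the catalog anchored to the raw data.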

Tests (production-essential)

  • test/test_joint_hamming_matern_kernel.py — 14 tests for the new kernel: PSD, Hamming penalty correctness, Matern smoothness, α gradient flow, gauge-fix invariance, etc.
  • test/test_mix_naming.py — canonical-naming pipeline (38 tests).
  • test/test_composed_prior.py — within-group prior composition (11 tests).
  • test/test_kernel_layout.py — updated for the new ProductKernel + joint-kernel layouts. The prior test assumed ScaleKernel(MaternKernel) only; broke when the categorical wrapper was introduced. Restored regression coverage for all registered source-kernel variants.
  • test/test_lengthscale_identifiability.py — updated to accept the joint-kernel layout (specific lengthscales have length 16 for joint kernels, 17 for the legacy continuous-source layout) and to walk named_priors() for kernels without a standard .lengthscale_prior attribute.

Regression tests for newly-discovered bug-classes

Each test below targets a class of bug that pre-existed silently in the prior test suite and is now caught at CI time:

  1. test/test_catalog_consistency.mjs (new, 8 assertions): cross-validates docs/model/compositions.json against data/boxcrete_data.csv. Anchors to ground truth instead of merely checking internal catalog ↔ predictions consistency. Catches:
    • per-row Material Source mislabel,
    • missing classes in catalog,
    • stale slider_bounds,
    • out-of-range Material Source values.
  2. test/test_variance_orientation.py (new, 1 test, ~60 s): asserts σ²(observed_class) ≤ σ²(any other class) for every training row. Direct coverage of the symptom that "switching class in the explorer makes uncertainty contract for a non-observed class" — fires on any future catalog-vs-data mislabelling OR on any kernel mis-specification that produces this counter-intuitive behaviour.
  3. test/test_data_freshness.mjs: replaced the legacy binary comp[7] >= 0.5 ? 1 : 0 Material Source collapse with the same Math.round(comp[gwpParams.class_dim]) lookup the explorer uses. The previous binary collapse silently mapped class 2 to class 1, producing wrong GWP coefficients and a green test on wrong values.
  4. test/test_js_ui_smoke.mjs: same fix to the same legacy binary collapse (also silently mapped class 2 → class 1).
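
The class-collapse bug fixed in items 3 and 4 is easy to reproduce in isolation. The sketch below restates the two JavaScript expressions in Python; `legacy_class` and `rounded_class` are hypothetical names, and `rounded_class` emulates `Math.round` with `floor(v + 0.5)` because Python's built-in `round` rounds halves to even.

```python
import math


def legacy_class(value: float) -> int:
    """Legacy binary collapse, equivalent to `value >= 0.5 ? 1 : 0`."""
    return 1 if value >= 0.5 else 0


def rounded_class(value: float) -> int:
    """Emulates JavaScript's Math.round for small class values
    (halves round up)."""
    return int(math.floor(value + 0.5))


for v in (0.0, 1.0, 2.0):
    print(v, legacy_class(v), rounded_class(v))
```

For a stored class of 2.0 the legacy expression returns 1, which is how class 2 rows ended up with class 1 GWP coefficients while the test stayed green.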

Test status

  • Python: ~300 PASS / 0 fail. Covers kernel + identifiability + mix naming + composed prior + layout + curve monotonicity + pretrained-loader fidelity + variance orientation + the rest of the inheritance suite from the previous PR.
  • JS: 10 / 10 PASS:
    • GP parity (296 assertions),
    • predictor parity (8),
    • physical constraints (37),
    • UI smoke (144 × 64),
    • catalog consistency (8),
    • data freshness (6),
    • curve monotonicity (144),
    • feature parity (49),
    • strength architecture,
    • units.
  • Lint: black --check clean; flake8 critical errors (E9, F63, F7, F82) = 0.

Follow-up

A separate research-detail commit (lands as a sibling) carries the non-essential experimental artefacts:

  • experiments/THREE_CLASS_AND_PRIOR_BENCHMARK.md — full ablation writeup with literature context, including the ablation series that motivated GATE_TAU=0.10, the time-only kernel choice, and the deployed-baseline-vs-new comparison.
  • experiments/three_class_ablation.py + experiments/model_variant_study.py — ablation infrastructure.
  • Tests for the registered research kernel variants (test_joint_embedding_matern_kernel.py, test_rbf_embedding_kernel.py).
  • Mobile-performance investigation doc + PR-prep meta-doc.
  • Legacy ablation infrastructure restored from the prior PR baseline.

@meta-cla (bot) added the CLA Signed label Jun 17, 2026
@SebastianAment force-pushed the joint-hamming-matern branch 2 times, most recently from f12e394 to 5b93c20 June 17, 2026 21:18