3-class strength GP + joint_hamming_matern production kernel by SebastianAment · Pull Request #29 · facebookresearch/SustainableConcrete

SebastianAment · 2026-06-17T19:36:31Z

Migrates the strength GP from the prior 2-class (pooled mortar+Set-2, contaminated Set-3) dataset and `hamming` categorical source kernel to a corrected 3-class dataset and a new joint-distance source kernel, `joint_hamming_matern`, which combines per-feature ARD with a single learnable Hamming penalty inside one Matérn-3/2 radial basis.

## Production rationale

(Full ablation deferred to a research-detail follow-up commit.)

- The **previously-deployed model** treated `Material Source` as a continuous numeric coordinate. That is mathematically mis-specified for a 3-class categorical, but it preserved the joint kernel topology that factorised categorical-product kernels lose. The new `joint_hamming_matern` recovers the joint topology while handling the categorical correctly.
- **Strictly dominates the previous `hamming` production default** on every in-distribution metric:

  | metric | prior | new |
  |--- |---: |---: |
  | LOO RMSE (psi) | 510 | **495** |
  | bLOO RMSE (psi) | 738 | **717** |
  | bLOO PIT-KS | 0.039 | **0.030** |

  while passing all pre-registered LOCO acceptance criteria (every held-out class's LOCO RMSE within 110% of the `hamming` kernel's).
- **Seed-deterministic across all metrics**, matching the prior production model's reproducibility and avoiding the seed sensitivity of `IndexKernel`.
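
For readers unfamiliar with the two criteria above, here is a minimal sketch, assuming each held-out prediction is Gaussian `N(μ_i, σ_i²)`. PIT-KS is taken here to be the Kolmogorov-Smirnov distance between the probability-integral-transform values `Φ((y_i - μ_i) / σ_i)` and Uniform(0, 1), and the LOCO criterion is read as "each held-out class's RMSE ≤ 1.10 × the `hamming` kernel's RMSE for that class". The helper names are hypothetical; the repo's evaluation code may define the metric differently.

```python
import math
from typing import Dict, Sequence

# Hypothetical helpers illustrating the metrics; not the repo's evaluation code.


def normal_cdf(z: float) -> float:
    """Standard normal CDF, computed with the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def pit_ks(y: Sequence[float], mu: Sequence[float], sigma: Sequence[float]) -> float:
    """KS distance between Gaussian PIT values and Uniform(0, 1).

    Assumes each held-out prediction is N(mu_i, sigma_i^2). A value of 0 means
    perfectly uniform PIT values, i.e. well-calibrated predictive intervals.
    """
    u = sorted(normal_cdf((yi - mi) / si) for yi, mi, si in zip(y, mu, sigma))
    n = len(u)
    # The empirical CDF jumps from i/n to (i+1)/n at the i-th sorted PIT value;
    # the KS statistic is the largest gap to the uniform CDF on either side.
    return max(max((i + 1) / n - ui, ui - i / n) for i, ui in enumerate(u))


def passes_loco_acceptance(
    new_rmse: Dict[int, float], ref_rmse: Dict[int, float], tol: float = 1.10
) -> bool:
    """True if every held-out class's RMSE is at most `tol` x the reference RMSE."""
    return all(new_rmse[c] <= tol * ref_rmse[c] for c in ref_rmse)
```

Both sides of each step are checked because the empirical CDF is a step function, so the largest deviation can sit just before or just after a jump.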

## Changes

### Data

- `data/boxcrete_data.csv`: three-class merge using a canonical mix-name table (`boxcrete/_mix_naming_table.csv`, `boxcrete/mix_naming.py`). Splits 12 corruption-collision mix names into 24 canonical names, one per actual recipe; corrects mortar/concrete misroutes; drops 3 clay-using mortars (`M75`/`M76`/`M77`) that the GP feature set cannot represent. Strength values and `Strength (Std)` for the 638 `(composition + temp + time)` tuples common to both datasets are byte-identical to the prior dataset; only the labels and the partition differ.
- `test/fixtures/boxcrete_data_pre_v5.csv`: legacy fixture preserved under its established filename for regression / no-regression checks against the previous data.
- `boxcrete/utils.py`: `Material Source` bounds bumped to `(0, 2)`; added per-class GWP coefficients for Class 2 (Set-3 Amrize cement / Class F fly ash concrete), derived by `scripts/derive_class_2_gwp.py` from the new rows.
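
The byte-identity claim for the shared tuples can be checked by joining the old and new CSVs on the tuple key and comparing the raw value strings. A minimal sketch, assuming each `(composition + temp + time)` key occurs at most once per file; the helper names and the column names in the toy example are hypothetical placeholders, not the dataset's actual headers.

```python
import csv
import io
from typing import Dict, Iterable, List, Sequence, Tuple

# Hypothetical helpers; column names below are placeholders.
Row = Dict[str, str]
Key = Tuple[str, ...]


def index_rows(rows: Iterable[Row], key_cols: Sequence[str]) -> Dict[Key, Row]:
    """Map each row's key tuple to the row (assumes keys are unique per file)."""
    return {tuple(row[c] for c in key_cols): row for row in rows}


def compare_common_rows(
    old_rows: Iterable[Row],
    new_rows: Iterable[Row],
    key_cols: Sequence[str],
    value_cols: Sequence[str],
) -> Tuple[int, List[Key]]:
    """Return (number of shared keys, shared keys whose raw value strings differ)."""
    old = index_rows(old_rows, key_cols)
    new = index_rows(new_rows, key_cols)
    common = sorted(old.keys() & new.keys())
    # Compare raw CSV strings, so "4000" vs "4000.0" counts as a mismatch.
    mismatches = [k for k in common if any(old[k][c] != new[k][c] for c in value_cols)]
    return len(common), mismatches


# Toy example with hypothetical columns: the new file adds one unshared row.
OLD_CSV = "Cement,Temp,Time,Strength,Strength (Std)\n300,20,7,4000,120\n350,20,28,5200,150\n"
NEW_CSV = OLD_CSV + "400,20,28,6000,200\n"

n_common, bad = compare_common_rows(
    csv.DictReader(io.StringIO(OLD_CSV)),
    csv.DictReader(io.StringIO(NEW_CSV)),
    key_cols=["Cement", "Temp", "Time"],
    value_cols=["Strength", "Strength (Std)"],
)
```

Rows present in only one file (relabelled or newly added) are simply excluded from the comparison, which matches the claim that only labels and partition changed.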

### Kernel framework (`boxcrete/kernels.py`)

- Replaced the source-as-continuous-numeric kernel with a categorical-kernel switch, `_categorical_source_branch`. Production variant: `joint_hamming_matern`. The codebase also registers research variants (`rbf_embedding_d{1,2,3}`, `joint_chain_matern`, `joint_hamming_matern_nu{05,25}`, `joint_embedding_matern_d{1,2,3}`, `additive_*`) as alternative ablation points; their detailed evaluation lives in the follow-up research commit.
- New `JointHammingMaternKernel` (production source kernel):

  `K(z_i, z_j) = σ² · M_{3/2}(√( Σ_f Δx_f² / ℓ_f² + α · 𝟙[c_i ≠ c_j] ))`

  where `M_{3/2}(r) = (1 + √3·r)·exp(-√3·r)` is the standard Matérn-3/2 radial basis. One learnable categorical penalty `α` plus per-feature ARD lengthscales `ℓ_f`; the Hamming penalty applies **only** to the `Material Source` dim (class `c`). This closes the +33 psi bLOO architectural gap between the simpler categorical-product topology and `legacy_continuous_ard`, while keeping the LOCO advantages of proper categorical handling.
- Both `JointHammingMaternKernel` and `JointEmbeddingMaternKernel` accept a `lengthscale_prior` kwarg and register it as a named prior on `raw_feat_lengthscale` (mirroring GPyTorch's standard `MaternKernel(lengthscale_prior=...)` contract). `_categorical_source_branch` passes `within_group_prior(...)` to the joint kernels so that the Cement/FlyAsh/Slag binder lengthscales and the Fine/Coarse Aggregate lengthscales stay tied to within `max/min < 1.05` (the same tying behaviour the prior Hamming production fit had).
- `DEFAULT_SOURCE_KERNEL = "joint_hamming_matern"` (was `"hamming"`).
- `GATE_TAU = 0.10` (was `0.05`). The time gate `h(t) = 1 - exp(-t / GATE_TAU)` saturates ~3× more slowly, lengthening its monotonic envelope into the pre-1-day extrapolation region. This reduces the fraction of catalog compositions with non-monotonic strength curves from ~33% to ~22%, with a slight LOO improvement.
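
A minimal pure-Python sketch of the kernel formula and the time gate above, assuming continuous features `x` with per-feature lengthscales, an integer `Material Source` class `c`, and the standard Matérn-3/2 basis. It is illustrative only: the function names are hypothetical, and it does not reproduce the GPyTorch `JointHammingMaternKernel` class (no batching, priors, or raw-parameter constraints).

```python
import math
from typing import Sequence

# Illustrative sketch; function names are hypothetical, not the boxcrete API.


def matern32(r: float) -> float:
    """Matérn-3/2 radial basis on a unit-lengthscale distance r >= 0."""
    s = math.sqrt(3.0) * r
    return (1.0 + s) * math.exp(-s)


def joint_hamming_matern(
    x_i: Sequence[float],
    x_j: Sequence[float],
    c_i: int,
    c_j: int,
    lengthscales: Sequence[float],
    alpha: float,
    outputscale: float = 1.0,
) -> float:
    """outputscale * M_{3/2}(sqrt(sum_f dx_f^2 / l_f^2 + alpha * [c_i != c_j]))."""
    # ARD part: scaled squared distance over the continuous features.
    sq = sum(((a - b) / ell) ** 2 for a, b, ell in zip(x_i, x_j, lengthscales))
    # Hamming part: one shared penalty, added only when the classes differ.
    sq += alpha if c_i != c_j else 0.0
    # Both parts feed a single radial basis, so class and composition interact.
    return outputscale * matern32(math.sqrt(sq))


def time_gate(t: float, tau: float = 0.10) -> float:
    """h(t) = 1 - exp(-t / tau); a larger tau saturates more slowly."""
    return 1.0 - math.exp(-t / tau)
```

Under this sketch, two identical compositions in different classes get a kernel value of about 0.64 (times the outputscale) with `alpha=0.536`, assuming the reported fitted `α` enters the formula directly.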

### Priors (`boxcrete/priors.py`)

- A LogNormal ARD prior on the non-grouped lengthscales prevents the lengthscale railing observed in the prior production fit (`max blind ℓ ≤ 100`).
- A within-group prior ties the Cement/FA/Slag and Fine/Coarse Aggregate lengthscales for binder and aggregate parsimony.
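
To illustrate the two priors, here is a sketch of a LogNormal log-density on a lengthscale, one possible within-group tying penalty (a Gaussian on the spread of log-lengthscales inside each group), and the `max/min < 1.05` tying diagnostic. The penalty form, its `scale`, the helper names, and the feature-to-group indices are all assumptions for illustration; `boxcrete/priors.py` may compose `within_group_prior` differently.

```python
import math
from typing import Dict, Sequence

# Hypothetical helpers; the real within_group_prior may use a different form.


def lognormal_logpdf(x: float, loc: float, scale: float) -> float:
    """log p(x) for X = exp(N(loc, scale^2)); requires x > 0."""
    z = (math.log(x) - loc) / scale
    return -math.log(x * scale * math.sqrt(2.0 * math.pi)) - 0.5 * z * z


def within_group_log_penalty(
    lengthscales: Sequence[float],
    groups: Dict[str, Sequence[int]],
    scale: float = 0.01,
) -> float:
    """Log-penalty pulling each group's log-lengthscales toward their group mean.

    Assumed form: sum over groups of -(log l_f - mean_g)^2 / (2 scale^2).
    It is 0 for perfectly tied groups and grows more negative as groups spread.
    """
    total = 0.0
    for dims in groups.values():
        logs = [math.log(lengthscales[d]) for d in dims]
        mean = sum(logs) / len(logs)
        total -= sum((v - mean) ** 2 for v in logs) / (2.0 * scale**2)
    return total


def max_min_ratio(lengthscales: Sequence[float], dims: Sequence[int]) -> float:
    """Tying diagnostic: max/min lengthscale within one group."""
    vals = [lengthscales[d] for d in dims]
    return max(vals) / min(vals)


# Hypothetical feature order: 0-2 binders (Cement, FlyAsh, Slag), 3-4 aggregates.
GROUPS = {"binder": [0, 1, 2], "aggregate": [3, 4]}
```

Working in log-space makes the tie scale-free, which is consistent with checking a `max/min` ratio rather than an absolute difference.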

### JS port (`docs/gp.mjs` + `docs/gp_v2_fast.mjs`)

- Added a `jointHammingMatern` radial-basis function supporting `ν ∈ {0.5, 1.5, 2.5}`, with the Hamming categorical penalty `α · 𝟙[c_i ≠ c_j]` inside the Matérn distance argument.
- Updated the `kernel(x1, x2, params)` dispatcher and the WASM-accelerated batched fast path to switch on `params.matern_specific.source_kernel_kind`. Both paths add the Hamming penalty before applying the radial basis, so the single-point and batched predictors agree up to floating-point summation order (verified by `test_js_predictor_parity.mjs`).
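
The ν dispatch and the "penalty before radial basis" ordering can be sketched in Python; this is not the JavaScript port's code. The closed forms are the standard half-integer Matérn bases. `gram_row` is a hypothetical stand-in for the batched fast path that deliberately accumulates features in a different order, and the check shows it still matches the single-point kernel to within rounding.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical Python sketch of the dispatch; not docs/gp.mjs.
Point = Tuple[Sequence[float], int]  # (continuous features, Material Source class)


def matern_basis(r: float, nu: float) -> float:
    """Standard half-integer Matérn radial bases on a unit-lengthscale distance."""
    if nu == 0.5:
        return math.exp(-r)
    if nu == 1.5:
        s = math.sqrt(3.0) * r
        return (1.0 + s) * math.exp(-s)
    if nu == 2.5:
        s = math.sqrt(5.0) * r
        return (1.0 + s + s * s / 3.0) * math.exp(-s)
    raise ValueError(f"unsupported nu={nu}; expected 0.5, 1.5 or 2.5")


def sq_dist(p: Point, q: Point, ls: Sequence[float], alpha: float) -> float:
    """ARD squared distance plus the Hamming penalty, before any radial basis."""
    (x, c), (y, d) = p, q
    ard = sum(((a - b) / ell) ** 2 for a, b, ell in zip(x, y, ls))
    return ard + (alpha if c != d else 0.0)


def kernel(p: Point, q: Point, ls: Sequence[float], alpha: float, nu: float) -> float:
    """Single-point path: penalty added first, then the radial basis."""
    return matern_basis(math.sqrt(sq_dist(p, q, ls, alpha)), nu)


def gram_row(
    p: Point, qs: List[Point], ls: Sequence[float], alpha: float, nu: float
) -> List[float]:
    """Batched path: accumulate features (in reverse order) across all points,
    add the penalty, then apply the basis. Summation order differs from
    `kernel`, so results agree only up to floating-point rounding."""
    x, c = p
    acc = [0.0] * len(qs)
    for f in reversed(range(len(ls))):
        for k, (y, _) in enumerate(qs):
            acc[k] += ((x[f] - y[f]) / ls[f]) ** 2
    for k, (_, d) in enumerate(qs):
        if d != c:
            acc[k] += alpha
    return [matern_basis(math.sqrt(v), nu) for v in acc]
```

Because both paths apply the basis only after the full squared distance (including the penalty) is formed, the only source of disagreement is rounding in the feature sum.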

### UI (`docs/ui.mjs` + `docs/model/compositions.json`)

- The `Material Source` panel now renders **N buttons (one per class)** by looping over the `slider_bounds["Material Source"]` range, instead of the hardcoded two-button "Source A / Source B". Class labels (`Set 1` / `Set 2` / `Set 3`) come from a `MATERIAL_SOURCE_LABELS` map; adding a 4th class only requires extending the bounds and the labels map.
- `compositions.json::slider_bounds["Material Source"]` updated from `max=1` (stale binary labelling) to `max=2` (3-class).
- The `Material Source` ingredient-insight panel contains a concise per-class summary (Set 1 mortar / Set 2 Heidelberg Class C / Set 3 Amrize Class F) with a reference to `docs/materials_background.md` for the full per-class plant / source / HRWR detail.
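
The UI change amounts to deriving the button list from the bounds instead of hardcoding it. A Python sketch of that logic (the real code lives in `docs/ui.mjs`); the `{"min": ..., "max": ...}` shape of the bounds entry, the helper name, and the fallback label for unmapped classes are assumptions.

```python
from typing import Dict, List, Tuple

# Hypothetical sketch of the docs/ui.mjs logic; bounds shape is assumed.
MATERIAL_SOURCE_LABELS: Dict[int, str] = {0: "Set 1", 1: "Set 2", 2: "Set 3"}


def material_source_buttons(
    slider_bounds: Dict[str, Dict[str, float]],
    labels: Dict[int, str] = MATERIAL_SOURCE_LABELS,
) -> List[Tuple[int, str]]:
    """One (class index, label) pair per integer class in the bounds range."""
    bounds = slider_bounds["Material Source"]
    lo, hi = int(bounds["min"]), int(bounds["max"])
    # Assumed fallback so a new class still renders before it gets a label.
    return [(k, labels.get(k, f"Class {k}")) for k in range(lo, hi + 1)]
```

With the stale `max=1` bounds this yields only two buttons, which is why the bounds update and the loop need to land together.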

### Production artifacts

All artifacts below were regenerated from the new kernel and anchored to the ground truth in `data/boxcrete_data.csv`:

- `docs/model/strength.json`: `schema_version=3`, `joint_hamming_matern`, `α=0.536`, `ν=1.5`, within-group prior installed on the joint kernel (verified by `test_lengthscale_identifiability`, which checks that the binder / aggregate lengthscales are tied within `max/min < 1.05`).
- `docs/model/gwp.json`: 3 classes (previously only 0 and 1).
- `docs/model/compositions.json`: correct `Material Source` labels (61 mortar / 30 Set-2 / 53 Set-3) looked up against the canonical raw data; refreshed `strength_predictions`, `gwp_predictions`, `pareto_mask`.
- `docs/model/test_vectors.json`: 37 vectors.
- `docs/model/strength_model.pt`.
- `docs/model/mix_analyses.json`: migrated 138 LLM-authored mix narratives from the prior `Source A/B` labels to `Set 1` / `Set 3`.
- Notebooks (`notebooks/{prediction_and_optimization_tutorial, strength_curve_prediction_demo, slump_prediction_demo}.ipynb`): re-executed with the new data; no references to the removed `MRWR (kg/m3)` column remain.
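
The `pareto_mask` refresh marks catalog compositions that are non-dominated in the strength/GWP trade-off. A minimal O(n²) sketch, assuming the two objectives are maximizing predicted strength and minimizing predicted GWP; the helper name, the objectives, and the tie handling are assumptions, and the regeneration code may differ.

```python
from typing import List, Sequence

# Hypothetical helper; objectives assumed to be max strength, min GWP.


def pareto_mask(strength: Sequence[float], gwp: Sequence[float]) -> List[bool]:
    """True where no other point is at least as strong and at most as emissive
    while being strictly better in at least one of the two objectives."""
    n = len(strength)
    mask = []
    for i in range(n):
        dominated = any(
            strength[j] >= strength[i]
            and gwp[j] <= gwp[i]
            and (strength[j] > strength[i] or gwp[j] < gwp[i])
            for j in range(n)
            if j != i
        )
        mask.append(not dominated)
    return mask
```

Exact duplicates do not dominate each other under this definition, so both stay on the front.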

### Regeneration pipeline (`experiments/regenerate_*`)

- `regenerate_strength_json.py`: extended introspection to handle the new `JointHammingMaternKernel` layout (`raw_feat_lengthscale`, `raw_alpha`, `nu`, `categorical_mode`, `feature_dims`, `source_dim`).
- `regenerate_gwp_json.py` (new): regenerates `gwp.json` from `DEFAULT_GWP_COEFFICIENTS` to include all 3 classes.
- `regenerate_compositions_gwp_predictions.mjs` (new): refreshes `compositions.gwp_predictions` after any `gwp.json` change.
- `fix_compositions_material_source.py` (new): recovery script for any future drift between the class labels stored in the catalog and the canonical raw data; it looks up each catalog composition's class by a 7-column composition fingerprint.
- `regenerate_all_artifacts.sh`: pipeline orchestrator updated to include the new GWP step and the new `test_catalog_consistency` JS test.
- `docs/generate_mix_analyses.py` fallback template: updated from binary labels to 3-class labels.
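
The fingerprint lookup in `fix_compositions_material_source.py` can be sketched as follows. Assumptions, not taken from the script: the fingerprint is the first 7 composition values rounded to 3 decimals, a fingerprint that maps to more than one class in the raw data is treated as ambiguous rather than guessed, and all helper names are hypothetical.

```python
from typing import Dict, List, Optional, Sequence, Set, Tuple

# Hypothetical helpers; rounding precision and column count are assumptions.
Fingerprint = Tuple[float, ...]
Labeled = Tuple[Sequence[float], int]  # (composition vector, class)


def fingerprint(comp: Sequence[float], ndigits: int = 3) -> Fingerprint:
    """Rounded 7-column key; rounding tolerates small float noise."""
    return tuple(round(float(v), ndigits) for v in comp[:7])


def build_class_index(raw: Sequence[Labeled]) -> Dict[Fingerprint, Set[int]]:
    """Map each raw-data fingerprint to every class it appears with."""
    index: Dict[Fingerprint, Set[int]] = {}
    for comp, cls in raw:
        index.setdefault(fingerprint(comp), set()).add(int(cls))
    return index


def lookup_class(
    index: Dict[Fingerprint, Set[int]], comp: Sequence[float]
) -> Optional[int]:
    """The unique class for this composition, or None if missing or ambiguous."""
    classes = index.get(fingerprint(comp), set())
    return next(iter(classes)) if len(classes) == 1 else None


def find_mislabels(
    catalog: Sequence[Labeled], index: Dict[Fingerprint, Set[int]]
) -> List[Tuple[int, int, int]]:
    """(catalog row, stored class, canonical class) for every disagreeing row."""
    out = []
    for row, (comp, stored) in enumerate(catalog):
        canonical = lookup_class(index, comp)
        if canonical is not None and canonical != stored:
            out.append((row, stored, canonical))
    return out
```

Returning `None` for ambiguous fingerprints keeps the recovery conservative: a row is only relabelled when the raw data assigns its composition to exactly one class.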

### Tests (production-essential)

- `test/test_joint_hamming_matern_kernel.py`: 14 tests for the new kernel (PSD, Hamming penalty correctness, Matérn smoothness, `α` gradient flow, gauge-fix invariance, etc.).
- `test/test_mix_naming.py`: canonical-naming pipeline (38 tests).
- `test/test_composed_prior.py`: within-group prior composition (11 tests).
- `test/test_kernel_layout.py`: updated for the new `ProductKernel` + joint-kernel layouts. The previous test assumed a `ScaleKernel(MaternKernel)` layout only and broke when the categorical wrapper was introduced; regression coverage is restored for all registered source-kernel variants.
- `test/test_lengthscale_identifiability.py`: updated to accept the joint-kernel layout (the specific lengthscales have length 16 for joint kernels and 17 for the legacy continuous-source layout) and to walk `named_priors()` for kernels without a standard `.lengthscale_prior` attribute.
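
One property the new kernel tests cover is positive semi-definiteness. A self-contained sketch of such a check, assuming a Cholesky factorization with a small diagonal jitter as the PSD criterion; the kernel is a pure-Python restatement of the `JointHammingMaternKernel` formula from the kernel-framework section (Matérn-3/2, one Hamming penalty), and the helper names are hypothetical, not the repo's test code. The kernel stays PSD because `α·𝟙[c_i ≠ c_j]` equals `(α/2)·‖e_{c_i} - e_{c_j}‖²` for one-hot vectors `e_c`, so the total is an ordinary squared Euclidean distance in an augmented space.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical helpers illustrating a PSD check; not the repo's test code.
Point = Tuple[Sequence[float], int]  # (continuous features, class)


def kernel(p: Point, q: Point, ls: Sequence[float], alpha: float) -> float:
    """Matérn-3/2 on the ARD distance plus a Hamming penalty on the class."""
    (x, c), (y, d) = p, q
    r2 = sum(((a - b) / ell) ** 2 for a, b, ell in zip(x, y, ls))
    r = math.sqrt(r2 + (alpha if c != d else 0.0))
    return (1.0 + math.sqrt(3.0) * r) * math.exp(-math.sqrt(3.0) * r)


def gram(points: Sequence[Point], ls: Sequence[float], alpha: float) -> List[List[float]]:
    """Full kernel matrix over the given points."""
    return [[kernel(p, q, ls, alpha) for q in points] for p in points]


def cholesky_ok(K: List[List[float]], jitter: float = 1e-10) -> bool:
    """True if K + jitter*I admits a Cholesky factorization (numerically PSD)."""
    n = len(K)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = K[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                s += jitter
                if s <= 0.0:
                    return False
                L[i][i] = math.sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    return True
```

Note the last test point below shares its features with the first but sits in a different class; the penalty keeps it a distinct point, so the Gram matrix remains strictly positive definite.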

### Regression tests for newly discovered bug classes

Each test below targets a class of bug that previously went undetected by the test suite and is now caught at CI time:

1. `test/test_catalog_consistency.mjs` (new, 8 assertions): cross-validates `docs/model/compositions.json` against `data/boxcrete_data.csv`, anchoring to ground truth instead of merely checking internal `catalog ↔ predictions` consistency. Catches:
   - per-row `Material Source` mislabels,
   - missing classes in the catalog,
   - stale `slider_bounds`,
   - out-of-range `Material Source` values.
2. `test/test_variance_orientation.py` (new, 1 test, ~60 s): asserts `σ²(observed_class) ≤ σ²(any other class)` for every training row. This directly covers the symptom "switching class in the explorer makes uncertainty contract for a non-observed class", and fires on any future catalog-vs-data mislabelling **or** on any kernel mis-specification that produces this counter-intuitive behaviour.
3. `test/test_data_freshness.mjs`: replaced the legacy binary `Material Source` collapse `comp[7] >= 0.5 ? 1 : 0` with the same `Math.round(comp[gwpParams.class_dim])` lookup the explorer uses. The previous binary collapse silently mapped class 2 to class 1, producing wrong GWP coefficients and a passing test on wrong values.
4. `test/test_js_ui_smoke.mjs`: same fix for the same legacy binary collapse (which also silently mapped class 2 → class 1).
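
The class-collapse bug fixed in tests 3 and 4 is easy to reproduce. In this Python restatement (the tests themselves are JavaScript), the legacy `>= 0.5` threshold sends class 2 to class 1 while rounding recovers it; the GWP coefficient values and helper names are hypothetical. Python's `round` rounds halves to even, unlike JavaScript's `Math.round`, which makes no difference for integer-valued classes. The last helper restates the variance-orientation assertion from test 2 for a table of per-class predictive variances.

```python
from typing import Dict, List, Sequence

# Hypothetical per-class GWP coefficients, for illustration only.
GWP_COEF: Dict[int, float] = {0: 0.90, 1: 0.80, 2: 0.70}
CLASS_DIM = 7  # index of Material Source in the composition vector (comp[7])


def legacy_class(comp: Sequence[float]) -> int:
    """Old binary collapse (comp[7] >= 0.5 ? 1 : 0): maps class 2 to class 1."""
    return 1 if comp[CLASS_DIM] >= 0.5 else 0


def current_class(comp: Sequence[float]) -> int:
    """Rounded lookup, equivalent to Math.round(...) for integer classes."""
    return int(round(comp[CLASS_DIM]))


def variance_orientation_violations(
    variances: Sequence[Dict[int, float]], observed: Sequence[int]
) -> List[int]:
    """Rows where another class has a smaller predictive variance than the
    class actually observed for that row (the test-2 symptom)."""
    return [
        r
        for r, (var, obs) in enumerate(zip(variances, observed))
        if any(v < var[obs] for c, v in var.items() if c != obs)
    ]
```

Because the legacy collapse returns a valid class index, the wrong coefficient lookup raises no error, which is how the old tests passed on wrong values.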

## Test status

- **Python**: ~300 PASS / 0 fail. Covers kernel, identifiability, mix naming, composed prior, layout, curve monotonicity, pretrained-loader fidelity, variance orientation, and the rest of the suite inherited from the previous PR.
- **JS**: 10 / 10 PASS:
  - GP parity (296 assertions),
  - predictor parity (8),
  - physical constraints (37),
  - UI smoke (144 × 64),
  - catalog consistency (8),
  - data freshness (6),
  - curve monotonicity (144),
  - feature parity (49),
  - strength architecture,
  - units.
- **Lint**: `black --check` clean; flake8 critical errors (`E9, F63, F7, F82`) = 0.

## Follow-up

A separate research-detail commit (landing as a sibling) carries the non-essential experimental artefacts:

- `experiments/THREE_CLASS_AND_PRIOR_BENCHMARK.md`: full ablation write-up with literature context, including the ablation series that motivated `GATE_TAU=0.10`, the time-only kernel choice, and the deployed-baseline-vs-new comparison.
- `experiments/three_class_ablation.py` + `experiments/model_variant_study.py`: ablation infrastructure.
- Tests for the registered research kernel variants (`test_joint_embedding_matern_kernel.py`, `test_rbf_embedding_kernel.py`).
- Mobile-performance investigation doc + PR-prep meta-doc.
- Legacy ablation infrastructure restored from the prior PR baseline.


meta-cla bot added the `CLA Signed` label on Jun 17, 2026.

SebastianAment force-pushed the `joint-hamming-matern` branch 2 times, most recently from `f12e394` to `5b93c20` (June 17, 2026 21:18).

SebastianAment force-pushed the `joint-hamming-matern` branch from `5b93c20` to `0a9b887` (June 17, 2026 22:24).


SebastianAment wants to merge 1 commit into `main` from `joint-hamming-matern`.
