3-class strength GP + joint_hamming_matern production kernel #29
Open
SebastianAment wants to merge 1 commit into
Migrates the strength GP from the prior 2-class (pooled mortar+Set-2,
contaminated Set-3) dataset and `hamming` categorical source kernel to
a corrected 3-class dataset and a new joint-distance source kernel
`joint_hamming_matern` that combines per-feature ARD with a single
learnable Hamming penalty inside one Matérn-3/2 radial basis.
## Production rationale
(Full ablation deferred to a research-detail follow-up commit.)
- The **previously-deployed model** treated Material Source as a
continuous numeric coordinate — mathematically mis-specified for a
3-class categorical, but it preserved the joint kernel topology
that factorised categorical-product kernels lose. The new
`joint_hamming_matern` recovers the joint topology while keeping
correct categorical handling.
- **Strictly dominates the previous `hamming` production default**
on every in-distribution metric:
| metric | prior (`hamming`) | new (`joint_hamming_matern`) |
|--- |---: |---: |
| LOO RMSE (psi) | 510 | **495** |
| bLOO RMSE (psi) | 738 | **717** |
| bLOO PIT-KS | 0.039 | **0.030** |
while passing all pre-registered LOCO acceptance criteria (every
held-out class within 110% of Hamming's LOCO RMSE).
- **Seed-deterministic across all metrics** — matches the prior
production model's reproducibility and beats IndexKernel's seed
sensitivity.
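
The LOCO acceptance rule quoted above (every held-out class within 110% of Hamming's LOCO RMSE) can be written as a one-line check. This is a minimal sketch, not the project's acceptance script: the function name `passes_loco` and the dict-of-RMSE inputs are assumptions made for illustration.

```python
def passes_loco(new_rmse, baseline_rmse, tolerance=1.10):
    """Return True if every held-out class stays within tolerance x baseline.

    new_rmse / baseline_rmse: dicts mapping a held-out class label to its
    LOCO RMSE (hypothetical input format, not taken from the repository).
    """
    return all(
        new_rmse[cls] <= tolerance * baseline_rmse[cls]
        for cls in baseline_rmse
    )
```

Called with one entry per held-out class, the check fails as soon as a single class exceeds 1.10× its baseline, which matches the "every held-out class" wording.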
## Changes
### Data
- `data/boxcrete_data.csv`: three-class merge with canonical mix-name
table (`boxcrete/_mix_naming_table.csv`, `boxcrete/mix_naming.py`).
Splits 12 corruption-collision mix names into 24 canonical names per
actual recipe; corrects mortar/concrete misroutes; drops 3
clay-using mortars (`M75`/`M76`/`M77`) the GP feature set cannot
represent. Strength values and `Strength (Std)` for the 638 common
`(composition + temp + time)` tuples are byte-identical to the
prior dataset; only labels and partition differ.
- `test/fixtures/boxcrete_data_pre_v5.csv`: legacy fixture preserved
under its established filename for regression / no-regression
checks against the previous data.
- `boxcrete/utils.py`: `Material Source` bounds bumped to `(0, 2)`;
added per-class GWP coefficients for Class 2 (Set-3 Amrize cement /
Class F fly ash concrete) — derived by
`scripts/derive_class_2_gwp.py` from the new rows.
### Kernel framework (`boxcrete/kernels.py`)
- Replaced the source-as-continuous-numeric kernel with a
categorical-kernel switch `_categorical_source_branch`. Production
variant: `joint_hamming_matern`. The codebase also registers
research variants (`rbf_embedding_d{1,2,3}`,
`joint_chain_matern`, `joint_hamming_matern_nu{05,25}`,
`joint_embedding_matern_d{1,2,3}`, `additive_*`) as alternative
ablation points; their detailed evaluation lives in the follow-up
research commit.
- New `JointHammingMaternKernel` (production source kernel):
```
K(z_i, z_j) = σ² · M_{3/2}(√( Σ_f Δx_f² / ℓ_f² + α · 𝟙[c_i ≠ c_j] ))
```
One learnable categorical penalty `α` plus per-feature ARD; the
Hamming penalty applies **only** to the `Material Source` dim.
Closes the +33 psi bLOO gap that the simpler categorical-product
topology shows relative to `legacy_continuous_ard`, while keeping
the LOCO advantages of proper categorical handling.
- Both `JointHammingMaternKernel` and `JointEmbeddingMaternKernel`
accept a `lengthscale_prior` kwarg and register it as a named
prior on `raw_feat_lengthscale` (mirroring GPyTorch's standard
`MaternKernel(lengthscale_prior=...)` contract).
`_categorical_source_branch` passes `within_group_prior(...)` to
the joint kernels so Cement/FlyAsh/Slag binder lengthscales and
Fine/Coarse Aggregate lengthscales stay tied to within `max/min <
1.05` (the same tying behaviour the prior Hamming production fit
had).
- `DEFAULT_SOURCE_KERNEL = "joint_hamming_matern"` (was `"hamming"`).
- `GATE_TAU = 0.10` (was `0.05`). The time gate
`h(t) = 1 - exp(-t / GATE_TAU)` now reaches any given saturation
level 2× later (`t = -GATE_TAU · ln(1 - h)`), lengthening its
monotonic envelope into the pre-1-day extrapolation region. This
reduces the fraction of catalog compositions with non-monotonic
strength curves from ~33% to ~22%, with a slight LOO improvement.
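
For readers who want to check the joint-distance formula numerically, here is a minimal NumPy sketch of it. It is not the `JointHammingMaternKernel` implementation from `boxcrete/kernels.py`: the function names, the argument layout (continuous features in `X`, class labels in `c`), and the plain `lengthscale` / `alpha` / `outputscale` arguments are all assumptions made for illustration.

```python
import numpy as np


def matern_profile(r, nu=1.5):
    """Matérn radial profile at scaled distance r, for nu in {0.5, 1.5, 2.5}."""
    if nu == 0.5:
        return np.exp(-r)
    if nu == 1.5:
        s = np.sqrt(3.0) * r
        return (1.0 + s) * np.exp(-s)
    if nu == 2.5:
        s = np.sqrt(5.0) * r
        return (1.0 + s + s**2 / 3.0) * np.exp(-s)
    raise ValueError(f"unsupported nu: {nu}")


def joint_hamming_matern(X1, c1, X2, c2, lengthscale, alpha,
                         outputscale=1.0, nu=1.5):
    """K = outputscale * M_nu(sqrt(sum_f dx_f^2 / l_f^2 + alpha * 1[c_i != c_j]))."""
    X1, X2 = np.asarray(X1, dtype=float), np.asarray(X2, dtype=float)
    c1, c2 = np.asarray(c1), np.asarray(c2)
    # Per-feature ARD: each continuous feature is scaled by its own lengthscale.
    diff = (X1[:, None, :] - X2[None, :, :]) / np.asarray(lengthscale, dtype=float)
    sq_dist = np.sum(diff**2, axis=-1)
    # Hamming penalty: alpha is added once whenever the class labels differ.
    sq_dist = sq_dist + alpha * (c1[:, None] != c2[None, :])
    # The categorical penalty sits inside the radial basis, not in a product term.
    return outputscale * matern_profile(np.sqrt(sq_dist), nu)
```

Because `𝟙[c_i ≠ c_j]` equals half the squared Euclidean distance between one-hot class vectors, the argument of the Matérn profile is still a Euclidean distance in an augmented space, so the resulting Gram matrix stays positive semi-definite for `alpha ≥ 0`.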
### Priors (`boxcrete/priors.py`)
- LogNormal ARD prior on non-grouped lengthscales prevents the
lengthscale railing observed in the prior production fit
(`max blind ℓ ≤ 100`).
- Within-group prior ties Cement/FA/Slag and Fine/Coarse Aggregate
lengthscales for binder and aggregate parsimony.
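
The tying criterion used elsewhere in this description (`max/min < 1.05` within the binder and aggregate groups) can be checked with a few lines of NumPy. This is a sketch under stated assumptions: the feature indices in `GROUPS` and the function names are hypothetical and do not reflect the actual column order or the code in `test_lengthscale_identifiability`.

```python
import numpy as np

# Hypothetical feature positions; the real column order may differ.
GROUPS = {
    "binder": [0, 1, 2],   # Cement, Fly Ash, Slag
    "aggregate": [3, 4],   # Fine Aggregate, Coarse Aggregate
}


def group_spread(lengthscales, groups=GROUPS):
    """Return the max/min lengthscale ratio for each tied group."""
    ls = np.asarray(lengthscales, dtype=float)
    return {name: float(ls[idx].max() / ls[idx].min())
            for name, idx in groups.items()}


def is_tied(lengthscales, groups=GROUPS, tol=1.05):
    """True if every group's max/min ratio is below tol."""
    return all(ratio < tol for ratio in group_spread(lengthscales, groups).values())
```

Lengthscales outside the listed groups are ignored by the check, which mirrors the split between grouped and non-grouped lengthscales described above.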
### JS port (`docs/gp.mjs` + `docs/gp_v2_fast.mjs`)
- Added `jointHammingMatern` radial-basis function supporting
`ν ∈ {0.5, 1.5, 2.5}` and the Hamming categorical penalty
`α · 𝟙[c_i ≠ c_j]` inside the Matern distance argument.
- Updated the `kernel(x1, x2, params)` dispatcher and the
WASM-accelerated batched fast path to switch on
`params.matern_specific.source_kernel_kind`. Both paths add the
Hamming penalty before applying the radial basis, so single-point
and batched predictors agree to FP-summation order (verified by
`test_js_predictor_parity.mjs`).
### UI (`docs/ui.mjs` + `docs/model/compositions.json`)
- `Material Source` panel now renders **N buttons (one per class)**
via a loop over the `slider_bounds["Material Source"]` range,
instead of the hardcoded two-button "Source A / Source B" layout.
Class labels (`Set 1` / `Set 2` / `Set 3`) come from a
`MATERIAL_SOURCE_LABELS` map; adding a 4th class only requires
extending the bounds and the labels map.
- `compositions.json::slider_bounds["Material Source"]` updated from
`max=1` (stale binary labelling) to `max=2` (3-class).
- `Material Source` ingredient-insight panel contains a concise
per-class summary (Set 1 mortar / Set 2 Heidelberg Class C / Set 3
Amrize Class F) with a reference to `docs/materials_background.md`
for the full per-class plant / source / HRWR detail.
### Production artifacts
Regenerated from the new kernel; all ground-truth-anchored to
`data/boxcrete_data.csv`:
- `docs/model/strength.json` — `schema_version=3`,
`joint_hamming_matern`, `α=0.536`, `ν=1.5`, within-group prior
installed on the joint kernel (verified by
`test_lengthscale_identifiability` tying the binder / aggregate
lengthscales within `max/min < 1.05`).
- `docs/model/gwp.json` — 3 classes (previously had 0 and 1 only).
- `docs/model/compositions.json` — correct `Material Source` labels
(61 mortar / 30 Set-2 / 53 Set-3) looked up against the canonical
raw data; refreshed `strength_predictions`, `gwp_predictions`,
`pareto_mask`.
- `docs/model/test_vectors.json` — 37 vectors.
- `docs/model/strength_model.pt`.
- `docs/model/mix_analyses.json` — migrated 138 LLM-authored mix
narratives from the prior `Source A/B` labels to `Set 1` / `Set 3`.
- Notebooks (`notebooks/{prediction_and_optimization_tutorial,
strength_curve_prediction_demo, slump_prediction_demo}.ipynb`):
re-executed with the new data; 0 references to the removed
`MRWR (kg/m3)` column remain.
### Regeneration pipeline (`experiments/regenerate_*`)
- `regenerate_strength_json.py` — extended introspection to handle
the new `JointHammingMaternKernel` layout (`raw_feat_lengthscale`,
`raw_alpha`, `nu`, `categorical_mode`, `feature_dims`,
`source_dim`).
- `regenerate_gwp_json.py` (new) — regenerates `gwp.json` from
`DEFAULT_GWP_COEFFICIENTS` to include all 3 classes.
- `regenerate_compositions_gwp_predictions.mjs` (new) — refreshes
`compositions.gwp_predictions` after any `gwp.json` change.
- `fix_compositions_material_source.py` (new) — recovery script for
any future drift between catalog-stored class labels and the
canonical raw data; looks up each catalog composition's class by
7-column composition fingerprint.
- `regenerate_all_artifacts.sh` — pipeline orchestrator updated to
include the new GWP step + the new `test_catalog_consistency` JS
test.
- `docs/generate_mix_analyses.py` fallback template — updated from
binary labels to 3-class labels.
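
The fingerprint-based recovery described for `fix_compositions_material_source.py` can be sketched as a dictionary lookup. This is an illustration, not the script itself. The row layout (7 composition values followed by the class at index 7), the rounding precision, and the choice to raise on conflicting labels are all assumptions.

```python
def fingerprint(values, ndigits=6):
    """Hashable key built from the first 7 (composition) columns of a row."""
    return tuple(round(float(v), ndigits) for v in values[:7])


def build_class_index(raw_rows, class_dim=7):
    """Map each composition fingerprint in the raw data to its class label."""
    index = {}
    for row in raw_rows:
        key, cls = fingerprint(row), int(row[class_dim])
        # Assumption: one composition belongs to exactly one class.
        if index.setdefault(key, cls) != cls:
            raise ValueError(f"composition {key} appears under two classes")
    return index


def relabel_catalog(catalog_rows, index, class_dim=7):
    """Return (fixed_rows, n_changed); unknown compositions are left untouched."""
    fixed, changed = [], 0
    for row in catalog_rows:
        row = list(row)
        cls = index.get(fingerprint(row))
        if cls is not None and int(row[class_dim]) != cls:
            row[class_dim] = cls
            changed += 1
        fixed.append(row)
    return fixed, changed
```

Rounding before hashing keeps the lookup robust to small floating-point differences between the catalog JSON and the CSV, at the cost of merging compositions that differ only beyond `ndigits` decimals.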
### Tests (production-essential)
- `test/test_joint_hamming_matern_kernel.py` — 14 tests for the new
kernel: PSD, Hamming penalty correctness, Matern smoothness,
`α` gradient flow, gauge-fix invariance, etc.
- `test/test_mix_naming.py` — canonical-naming pipeline (38 tests).
- `test/test_composed_prior.py` — within-group prior composition
(11 tests).
- `test/test_kernel_layout.py` — updated for the new ProductKernel +
joint-kernel layouts. The prior test assumed
`ScaleKernel(MaternKernel)` only and broke when the categorical
wrapper was introduced. This restores regression coverage for all
registered source-kernel variants.
- `test/test_lengthscale_identifiability.py` — updated to accept the
joint-kernel layout (specific lengthscales have length 16 for
joint kernels, 17 for the legacy continuous-source layout) and
to walk `named_priors()` for kernels without a standard
`.lengthscale_prior` attribute.
### Regression tests for newly-discovered bug-classes
Each test below targets a class of bug that the prior test suite
silently missed and that is now caught at CI time:
1. `test/test_catalog_consistency.mjs` (new, 8 assertions):
cross-validates `docs/model/compositions.json` against
`data/boxcrete_data.csv`. Anchors to ground truth instead of
merely checking internal `catalog ↔ predictions` consistency.
Catches:
- per-row `Material Source` mislabel,
- missing classes in catalog,
- stale `slider_bounds`,
- out-of-range `Material Source` values.
2. `test/test_variance_orientation.py` (new, 1 test, ~60 s):
asserts `σ²(observed_class) ≤ σ²(any other class)` for every
training row. Direct coverage of the symptom that "switching
class in the explorer makes uncertainty contract for a
non-observed class" — fires on any future catalog-vs-data
mislabelling **OR** on any kernel mis-specification that
produces this counter-intuitive behaviour.
3. `test/test_data_freshness.mjs`: replaced the legacy binary
`comp[7] >= 0.5 ? 1 : 0` `Material Source` collapse with the
same `Math.round(comp[gwpParams.class_dim])` lookup the
explorer uses. The previous binary collapse silently mapped
class 2 to class 1, producing wrong GWP coefficients and a
green test on wrong values.
4. `test/test_js_ui_smoke.mjs`: same fix to the same legacy binary
collapse (also silently mapped class 2 → class 1).
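
The binary-collapse bug fixed in items 3 and 4 is easy to reproduce in isolation. The sketch below ports the two JavaScript expressions quoted above to Python. The function names are hypothetical, and `class_dim=7` is assumed from the `comp[7]` index in the legacy expression.

```python
import math


def legacy_binary_class(comp):
    """Port of the legacy `comp[7] >= 0.5 ? 1 : 0` collapse."""
    return 1 if comp[7] >= 0.5 else 0


def rounded_class(comp, class_dim=7):
    """Port of `Math.round(comp[class_dim])` (rounds half up, unlike Python's round)."""
    return int(math.floor(comp[class_dim] + 0.5))
```

For a composition whose `Material Source` is 2, `legacy_binary_class` returns 1 while `rounded_class` returns 2, which is why the old tests looked up class-1 GWP coefficients for class-2 mixes and still passed.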
## Test status
- **Python**: ~300 PASS / 0 fail. Covers kernel + identifiability +
mix naming + composed prior + layout + curve monotonicity +
pretrained-loader fidelity + variance orientation + the rest of
the suite inherited from the previous PR.
- **JS**: 10 / 10 PASS:
- GP parity (296 assertions),
- predictor parity (8),
- physical constraints (37),
- UI smoke (144 × 64),
- catalog consistency (8),
- data freshness (6),
- curve monotonicity (144),
- feature parity (49),
- strength architecture,
- units.
- **Lint**: `black --check` clean; flake8 critical errors
(`E9, F63, F7, F82`) = 0.
## Follow-up
A separate research-detail commit (lands as a sibling) carries the
non-essential experimental artefacts:
- `experiments/THREE_CLASS_AND_PRIOR_BENCHMARK.md` — full ablation
writeup with literature context, including the ablation series
that motivated `GATE_TAU=0.10`, the time-only kernel choice, and
the deployed-baseline-vs-new comparison.
- `experiments/three_class_ablation.py` +
`experiments/model_variant_study.py` — ablation infrastructure.
- Tests for the registered research kernel variants
(`test_joint_embedding_matern_kernel.py`,
`test_rbf_embedding_kernel.py`).
- Mobile-performance investigation doc + PR-prep meta-doc.
- Legacy ablation infrastructure restored from the prior PR
baseline.