Commit 0561024

Merge branch 'master' into perf/skip-identity-zoom-replay
2 parents 1c344a9 + dc7dadd commit 0561024

69 files changed

Lines changed: 55338 additions & 31942 deletions

.github/workflows/cache-refresh.yml

Lines changed: 2 additions & 2 deletions
@@ -10,8 +10,8 @@ jobs:
 permissions:
 contents: read
 steps:
-- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
-- uses: pnpm/action-setup@8912a9102ac27614460f54aedde9e1e7f9aec20d # v6.0.5
+- uses: actions/checkout@df4cb1c069e1874edd31b4311f1884172cec0e10 # v6.0.3
+- uses: pnpm/action-setup@0e279bb959325dab635dd2c09392533439d90093 # v6.0.8
 - uses: actions/setup-node@48b55a011bda9f5d6aeb4c2d9c7362e8dae4041e # v6.4.0
 with:
 node-version: '24'

.github/workflows/claude.yml

Lines changed: 5 additions & 5 deletions
@@ -61,13 +61,13 @@ jobs:
 echo "Detected mode: $MODE (trigger=$TRIGGER)"

 - name: Checkout repository
-uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
+uses: actions/checkout@df4cb1c069e1874edd31b4311f1884172cec0e10 # v6.0.3
 with:
 fetch-depth: 0
 token: ${{ secrets.PAT }}

 - name: Setup pnpm
-uses: pnpm/action-setup@8912a9102ac27614460f54aedde9e1e7f9aec20d # v6.0.5
+uses: pnpm/action-setup@0e279bb959325dab635dd2c09392533439d90093 # v6.0.8

 - name: Setup Node.js
 uses: actions/setup-node@48b55a011bda9f5d6aeb4c2d9c7362e8dae4041e # v6.4.0
@@ -278,7 +278,7 @@ jobs:
 - name: Run Claude Code
 id: claude
 if: ${{ always() }}
-uses: anthropics/claude-code-action@fefa07e9c665b7320f08c3b525980457f22f58aa # v1.0.111
+uses: anthropics/claude-code-action@fbda2eb1bdc90d319b8d853f5deb53bca199a7c1 # v1.0.140
 env:
 GH_TOKEN: ${{ secrets.PAT }}
 GITHUB_TOKEN: ${{ secrets.PAT }}
@@ -326,12 +326,12 @@ jobs:
 cancel-in-progress: false
 steps:
 - name: Checkout repository
-uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
+uses: actions/checkout@df4cb1c069e1874edd31b4311f1884172cec0e10 # v6.0.3
 with:
 fetch-depth: 0

 - name: PR Review with Claude
-uses: anthropics/claude-code-action@fefa07e9c665b7320f08c3b525980457f22f58aa # v1.0.111
+uses: anthropics/claude-code-action@fbda2eb1bdc90d319b8d853f5deb53bca199a7c1 # v1.0.140
 with:
 anthropic_api_key: ${{ secrets.ANTHROPIC_API_KEY }}
 trigger_phrase: '@claude review'

.github/workflows/db-backup.yml

Lines changed: 2 additions & 2 deletions
@@ -12,8 +12,8 @@ jobs:
 permissions:
 contents: write
 steps:
-- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
-- uses: pnpm/action-setup@8912a9102ac27614460f54aedde9e1e7f9aec20d # v6.0.5
+- uses: actions/checkout@df4cb1c069e1874edd31b4311f1884172cec0e10 # v6.0.3
+- uses: pnpm/action-setup@0e279bb959325dab635dd2c09392533439d90093 # v6.0.8
 - uses: actions/setup-node@48b55a011bda9f5d6aeb4c2d9c7362e8dae4041e # v6.4.0
 with:
 node-version: '24'

.github/workflows/ingest-results.yml

Lines changed: 2 additions & 2 deletions
@@ -14,8 +14,8 @@ jobs:
 - name: Wait for source run to finish
 run: sleep 300

-- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
-- uses: pnpm/action-setup@8912a9102ac27614460f54aedde9e1e7f9aec20d # v6.0.5
+- uses: actions/checkout@df4cb1c069e1874edd31b4311f1884172cec0e10 # v6.0.3
+- uses: pnpm/action-setup@0e279bb959325dab635dd2c09392533439d90093 # v6.0.8
 - uses: actions/setup-node@48b55a011bda9f5d6aeb4c2d9c7362e8dae4041e # v6.4.0
 with:
 node-version: '24'

.github/workflows/lint.yml

Lines changed: 2 additions & 2 deletions
@@ -13,8 +13,8 @@ jobs:
 permissions:
 contents: read
 steps:
-- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
-- uses: pnpm/action-setup@8912a9102ac27614460f54aedde9e1e7f9aec20d # v6.0.5
+- uses: actions/checkout@df4cb1c069e1874edd31b4311f1884172cec0e10 # v6.0.3
+- uses: pnpm/action-setup@0e279bb959325dab635dd2c09392533439d90093 # v6.0.8
 - uses: actions/setup-node@48b55a011bda9f5d6aeb4c2d9c7362e8dae4041e # v6.4.0
 with:
 node-version: '24'

.github/workflows/tests-e2e.yml

Lines changed: 6 additions & 6 deletions
@@ -22,8 +22,8 @@ jobs:
 permissions:
 contents: read
 steps:
-- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
-- uses: pnpm/action-setup@8912a9102ac27614460f54aedde9e1e7f9aec20d # v6.0.5
+- uses: actions/checkout@df4cb1c069e1874edd31b4311f1884172cec0e10 # v6.0.3
+- uses: pnpm/action-setup@0e279bb959325dab635dd2c09392533439d90093 # v6.0.8
 - uses: actions/setup-node@48b55a011bda9f5d6aeb4c2d9c7362e8dae4041e # v6.4.0
 with:
 node-version: '24'
@@ -60,13 +60,13 @@ jobs:
 browser: [chrome, firefox]
 shard: [1, 2]
 steps:
-- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
-- uses: pnpm/action-setup@8912a9102ac27614460f54aedde9e1e7f9aec20d # v6.0.5
+- uses: actions/checkout@df4cb1c069e1874edd31b4311f1884172cec0e10 # v6.0.3
+- uses: pnpm/action-setup@0e279bb959325dab635dd2c09392533439d90093 # v6.0.8
 - uses: actions/setup-node@48b55a011bda9f5d6aeb4c2d9c7362e8dae4041e # v6.4.0
 with:
 node-version: '24'
 cache: pnpm
-- uses: browser-actions/setup-firefox@fcf821c621167805dd63a29662bd7cb5676c81a8 # v1.7.1
+- uses: browser-actions/setup-firefox@0bc507ddf224827e3b1af68e014d5e42ab93e795 # v1.7.2
 if: matrix.browser == 'firefox'
 with:
 firefox-version: latest
@@ -89,7 +89,7 @@ jobs:
 env:
 E2E_FIXTURES: '1'
 - name: Run integration tests
-uses: cypress-io/github-action@c495c3ddffba403ba11be95fffb67e25203b3799 # v7.1.10
+uses: cypress-io/github-action@948d67d3074f1bbb6379c8bdbb04e95d2f8e593f # v7.4.0
 with:
 working-directory: packages/app
 install: false

.github/workflows/tests-unit.yml

Lines changed: 2 additions & 2 deletions
@@ -22,8 +22,8 @@ jobs:
 permissions:
 contents: read
 steps:
-- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
-- uses: pnpm/action-setup@8912a9102ac27614460f54aedde9e1e7f9aec20d # v6.0.5
+- uses: actions/checkout@df4cb1c069e1874edd31b4311f1884172cec0e10 # v6.0.3
+- uses: pnpm/action-setup@0e279bb959325dab635dd2c09392533439d90093 # v6.0.8
 - uses: actions/setup-node@48b55a011bda9f5d6aeb4c2d9c7362e8dae4041e # v6.4.0
 with:
 node-version: '24'

AGENTS.md

Lines changed: 2 additions & 1 deletion
@@ -148,6 +148,7 @@ Authoritative total / active parameter counts for every model in the dashboard.
 | DeepSeek-V4-Pro | 1.6T | 49B | `deepseek-ai/DeepSeek-V4-Pro` | HF model card |
 | Kimi-K2.5 | 1T | 32B | `moonshotai/Kimi-K2.5` | HF model card |
 | Kimi-K2.6 | 1T | 32B | `moonshotai/Kimi-K2.6` | HF model card |
+| Kimi-K2.7-Code | 1T | 32B | `moonshotai/Kimi-K2.7-Code` | HF model card |
 | Qwen3.5-397B-A17B | 397B | 17B | `Qwen/Qwen3.5-397B-A17B` | HF model card |
 | GLM-5 | 744B | 40B | `zai-org/GLM-5` | HF model card |
 | GLM-5.1 | 744B | 40B | `zai-org/GLM-5.1-FP8` | HF model card (same base as GLM-5) |
@@ -161,7 +162,7 @@ Authoritative total / active parameter counts for every model in the dashboard.
 - **GLM-5 ≠ 355B.** 355B is GLM-4.5. GLM-5 jumped to 744B / 40B active (256-expert MoE with DSA).
 - **MiniMax-M2.5/M2.7 ≠ 456B.** 456B is the older MiniMax-Text-01 / M1 (32 large experts). The M2 series is a different architecture: 230B / 10B active, 256 small experts.
 - **DeepSeek-R1 is 671B, not 685B.** HF metadata shows 685B because the bundled MTP head adds ~14B; the core MoE is 671B / 37B active.
-- **Kimi K2.5 and K2.6 are post-training refinements**, not new pre-trained sizes. Same 1T / 32B / 384-expert backbone as the original K2.
+- **Kimi K2.5, K2.6, and K2.7-Code are post-training refinements**, not new pre-trained sizes. Same 1T / 32B / 384-expert backbone as the original K2. K2.7-Code is a coding-focused refinement of the same backbone.

 ## Common Development Tasks

docs/adding-entities.md

Lines changed: 99 additions & 0 deletions
@@ -71,6 +71,105 @@ Present what you inferred and get confirmation + category in a single step. Incl

 Everything else (`MODEL_OPTIONS`, `DEFAULT_MODELS`, `EXPERIMENTAL_MODELS`, `DEPRECATED_MODELS`, `MODEL_PREFIX_MAPPING`, `getModelLabel()`) is derived automatically.

+**`packages/app/src/lib/compare-slug.ts`** (easy to miss — the /compare and /compare-per-dollar pages do NOT derive from `MODEL_CONFIG`):
+
+- `COMPARE_MODEL_SLUGS` — add an entry with `{ slug, displayName, dbKeys, label }`. `displayName` must match the `Model` enum value; `dbKeys` lists the DB buckets to query. Place it per the ordering comment (Chinese-lab flagships first, newer family member leads). Without this entry the model is absent from /compare, /compare-per-dollar, the sitemap, and their OG images.
+- `COMPARE_MODEL_ALIASES` — only if a family-level or older-version slug should 308 to the new entry.
+
+**`packages/app/src/lib/compare-ssr.ts`**:
+
+- `KNOWN_MODELS` — add the display name so `?g_model=` URL overrides validate on compare pages.
+
+**`packages/app/src/app/compare/page.tsx`** and **`packages/app/src/app/compare-per-dollar/page.tsx`**:
+
+- `DESCRIPTION` — these SEO meta strings hardcode a sample model list ("…, Qwen 3.5 397B-A17B, and more"). Add the new model if it should appear in the catalog blurb.
+
+**`packages/app/src/lib/model-architectures.ts`** (optional — powers the per-model architecture diagram on the inference tab):
+
+- `MODEL_ARCHITECTURES` — add a `[Model.X]` entry with verified config.json values. Omitted models simply render no diagram (`getModelArchitecture` returns `undefined`), so this is non-blocking but expected for parity with other models.
+
+`/about` needs no change — its model list derives from `DB_MODEL_TO_DISPLAY` and includes the new key automatically once `models.ts` is updated.
+
+---
+
+## Featuring a Day-0 Model
+
+When a new model launches and we want to give it the headline treatment, swap the **promotion surfaces** to it. This is separate from [Adding a New Model](#adding-a-new-model) above — the model must **already exist** (`Model.*` enum, `MODEL_CONFIG`, DB mapping) before it can be featured. The promotion surfaces are:
+
+- **Launch banner** — the dismissible bar at the top of the landing page
+- **Launch modal** — the "X is live" popup on the landing page
+- **Quick Comparisons preset** — the "X — First Look" card (first entry in `FAVORITE_PRESETS`)
+- **Default model** (optional) — the model the dashboard opens on (`g_model`)
+
+### The "retire old, new IDs" pattern
+
+Each launch **replaces** the previous day-0 model's surfaces rather than editing them in place. This is deliberate:
+
+- **New storage keys** (`inferencex-<slug>-{banner,modal}-dismissed`) so users who dismissed the _previous_ launch banner/modal still see the new one.
+- **Keep the old preset, hide it** (`hidden: true`) instead of deleting it — existing `?preset=<old-slug>-launch` links (old banners, modals, external shares, blog `DashboardCTA`s) must keep resolving.
+- **Generic testIds** (`launch-banner`, `launch-modal`) — launch-agnostic so Cypress selectors don't change every launch.
+
+> The current day-0 model is **whatever the single visible (`hidden` unset) `*-launch` preset points to** — detect it, don't assume. As of MiniMax M3 it was DeepSeek V4 Pro.
+
+### Derive the identifiers
+
+From the model name, derive (MiniMax M3 shown as the worked example):
+
+| Token | Example | Used in |
+| --------- | ------------------ | ---------------------------------------------- |
+| `SLUG` | `minimax-m3` | preset id, nudge ids, storage keys, `?preset=` |
+| `SLUG_` | `minimax_m3` | analytics event names |
+| `ENUM` | `Model.MiniMax_M3` | preset `config.model` |
+| `DISPLAY` | `MiniMax M3` | all user-facing copy |
+| `G_MODEL` | `MiniMax-M3` | `g_model` default (the `Model.*` string value) |
+
+### Then apply
+
+**`packages/app/src/components/favorites/favorite-presets.ts`**:
+
+1. On the outgoing visible `*-launch` preset, add `hidden: true` and update its comment (retired, kept for link compat — same pattern as the existing `dsv4-launch-nvidia` entry).
+2. Prepend a new visible preset as the **first** element of `FAVORITE_PRESETS`:
+```ts
+{
+id: 'SLUG-launch',
+title: 'DISPLAY — First Look',
+description:
+'First benchmarks of DISPLAY across every available GPU. New configurations appear here as they come online.',
+tags: ['<Vendor>', '<Version>', 'New'], // e.g. ['MiniMax', 'M3', 'New']
+category: 'comparison',
+wide: true,
+config: {
+model: ENUM,
+sequence: Sequence.EightK_OneK,
+precisions: ['fp4', 'fp4fp8', 'fp8'],
+yAxisMetric: 'y_tpPerGpu',
+hwFilter: ['h100', 'h200', 'b200', 'b300', 'gb200', 'gb300', 'mi300x', 'mi325x', 'mi355x'],
+},
+}
+```
+Narrow `hwFilter` only for a restricted launch (e.g. NVIDIA-only). The broad filter + "as they come online" copy is the intended self-filling behavior even when data is still partial at launch.
+
+**`packages/app/src/lib/nudges/registry.tsx`** — rewrite the two launch nudges (only one banner + one modal exist at a time):
+
+- **Modal** (under "Landing modals"): `id: 'SLUG-launch-modal'`, `storageKey: 'inferencex-SLUG-modal-dismissed'`, `title: 'DISPLAY is live'`, day-zero `description`, `testId: 'launch-modal'`, `primaryAction.onClick` → `/inference?preset=SLUG-launch`, analytics `SLUG_modal_shown`/`_dismissed`/`_explored`.
+- **Banner** (under "Landing banner"): `id: 'SLUG-launch-banner'`, `storageKey: 'inferencex-SLUG-banner-dismissed'`, `title: 'DISPLAY benchmarks are live'`, `testId: 'launch-banner'`, `href`/`onLinkClick` → `/inference?preset=SLUG-launch`, keep the generic `launch_banner_*` analytics events but set `properties: { banner_id: 'SLUG-launch', preset_id: 'SLUG-launch' }`.
+
+**`packages/app/src/lib/url-state.ts`** _(only if making it the site default)_:
+
+- Set `PARAM_DEFAULTS.g_model` to `'G_MODEL'`. Most launches **leave this unchanged** — only change it for a true flagship (DeepSeek V4 Pro got it; MiniMax M3 did not).
+
+### Sync tests
+
+- **`packages/app/src/lib/nudges/registry.test.ts`** — update the **sorted** expected-ids array ("contains the expected set of migrated nudges") to the new `SLUG-launch-banner`/`SLUG-launch-modal` ids.
+- **`packages/app/cypress/e2e/nudge-system.cy.ts`** and **`navigation.cy.ts`** — replace the old `inferencex-<old-slug>-{modal,banner}-dismissed` storage keys with the new ones. TestId selectors stay generic (`launch-modal`, `launch-banner`); update any `it(...)` titles that name the old model.
+- **`packages/app/src/lib/url-state.test.ts`** _(only if the default changed)_ — two specs hardcode the default `g_model`; update both.
+
+> **Don't touch:** blog MDX `?g_model=…` / `?preset=<old-slug>-launch` links (historical, correct), `packages/constants/src/models.ts` DB-key maps, or the outgoing model's data-mapping / architecture entries — it still exists, it's just no longer the headline.
+
+### Verify
+
+`pnpm typecheck && pnpm lint && pnpm fmt && pnpm test:unit`, then `rg` for the old slug to confirm only the intentional hidden preset + blog links remain. Final gate: `pnpm test:e2e` and a manual `pnpm dev` check that the banner/modal/preset read `DISPLAY` and `/inference?preset=SLUG-launch` renders data.
+
 ---

 ## Adding a New GPU

docs/index.md

Lines changed: 1 addition & 1 deletion
@@ -10,7 +10,7 @@ Design rationale and non-obvious conventions. See [CLAUDE.md](../CLAUDE.md) for
 - [Pitfalls](./pitfalls.md) — Failure modes: token type consistency, schema evolution, empty objects, zoom loss, stale closures, disaggregated metrics, negative splines, date stamping, ref stability, cost inheritance
 - [GPU Specs](./gpu-specs.md) — Unit conventions, topology invariants, SVG layout rationale, hardware gotchas
 - [TCO Calculator](./tco-calculator.md) — Why interpolation, composite keys, cost matrix, token type bugs, badge logic, state design
-- [Adding Entities](./adding-entities.md) — Step-by-step checklists for adding new models, GPUs, precisions, sequences, frameworks (ingest + constants + frontend)
+- [Adding Entities](./adding-entities.md) — Step-by-step checklists for adding new models, GPUs, precisions, sequences, frameworks (ingest + constants + frontend), plus featuring a day-0 model (launch banner, modal, Quick Comparisons preset)
 - [Testing](./testing.md) — Requirements, quality standards, pre-commit checklist
 - [Data Transforms](./data-transforms.md) — Full pipeline from BenchmarkRow to RenderableGraph: type hierarchy, hardware key construction, derived metrics, memoization strategy
 - [State Ownership](./state-ownership.md) — Which context owns which state, availability filtering cascade, comparison date mechanics, URL param sync
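
Note on the "Derive the identifiers" table in `docs/adding-entities.md`: the `SLUG`, `SLUG_`, preset id, and dismissal storage keys all follow from a single slug, and the current day-0 model is whichever `*-launch` preset is not hidden. The TypeScript below is a minimal sketch of those two rules, not code from the repository. `deriveLaunchIdentifiers`, `findCurrentLaunchPreset`, `LaunchIdentifiers`, and `PresetLike` are hypothetical names. Slugifying the display name is an assumption (the retired `dsv4-launch-nvidia` preset suggests real slugs can be hand-picked), and matching on the substring `-launch` is likewise an assumption about how `*-launch` should be read. `ENUM` and `G_MODEL` are omitted because they depend on the `Model` enum.

```typescript
// Hypothetical helpers illustrating the naming rules in docs/adding-entities.md.
// None of these names exist in the repository.

interface LaunchIdentifiers {
  slug: string; // SLUG, e.g. 'minimax-m3'
  slugUnderscore: string; // SLUG_, used in analytics event names
  display: string; // DISPLAY, user-facing copy
  presetId: string; // 'SLUG-launch'
  bannerStorageKey: string; // 'inferencex-SLUG-banner-dismissed'
  modalStorageKey: string; // 'inferencex-SLUG-modal-dismissed'
}

// Assumption: the slug is the lowercased display name with every run of
// non-alphanumeric characters collapsed to a single hyphen.
function deriveLaunchIdentifiers(display: string): LaunchIdentifiers {
  const slug = display
    .trim()
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, '-')
    .replace(/^-+|-+$/g, '');
  return {
    slug,
    slugUnderscore: slug.replace(/-/g, '_'),
    display,
    presetId: `${slug}-launch`,
    bannerStorageKey: `inferencex-${slug}-banner-dismissed`,
    modalStorageKey: `inferencex-${slug}-modal-dismissed`,
  };
}

// Minimal stand-in for an entry of FAVORITE_PRESETS; the real type has more fields.
interface PresetLike {
  id: string;
  hidden?: boolean;
}

// Returns the single visible launch preset, and throws if there is not exactly one.
function findCurrentLaunchPreset(presets: PresetLike[]): PresetLike {
  const visible = presets.filter((p) => p.id.includes('-launch') && !p.hidden);
  const [only, ...rest] = visible;
  if (only === undefined || rest.length > 0) {
    throw new Error(`expected exactly one visible launch preset, found ${visible.length}`);
  }
  return only;
}
```

Throwing when zero or several launch presets are visible mirrors the doc's "detect it, don't assume" advice: a forgotten `hidden: true` on the outgoing preset then fails loudly rather than silently picking one.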
