Lean kernel-artifact telemetry for cost-model dataset by IshanAryendu · Pull Request #2770 · pytorch/helion

IshanAryendu · 2026-06-11T23:23:27Z

Tightens the autotuner's kernel-artifact telemetry into a clean dataset for an offline cost model (predict a config's performance for a kernel without running it). Three changes:

Only collect genuine searches. Runs restricted to user-pinned configs=[...] (without force_autotune) tune between the user's configs rather than searching the space, so their rows are a biased, non-representative slice. These runs are now excluded from collection entirely.
Record codegen context. The full helion.settings is stored in the per-run .meta.jsonl record (JSON-safe).
Lean, non-redundant schema. Drop the derivable IDs kernel_id/sample_id and the redundant decorator; keep run_id as the sole join key and derive it directly from content; store the per-config config as minimized, compact JSON.

Every row in the dataset must be a representative (kernel, config) → measured perf sample, and the files should carry only what a consumer can't trivially derive. The prior schema (a) emitted rows for config-restricted runs, biasing the training distribution; (b) kept no record of the codegen-affecting settings as analyzable fields; and (c) carried three content-hash IDs where only one (run_id, the join key) is structurally needed, plus a decorator string that duplicates config + settings.

Tests: test/test_kernel_metadata.py + test/test_autotuner.py (123 passed).

…join key

…he decorator already computed for sample_id and record it per row, completing the kernel-artifact set (source, input shapes, decorator).

IshanAryendu · 2026-06-12T00:45:07Z

@claude

claude · 2026-06-12T00:45:27Z

Claude finished @IshanAryendu's task in 7m 8s —— View job

Code Review: Lean kernel-artifact telemetry

Read the PR diff against origin/main
Review changed files for correctness, schema design, and edge cases
Post review feedback

Overall this is a clean, well-reasoned refactor. The schema is genuinely leaner, the codegen_decorator_parts single-source-of-truth extraction is a nice touch, the append-mode + JSONL change is coherent, and the test coverage is thorough (header pinning, round-trip, run_id derivation, restricted-search exclusion, multi-run accumulation). A few observations below — mostly notes/questions, one stale comment to fix.

1. `run_id` can map to multiple distinct `.meta.jsonl` records (design note)

run_id hashes only the codegen-affecting subset of settings (static_shapes, index_dtype via _codegen_signature → codegen_decorator_parts), but the meta record stores the full settings dict. So two runs of the same kernel/shape/dtype/hardware that differ only in a non-codegen setting (e.g. autotune_random_seed, autotune effort) produce:

the same run_id, but
different .meta.jsonl records (the settings field differs).

That weakens the documented "each CSV row joins to exactly one .meta.jsonl record (a clean many-to-one)" invariant: a consumer joining on run_id against an accumulated sidecar can find two records for one key, and the recommended "de-duplicate on run_id" then silently drops one settings variant.

For a cost model predicting perf from (kernel, config), collapsing non-codegen settings is arguably intended (they shouldn't change the label) — so this may be by design. If so, a one-line note in the KernelMetadata docstring clarifying that run_id deliberately ignores non-codegen settings (and that dedup-on-run_id may collapse differing settings records) would prevent a downstream consumer from being surprised. helion/autotuner/metrics.py:81-95

2. Stale comment in `benchmark.yml` referencing the removed `kernel_id`

# --autotune-metrics-json captures the per-run kernel-level summary
# (kernel_id, source, shapes, hardware) and is overwrite-safe on its own.

kernel_id was removed from AutotuneMetrics.to_dict() in this same PR (helion/autotuner/metrics.py:46-59), so this comment now names a field that no longer exists. Suggest (kernel_name, source, shapes, hardware). .github/workflows/benchmark.yml:204

Note: I can't edit files under .github/workflows/ (GitHub App workflow-modification restriction), so this one needs a manual tweak.

3. CSV/JSONL header-detection is a TOCTOU under concurrent writers (acceptable, worth a note)

write_header = not self.csv_path.exists() or self.csv_path.stat().st_size == 0
self._csv_file = self.csv_path.open("a", ...)

The existence/size check happens before the append-mode open, so two processes sharing one base path could both decide to write a header (or interleave data rows mid-row). The docstring/docs already scope this to single-process accumulation and per-kernel paths in benchmark.yml, so this is fine for the intended use — just flagging that the "share one base path" language in the docs could be read as multi-process-safe, which it isn't. helion/autotuner/logger.py:328-345

4. Minor: `recording`-gated minimization is a real behavior change (looks correct)

Config minimization now only runs when self.log.recording is true, vs. the old code that always built entries. This is a sound optimization and behaviour is preserved (the original config is still what gets benchmarked at benchmark_provider.py:824). The minimized is not None ⇔ recording dual-use as both a guard and a type-narrowing for AutotuneLogEntry(config=...) is clever but slightly subtle — the inline comment explains it well, so no change needed. helion/autotuner/benchmark_provider.py:806-835

5. Nits

_is_restricted_search best-effort fallback (missing kernel/configs → "not restricted") is the right default and the docstring is excellent. ✅
The \x00-delimited run_id payload to avoid boundary collisions is a good detail. ✅
Reconstruction contract {**config_defaults, **row_config} is consistent: minimize and the stored config_defaults both derive from config_spec.default_config(), and minimize always keeps the required block_sizes. ✅

No blocking issues — items 1 and 3 are documentation/clarity, item 2 is a stale comment in a file I can't modify. Nice work on the test suite.
• collect-kernel-artifacts-lean

- Skip data collection for searches restricted to user-pinned configs (configs=[...] without force_autotune): no .meta.jsonl/.csv written. - Store full helion.settings in the per-run .meta.jsonl record (JSON-safe). - Store per-run config_defaults (the ConfigSpec defaults) so a consumer can rebuild each minimized row's config via {**config_defaults, **row_config}. - Drop kernel_id/sample_id/decorator; derive run_id directly from (kernel_source, codegen-settings signature, shapes, dtypes, hardware). - Single source of truth for the codegen signature (codegen_decorator_parts), shared by format_kernel_decorator and run_id so they cannot drift. - Store per-config 'config' as minimized, compact one-line JSON. - Minimize/record only when a log sink is active. - Remove now-unused Kernel.kernel_id()/_settings_signature().

IshanAryendu and others added 4 commits June 10, 2026 10:32

Collect kernel artifacts: append-mode autotune telemetry with run_id …

1295959

…join key

telemetry: collect decorator string as a structured CSV columnReuse t…

44334ef

…he decorator already computed for sample_id and record it per row, completing the kernel-artifact set (source, input shapes, decorator).

Merge branch 'pytorch:main' into collect-kernel-artifacts

bc91d52

Merge branch 'pytorch:main' into collect-kernel-artifacts

4c83a72

meta-cla Bot added the CLA Signed This label is managed by the Meta Open Source bot. label Jun 11, 2026

IshanAryendu force-pushed the collect-kernel-artifacts-lean branch from 9abb150 to 1cd901e Compare June 11, 2026 23:32

IshanAryendu requested review from angelayi and karthickai June 11, 2026 23:45

IshanAryendu force-pushed the collect-kernel-artifacts-lean branch from ef76d6f to bedcb75 Compare June 12, 2026 00:34

pytorch deleted a comment from claude Bot Jun 12, 2026

IshanAryendu force-pushed the collect-kernel-artifacts-lean branch from 7d22677 to 69b7af1 Compare June 12, 2026 00:57

Merge branch 'main' into collect-kernel-artifacts-lean

5b7de17

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Lean kernel-artifact telemetry for cost-model dataset#2770

Lean kernel-artifact telemetry for cost-model dataset#2770
IshanAryendu wants to merge 6 commits into
pytorch:mainfrom
IshanAryendu:collect-kernel-artifacts-lean

IshanAryendu commented Jun 11, 2026 •

edited

Loading

Uh oh!

IshanAryendu commented Jun 12, 2026

Uh oh!

claude Bot commented Jun 12, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

IshanAryendu commented Jun 11, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

IshanAryendu commented Jun 12, 2026

Uh oh!

claude Bot commented Jun 12, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Code Review: Lean kernel-artifact telemetry

1. run_id can map to multiple distinct .meta.jsonl records (design note)

2. Stale comment in benchmark.yml referencing the removed kernel_id

3. CSV/JSONL header-detection is a TOCTOU under concurrent writers (acceptable, worth a note)

4. Minor: recording-gated minimization is a real behavior change (looks correct)

5. Nits

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

IshanAryendu commented Jun 11, 2026 •

edited

Loading

claude Bot commented Jun 12, 2026 •

edited

Loading

1. `run_id` can map to multiple distinct `.meta.jsonl` records (design note)

2. Stale comment in `benchmark.yml` referencing the removed `kernel_id`

4. Minor: `recording`-gated minimization is a real behavior change (looks correct)