Skip to content

Lean kernel-artifact telemetry for cost-model dataset#2770

Open
IshanAryendu wants to merge 6 commits into
pytorch:mainfrom
IshanAryendu:collect-kernel-artifacts-lean
Open

Lean kernel-artifact telemetry for cost-model dataset#2770
IshanAryendu wants to merge 6 commits into
pytorch:mainfrom
IshanAryendu:collect-kernel-artifacts-lean

Conversation

@IshanAryendu

@IshanAryendu IshanAryendu commented Jun 11, 2026

Copy link
Copy Markdown
Contributor

Tightens the autotuner's kernel-artifact telemetry into a clean dataset for an offline cost model (predict a config's performance for a kernel without running it). Three changes:

  1. Only collect genuine searches. Runs restricted to user-pinned configs=[...] (without force_autotune) tune between the user's configs rather than searching the space, so their rows are a biased, non-representative slice. These runs are now excluded from collection entirely.
  2. Record codegen context. The full helion.settings is stored in the per-run .meta.jsonl record (JSON-safe).
  3. Lean, non-redundant schema. Drop the derivable IDs kernel_id/sample_id and the redundant decorator; keep run_id as the sole join key and derive it directly from content; store the per-config config as minimized, compact JSON.

Every row in the dataset must be a representative (kernel, config) → measured perf sample, and the files should carry only what a consumer can't trivially derive. The prior schema (a) emitted rows for config-restricted runs, biasing the training distribution; (b) kept no record of the codegen-affecting settings as analyzable fields; and (c) carried three content-hash IDs where only one (run_id, the join key) is structurally needed, plus a decorator string that duplicates config + settings.

Tests: test/test_kernel_metadata.py + test/test_autotuner.py (123 passed).

@meta-cla meta-cla Bot added the CLA Signed This label is managed by the Meta Open Source bot. label Jun 11, 2026
@IshanAryendu IshanAryendu force-pushed the collect-kernel-artifacts-lean branch from 9abb150 to 1cd901e Compare June 11, 2026 23:32
@IshanAryendu IshanAryendu force-pushed the collect-kernel-artifacts-lean branch from ef76d6f to bedcb75 Compare June 12, 2026 00:34
@pytorch pytorch deleted a comment from claude Bot Jun 12, 2026
@IshanAryendu

Copy link
Copy Markdown
Contributor Author

@claude

@claude

claude Bot commented Jun 12, 2026

Copy link
Copy Markdown

Claude finished @IshanAryendu's task in 7m 8s —— View job


Code Review: Lean kernel-artifact telemetry

  • Read the PR diff against origin/main
  • Review changed files for correctness, schema design, and edge cases
  • Post review feedback

Overall this is a clean, well-reasoned refactor. The schema is genuinely leaner, the codegen_decorator_parts single-source-of-truth extraction is a nice touch, the append-mode + JSONL change is coherent, and the test coverage is thorough (header pinning, round-trip, run_id derivation, restricted-search exclusion, multi-run accumulation). A few observations below — mostly notes/questions, one stale comment to fix.


1. run_id can map to multiple distinct .meta.jsonl records (design note)

run_id hashes only the codegen-affecting subset of settings (static_shapes, index_dtype via _codegen_signaturecodegen_decorator_parts), but the meta record stores the full settings dict. So two runs of the same kernel/shape/dtype/hardware that differ only in a non-codegen setting (e.g. autotune_random_seed, autotune effort) produce:

  • the same run_id, but
  • different .meta.jsonl records (the settings field differs).

That weakens the documented "each CSV row joins to exactly one .meta.jsonl record (a clean many-to-one)" invariant: a consumer joining on run_id against an accumulated sidecar can find two records for one key, and the recommended "de-duplicate on run_id" then silently drops one settings variant.

For a cost model predicting perf from (kernel, config), collapsing non-codegen settings is arguably intended (they shouldn't change the label) — so this may be by design. If so, a one-line note in the KernelMetadata docstring clarifying that run_id deliberately ignores non-codegen settings (and that dedup-on-run_id may collapse differing settings records) would prevent a downstream consumer from being surprised. helion/autotuner/metrics.py:81-95

2. Stale comment in benchmark.yml referencing the removed kernel_id

# --autotune-metrics-json captures the per-run kernel-level summary
# (kernel_id, source, shapes, hardware) and is overwrite-safe on its own.

kernel_id was removed from AutotuneMetrics.to_dict() in this same PR (helion/autotuner/metrics.py:46-59), so this comment now names a field that no longer exists. Suggest (kernel_name, source, shapes, hardware). .github/workflows/benchmark.yml:204

Note: I can't edit files under .github/workflows/ (GitHub App workflow-modification restriction), so this one needs a manual tweak.

3. CSV/JSONL header-detection is a TOCTOU under concurrent writers (acceptable, worth a note)

write_header = not self.csv_path.exists() or self.csv_path.stat().st_size == 0
self._csv_file = self.csv_path.open("a", ...)

The existence/size check happens before the append-mode open, so two processes sharing one base path could both decide to write a header (or interleave data rows mid-row). The docstring/docs already scope this to single-process accumulation and per-kernel paths in benchmark.yml, so this is fine for the intended use — just flagging that the "share one base path" language in the docs could be read as multi-process-safe, which it isn't. helion/autotuner/logger.py:328-345

4. Minor: recording-gated minimization is a real behavior change (looks correct)

Config minimization now only runs when self.log.recording is true, vs. the old code that always built entries. This is a sound optimization and behaviour is preserved (the original config is still what gets benchmarked at benchmark_provider.py:824). The minimized is not None ⇔ recording dual-use as both a guard and a type-narrowing for AutotuneLogEntry(config=...) is clever but slightly subtle — the inline comment explains it well, so no change needed. helion/autotuner/benchmark_provider.py:806-835

5. Nits

  • _is_restricted_search best-effort fallback (missing kernel/configs → "not restricted") is the right default and the docstring is excellent. ✅
  • The \x00-delimited run_id payload to avoid boundary collisions is a good detail. ✅
  • Reconstruction contract {**config_defaults, **row_config} is consistent: minimize and the stored config_defaults both derive from config_spec.default_config(), and minimize always keeps the required block_sizes. ✅

No blocking issues — items 1 and 3 are documentation/clarity, item 2 is a stale comment in a file I can't modify. Nice work on the test suite.
collect-kernel-artifacts-lean

- Skip data collection for searches restricted to user-pinned configs
  (configs=[...] without force_autotune): no .meta.jsonl/.csv written.
- Store full helion.settings in the per-run .meta.jsonl record (JSON-safe).
- Store per-run config_defaults (the ConfigSpec defaults) so a consumer can
  rebuild each minimized row's config via {**config_defaults, **row_config}.
- Drop kernel_id/sample_id/decorator; derive run_id directly from
  (kernel_source, codegen-settings signature, shapes, dtypes, hardware).
- Single source of truth for the codegen signature (codegen_decorator_parts),
  shared by format_kernel_decorator and run_id so they cannot drift.
- Store per-config 'config' as minimized, compact one-line JSON.
- Minimize/record only when a log sink is active.
- Remove now-unused Kernel.kernel_id()/_settings_signature().
@IshanAryendu IshanAryendu force-pushed the collect-kernel-artifacts-lean branch from 7d22677 to 69b7af1 Compare June 12, 2026 00:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

CLA Signed This label is managed by the Meta Open Source bot.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant