
Collect kernel artifacts: device-IR node-link dump (.ir.jsonl)#2750

Draft
IshanAryendu wants to merge 14 commits into
pytorch:mainfrom
IshanAryendu:collect-ir-features

@IshanAryendu

Contributor

Collect kernel artifacts: device-IR node-link dump (.ir.jsonl)

Summary

Adds a lossless dump of Helion's device IR to the autotune telemetry, as the ir_features artifact for the kernel-artifact dataset. For each autotune run, the union of all device-side torch.fx graphs is serialized as a node-link graph (one JSON record per run) to a new <autotune_log>.ir.jsonl sidecar, alongside the existing .csv and .meta.jsonl, and joined to them on run_id.

Stacked on #2737 ("Collect kernel artifacts: append-mode autotune telemetry with run_id"). PR 2737 adds the append-mode sink, .meta.jsonl, and the run_id join key this one builds on.

Motivation

The cost-model dataset needs the kernel's computation graph as features, not just source text + perf. Device IR is the structured representation Helion lowers to before config-specific codegen, so it's the right artifact to capture. It is config-independent (built once per bound kernel), so it is collected once per run_id and joins cleanly to the per-config CSV rows and the per-run identity record.

What it does

  • New module helion/autotuner/ir_features.py with extract_ir_graph(device_ir, *, run_id, kernel_id, kernel_name, input_shapes) -> IrGraphRecord, which returns a plain-JSON, networkx node-link-compatible dump. Load it with nx.node_link_graph(record, edges="links") on networkx >= 3.4, or with link="links" on older versions.
  • Only true edges are captured:
    • data edges from node.args/kwargs (with arg_positions, via fx map_arg, which covers all fx container types).
    • region edges only from live control-flow call nodes (_for_loop/_for_loop_step/_if/_while_loop), mapping their live argument lists onto the child graph's placeholders.
    • Reduction loops are rolled alternates, not called subgraphs, so they are not given fabricated dataflow edges. That relationship is recorded as typed graph-level metadata (rolled_reductions) instead.
  • Per-node features (best-effort, None when absent): op_kind, target (address-free), lowering_class, dtype/shape/concrete_shape/stride/concrete_stride/device/value, operand input_dtypes/input_shapes, reduction_type/reduction_ranges, pointwise_ranges, block_ids, graph_id, region_kind, source_loc. Symbolic dims are stringified with best-effort concrete resolution and large values are length-capped.
  • Typed schema via TypedDicts (IrNode composed from ValFeatures + LoweringFeatures + node-core; IrEdge, IrGraphMeta, IrGraphRecord) plus a schema_version field (IR_SCHEMA_VERSION = 1) and a documented consumer gate (record.get("schema_version", 0)).
  • Wiring: AutotuneLogSink appends one .ir.jsonl record per run. BaseSearch._prepare extracts the device IR once, gated on autotune_log and best-effort (a missing/odd host_function or non-Triton backend degrades to "no IR artifact" and never breaks autotuning).
  • Drift safety: unknown GraphInfo region kinds and malformed control-flow nodes log a one-time warning (never raise). A unit test asserts the known-region-kinds set matches the concrete GraphInfo subclasses, so upstream IR changes fail CI loudly.
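
A consumer-side sketch of the schema_version gate mentioned above, assuming records are read line by line from the sidecar. The helper name iter_ir_records, the SUPPORTED_SCHEMA_VERSION constant, and the sample records are illustrative and not part of this PR; only the record.get("schema_version", 0) gate and the top-level keys come from the description. Skipping unsupported records (rather than raising) is one possible policy.

```python
import io
import json
from collections.abc import Iterator
from typing import IO, Any

# Assumption: the consumer pins the one version it understands
# (the PR sets IR_SCHEMA_VERSION = 1).
SUPPORTED_SCHEMA_VERSION = 1


def iter_ir_records(stream: IO[str]) -> Iterator[dict[str, Any]]:
    """Yield node-link records whose schema_version this consumer understands."""
    for line in stream:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines in an append-mode file
        record = json.loads(line)
        # Gate from the PR description: a missing field is treated as version 0.
        if record.get("schema_version", 0) != SUPPORTED_SCHEMA_VERSION:
            continue
        yield record


# Hypothetical two-line sidecar: one current record, one without a version field.
current = {
    "schema_version": 1,
    "run_id": "r1",
    "directed": True,
    "multigraph": True,
    "graph": {},
    "nodes": [],
    "links": [],
}
unversioned = {"run_id": "r0", "nodes": [], "links": []}
sample = io.StringIO(json.dumps(current) + "\n" + json.dumps(unversioned) + "\n")

accepted = [rec["run_id"] for rec in iter_ir_records(sample)]
print(accepted)  # ['r1']
```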

Resulting artifact (per run_id)

<base>.ir.jsonl   {schema_version, run_id, directed, multigraph,
                   graph{run_id, kernel_id, kernel_name, input_shapes, num_graphs,
                         root_ids, graphs[], rolled_reductions[]},
                   nodes[], links[]}

Join to .meta.jsonl and .csv on run_id; group by kernel_id.
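
The join can be sketched with the standard library alone. Everything below is illustrative: the CSV columns config and perf_ms, the kernel_name field in the metadata record, and the sample values are assumptions made for the example. Only the run_id join key and the placement of kernel_id under graph are taken from this PR.

```python
import csv
import io
import json
from collections import defaultdict

# Hypothetical sidecar contents (in practice: <base>.csv, <base>.meta.jsonl, <base>.ir.jsonl).
csv_text = "run_id,config,perf_ms\nr1,cfgA,0.50\nr1,cfgB,0.42\nr2,cfgA,1.10\n"
meta_text = "\n".join(
    json.dumps(rec)
    for rec in (
        {"run_id": "r1", "kernel_name": "add"},
        {"run_id": "r2", "kernel_name": "softmax"},
    )
)
ir_text = "\n".join(
    json.dumps(rec)
    for rec in (
        {"schema_version": 1, "run_id": "r1",
         "graph": {"kernel_id": "k-add"}, "nodes": [{"id": 0}], "links": []},
        {"schema_version": 1, "run_id": "r2",
         "graph": {"kernel_id": "k-softmax"}, "nodes": [{"id": 0}, {"id": 1}], "links": []},
    )
)


def index_by_run_id(jsonl_text: str) -> dict[str, dict]:
    """Index JSONL records by run_id; a later record for the same run_id wins."""
    indexed: dict[str, dict] = {}
    for line in jsonl_text.splitlines():
        if line.strip():
            record = json.loads(line)
            indexed[record["run_id"]] = record
    return indexed


meta_by_run = index_by_run_id(meta_text)
ir_by_run = index_by_run_id(ir_text)

# Attach the per-run records to every per-config CSV row, then group by kernel_id.
rows_by_kernel: dict[object, list[dict]] = defaultdict(list)
for row in csv.DictReader(io.StringIO(csv_text)):
    ir = ir_by_run.get(row["run_id"])  # None when the run produced no IR artifact
    meta = meta_by_run.get(row["run_id"], {})
    kernel_id = ir["graph"]["kernel_id"] if ir else None
    rows_by_kernel[kernel_id].append(
        {
            "kernel_name": meta.get("kernel_name"),
            "config": row["config"],
            "perf_ms": float(row["perf_ms"]),
            "num_ir_nodes": len(ir["nodes"]) if ir else None,
        }
    )

print({kernel: len(rows) for kernel, rows in rows_by_kernel.items()})
```

Because the IR is config-independent, each IR record fans out to every CSV row that shares its run_id.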

Testing

  • test/test_ir_features.py: pointwise/reduction/control-flow extraction on real kernels, networkx round-trip, control-flow region-edge resolution, rolled-reduction metadata (no fake edges), and negative/edge cases (missing/scalar val, no lowering, address-free target, unconvertible dim, malformed/non-control-flow region specs warn behavior, empty device IR, bool graph-id rejection, value/source_loc capping, schema_version gate, region-kind drift guard).
  • test/test_kernel_metadata.py: .ir.jsonl is appended and joins .meta.jsonl + CSV rows on run_id; metadata-without-ir-graph writes no .ir.jsonl.

Validation

  • ruff clean; pyrefly 0 errors on helion/ (CI excludes test/).
  • test_ir_features + test_kernel_metadata + test_autotuner all pass (no autotuner regressions).
  • End-to-end on add/softmax/attention: all three sidecars produced, networkx rebuilds every record, run_id joins .ir.jsonl ↔ .meta.jsonl ↔ .csv, with correct per-kernel structure (pointwise = data edges only; softmax/layer_norm = rolled_reductions, no fake edges; bmm/attention = resolved _for_loop region edges).

…he decorator already computed for sample_id and record it per row, completing the kernel-artifact set (source, input shapes, decorator).
Add helion/autotuner/ir_features.extract_ir_graph: a lossless, plain-JSON
node-link dump of Helion's device IR (networkx node_link_graph compatible),
for the kernel-artifact cost-model dataset.

- data edges from node.args/kwargs (with arg_positions)
- region edges only from live control-flow call nodes (_for_loop/_if/_while),
  mapping live args onto child placeholders -- no fabricated reduction edges
- reduction loops captured as typed graph-level metadata (rolled_reductions)
- per-node fields best-effort/null; symbolic + concrete shapes
- warns once on an unknown region_kind to catch upstream IR shifts

Tests cover pointwise/reduction/control-flow kernels, the networkx round-trip,
the _for_loop region-edge resolution, and the unknown-region_kind warning.
…2-3)

Wire the Phase 1 extractor into autotune telemetry:
- AutotuneLogSink gains an optional ir_graph and appends one node-link record
  per run to <base>.ir.jsonl (append mode, like .meta.jsonl)
- autotune_logging() threads ir_graph through
- BaseSearch._prepare extracts the device IR once per run (config-independent),
  best-effort and gated on autotune_log; never breaks autotuning
- sink test asserts .ir.jsonl joins .meta.jsonl and the CSV rows on run_id
- docs: document the .ir.jsonl sidecar
test_ir_features: helper edge cases (missing/scalar val, no lowering,
address-free target_str, unconvertible concrete dim, non-control-flow
region_specs), extract on empty device IR, and warn-once on unknown region_kind.

test_kernel_metadata: empty-identity run_id still derived/stable, AutotuneLogEntry
default decorator empty, sink with metadata but no ir_graph writes no .ir.jsonl,
no-metadata writes neither sidecar, and record-after-close is a no-op.
P1 correctness/resource:
- _region_specs length-guards node.args (best-effort [], no IndexError)
- guard input_fake_tensors; cap stringified value/source_loc (_MAX_VALUE_LEN)
- precompute per-node val features once and reuse on edges (O(N) not O(N+E))
- fix docstring typo

P2 robustness/drift:
- warn-once via functools.cache (thread-safe, no module-global mutable set)
- _input_edges uses fx map_arg (covers all fx container types); documented
  arg_positions semantics
- add HelperFunctionGraphInfo to known region kinds (found by the new drift test)
- drift test asserts _KNOWN_REGION_KINDS == concrete GraphInfo subclasses

P3 API/typing:
- TypedDict schema (IrNode/IrEdge/IrGraphMeta/IrGraphRecord); typed return
- drop duplicated top-level kernel_id (keep run_id join key; kernel_id in graph)
- document failure modes / best-effort contract

Tests: region_specs short-args, input_edges multiplicity+nesting, value cap,
warn-once via cache_clear, region-kind drift guard.
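
A minimal sketch of the warn-once pattern named above, assuming the warning text is the cache key. The function name warn_once, the logger name, and the messages are hypothetical and not the PR's identifiers. Note that functools.cache keeps its cache coherent under threads, but two concurrent first calls may still both run the function, so a duplicate warning is possible in that race.

```python
import functools
import logging

log = logging.getLogger("ir_features_sketch")  # hypothetical logger name


@functools.cache
def warn_once(message: str) -> None:
    """Log `message` the first time it is seen; repeat calls hit the cache."""
    log.warning(message)


class ListHandler(logging.Handler):
    """Collects emitted messages so the demo can inspect them."""

    def __init__(self) -> None:
        super().__init__()
        self.messages: list[str] = []

    def emit(self, record: logging.LogRecord) -> None:
        self.messages.append(record.getMessage())


handler = ListHandler()
log.addHandler(handler)
log.propagate = False  # keep the demo output out of the root logger

warn_once("unknown region kind: FooGraphInfo")
warn_once("unknown region kind: FooGraphInfo")  # cached, not logged again
warn_once("unknown region kind: BarGraphInfo")
print(len(handler.messages))  # 2

# Tests can reset the state with cache_clear() instead of patching a global set.
warn_once.cache_clear()
warn_once("unknown region kind: FooGraphInfo")
print(len(handler.messages))  # 3
```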
- typed helper returns (ValFeatures/LoweringFeatures) so node/edge TypedDict
  literals type-check (clears 17 bad-typed-dict-key errors)
- _region_specs narrows child graph ids with isinstance(int) (clears 4
  bad-return/assignment errors and avoids a wrong edge from a malformed node)
- propagate IrGraphRecord type through autotune_logging / AutotuneLogSink /
  BaseSearch._extract_ir_graph (replacing dict[str, object])
- add schema_version field (IR_SCHEMA_VERSION=1) for forward-compat

pyrefly: 0 errors across ir_features.py/logger.py/base_search.py.
…ract

- IrNode composed from _NodeCore + ValFeatures + LoweringFeatures via TypedDict
  inheritance (schema declared once, no field duplication/drift)
- _is_graph_id TypeGuard uses type(x) is int -> rejects bool, narrows for pyrefly
- warn-once on a control-flow node with unexpected args (was a silent []),
  surfacing IR-shape drift; non-control-flow nodes stay quiet
- __all__ export list; document schema_version contract + consumer .get() gate
- tests: bool rejection, malformed/normal region-spec warn behavior, IrNode
  composition keys, schema_version present/current + gate

pyrefly 0 errors; ruff clean; 41 tests pass.
@meta-cla meta-cla Bot added the CLA Signed label Jun 11, 2026
CI runs networkx < 3.4, where node_link_graph()'s edge-key parameter is
link= (renamed to edges= in 3.4), so the tests' edges="links" call raised
TypeError. Route both round-trip call sites through a _node_link_graph()
helper that tries edges="links" (networkx >= 3.4) and falls back to
link="links" (networkx < 3.4); both reference our "links" edge key.
Document the < 3.4 consumer parameter in the module docstring.
networkx is not a declared Helion dependency (it happens to be present
transitively via torch). Guard the two node_link_graph round-trip tests with
@unittest.skipUnless so they skip rather than error in an environment without
networkx; the other extractor tests are unaffected.
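
The version fallback described in these two commits might look roughly like the sketch below. It is not the PR's _node_link_graph() helper: the function name node_link_graph_compat is hypothetical, and the loader is passed in as an argument so the example runs without networkx installed. The two stand-in loaders only mimic the keyword difference between networkx >= 3.4 (edges=) and < 3.4 (link=).

```python
from typing import Any, Callable


def node_link_graph_compat(node_link_graph: Callable[..., Any], record: dict) -> Any:
    """Call node_link_graph with whichever edge-key keyword it accepts."""
    try:
        return node_link_graph(record, edges="links")  # networkx >= 3.4
    except TypeError:
        # Older networkx rejects the unknown `edges` keyword; retry with `link`.
        return node_link_graph(record, link="links")  # networkx < 3.4


# Stand-ins that imitate the two signatures (assumption: illustration only).
def new_style_loader(data: dict, *, edges: str = "edges") -> tuple[str, int]:
    return ("new", len(data[edges]))


def old_style_loader(data: dict, *, link: str = "links") -> tuple[str, int]:
    return ("old", len(data[link]))


record = {
    "directed": True,
    "multigraph": True,
    "graph": {},
    "nodes": [{"id": 0}, {"id": 1}],
    "links": [{"source": 0, "target": 1}],
}

new_result = node_link_graph_compat(new_style_loader, record)
old_result = node_link_graph_compat(old_style_loader, record)
print(new_result, old_result)  # ('new', 1) ('old', 1)

# With networkx available, the real loader would be passed the same way:
#   import networkx as nx
#   graph = node_link_graph_compat(nx.node_link_graph, record)
```

One caveat of catching TypeError this broadly is that a TypeError raised for another reason inside the loader also triggers the retry; checking the installed networkx version is an alternative.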
@jansel jansel requested review from choijon5 and ethche June 11, 2026 05:47
@IshanAryendu IshanAryendu marked this pull request as draft June 11, 2026 16:37
@IshanAryendu IshanAryendu force-pushed the collect-ir-features branch from bef5db5 to 79c7fb0 June 11, 2026 17:49