Collect kernel artifacts: device-IR node-link dump (.ir.jsonl) #2750
Draft
IshanAryendu wants to merge 14 commits into
…he decorator already computed for sample_id and record it per row, completing the kernel-artifact set (source, input shapes, decorator).
Add helion/autotuner/ir_features.extract_ir_graph: a lossless, plain-JSON node-link dump of Helion's device IR (networkx node_link_graph compatible), for the kernel-artifact cost-model dataset.
- data edges from node.args/kwargs (with arg_positions)
- region edges only from live control-flow call nodes (_for_loop/_if/_while), mapping live args onto child placeholders -- no fabricated reduction edges
- reduction loops captured as typed graph-level metadata (rolled_reductions)
- per-node fields best-effort/null; symbolic + concrete shapes
- warns once on an unknown region_kind to catch upstream IR shifts
Tests cover pointwise/reduction/control-flow kernels, the networkx round-trip, the _for_loop region-edge resolution, and the unknown-region_kind warning.
…2-3) Wire the Phase 1 extractor into autotune telemetry:
- AutotuneLogSink gains an optional ir_graph and appends one node-link record per run to <base>.ir.jsonl (append mode, like .meta.jsonl)
- autotune_logging() threads ir_graph through
- BaseSearch._prepare extracts the device IR once per run (config-independent), best-effort and gated on autotune_log; never breaks autotuning
- sink test asserts .ir.jsonl joins .meta.jsonl and the CSV rows on run_id
- docs: document the .ir.jsonl sidecar
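The append-only sidecar behaviour described in this commit can be sketched as follows. This is a minimal illustration, not Helion's AutotuneLogSink: the function name append_ir_record and its signature are hypothetical. Only the <base>.ir.jsonl naming, the append mode, and the one-record-per-run convention come from the commit message.

```python
from __future__ import annotations

import json
from pathlib import Path


def append_ir_record(base: str, record: dict | None) -> Path | None:
    """Append one JSON record to <base>.ir.jsonl (hypothetical helper).

    Returns the sidecar path, or None when there is no record to write,
    mirroring the "no IR artifact" best-effort case.
    """
    if record is None:
        # No IR graph was extracted for this run: write nothing at all.
        return None
    path = Path(f"{base}.ir.jsonl")
    # Append mode keeps records from earlier runs in the same file.
    with path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record, sort_keys=True) + "\n")
    return path
```

Because each run contributes exactly one line, a consumer can stream the file line by line and match records to other sidecars by run_id.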
test_ir_features: helper edge cases (missing/scalar val, no lowering, address-free target_str, unconvertible concrete dim, non-control-flow region_specs), extract on empty device IR, and warn-once on unknown region_kind. test_kernel_metadata: empty-identity run_id still derived/stable, AutotuneLogEntry default decorator empty, sink with metadata but no ir_graph writes no .ir.jsonl, no-metadata writes neither sidecar, and record-after-close is a no-op.
P1 correctness/resource:
- _region_specs length-guards node.args (best-effort [], no IndexError)
- guard input_fake_tensors; cap stringified value/source_loc (_MAX_VALUE_LEN)
- precompute per-node val features once and reuse on edges (O(N) not O(N+E))
- fix docstring typo
P2 robustness/drift:
- warn-once via functools.cache (thread-safe, no module-global mutable set)
- _input_edges uses fx map_arg (covers all fx container types); documented arg_positions semantics
- add HelperFunctionGraphInfo to known region kinds (found by the new drift test)
- drift test asserts _KNOWN_REGION_KINDS == concrete GraphInfo subclasses
P3 API/typing:
- TypedDict schema (IrNode/IrEdge/IrGraphMeta/IrGraphRecord); typed return
- drop duplicated top-level kernel_id (keep run_id join key; kernel_id in graph)
- document failure modes / best-effort contract
Tests: region_specs short-args, input_edges multiplicity+nesting, value cap, warn-once via cache_clear, region-kind drift guard.
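The warn-once pattern named under P2 can be sketched with functools.cache as below. The logger name and the function name warn_unknown_region_kind are hypothetical, and this is not the code in ir_features.py. The idea is that a cached function taking the region kind as its argument runs its body (the warning) only on the first call per distinct value.

```python
import functools
import logging

log = logging.getLogger("ir_features_sketch")  # hypothetical logger name


@functools.cache
def warn_unknown_region_kind(region_kind: str) -> None:
    """Log a warning the first time a given region kind is seen.

    functools.cache memoizes on the argument, so repeated calls with the
    same region_kind return the cached None without logging again.
    """
    log.warning("unknown region_kind %r; IR schema may have drifted", region_kind)
```

Tests can reset the state with warn_unknown_region_kind.cache_clear(), which is what makes the behaviour testable without a module-global set. Note that the cache itself is safe to use from multiple threads, but under a race the wrapped function may still run more than once for the same argument.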
- typed helper returns (ValFeatures/LoweringFeatures) so node/edge TypedDict literals type-check (clears 17 bad-typed-dict-key errors)
- _region_specs narrows child graph ids with isinstance(int) (clears 4 bad-return/assignment errors and avoids a wrong edge from a malformed node)
- propagate IrGraphRecord type through autotune_logging / AutotuneLogSink / BaseSearch._extract_ir_graph (replacing dict[str, object])
- add schema_version field (IR_SCHEMA_VERSION=1) for forward-compat
pyrefly: 0 errors across ir_features.py/logger.py/base_search.py.
…ract
- IrNode composed from _NodeCore + ValFeatures + LoweringFeatures via TypedDict inheritance (schema declared once, no field duplication/drift)
- _is_graph_id TypeGuard uses type(x) is int -> rejects bool, narrows for pyrefly
- warn-once on a control-flow node with unexpected args (was a silent []), surfacing IR-shape drift; non-control-flow nodes stay quiet
- __all__ export list; document schema_version contract + consumer .get() gate
- tests: bool rejection, malformed/normal region-spec warn behavior, IrNode composition keys, schema_version present/current + gate
pyrefly 0 errors; ruff clean; 41 tests pass.
CI runs networkx < 3.4, where node_link_graph()'s edge-key parameter is link= (renamed to edges= in 3.4), so the tests' edges="links" call raised TypeError. Route both round-trip call sites through a _node_link_graph() helper that tries edges="links" (networkx >= 3.4) and falls back to link="links" (networkx < 3.4); both reference our "links" edge key. Document the < 3.4 consumer parameter in the module docstring.
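A compatibility helper of the kind described can be sketched as follows. The name node_link_graph_compat and the injectable loader parameter are assumptions made for illustration (the PR's helper is called _node_link_graph and its exact body is not shown here). The edges= and link= keyword names are the networkx node_link_graph parameters named in the commit message.

```python
def node_link_graph_compat(record, loader=None):
    """Rebuild a graph from a node-link record across networkx versions.

    Tries the newer keyword (edges="links") first and falls back to the
    older one (link="links") when the loader rejects it with TypeError.
    `loader` defaults to networkx.node_link_graph; it is a parameter here
    only so the fallback can be exercised without networkx installed.
    """
    if loader is None:
        import networkx as nx  # imported lazily; networkx is optional

        loader = nx.node_link_graph
    try:
        return loader(record, edges="links")
    except TypeError:
        # Older networkx does not know the `edges` keyword.
        return loader(record, link="links")
```

One caveat of this design is that a TypeError raised for an unrelated reason inside the loader also triggers the fallback; checking the installed networkx version explicitly would avoid that at the cost of a version-parsing dependency.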
networkx is not a declared Helion dependency (it happens to be present transitively via torch). Guard the two node_link_graph round-trip tests with @unittest.skipUnless so they skip rather than error in an environment without networkx; the other extractor tests are unaffected.
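The skip guard can be sketched like this. The test class, test name, and the tiny record are made up for illustration; only the use of unittest.skipUnless to skip when networkx is missing comes from the commit message.

```python
import importlib.util
import unittest

# True only when networkx can be imported in this environment.
HAS_NETWORKX = importlib.util.find_spec("networkx") is not None


class RoundTripSketch(unittest.TestCase):
    @unittest.skipUnless(HAS_NETWORKX, "networkx is not installed")
    def test_round_trip(self):
        import networkx as nx

        record = {
            "directed": True,
            "multigraph": False,
            "graph": {},
            "nodes": [{"id": "a"}, {"id": "b"}],
            "links": [{"source": "a", "target": "b"}],
        }
        try:
            graph = nx.node_link_graph(record, edges="links")
        except TypeError:  # older networkx uses link= instead of edges=
            graph = nx.node_link_graph(record, link="links")
        self.assertEqual(graph.number_of_edges(), 1)
```

With the decorator in place, an environment without networkx reports the test as skipped rather than as an import error.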
bef5db5 to 79c7fb0
Summary
Adds a lossless dump of Helion's device IR to the autotune telemetry, as the ir_features artifact for the kernel-artifact dataset. For each autotune run, the union of all device-side torch.fx graphs is serialized as a node-link graph (one JSON record per run) to a new <autotune_log>.ir.jsonl sidecar, alongside the existing .csv and .meta.jsonl, and joined to them on run_id.

Motivation
The cost-model dataset needs the kernel's computation graph as features, not just source text + perf. Device IR is the structured representation Helion lowers to before config-specific codegen, so it's the right artifact to capture. It is config-independent (built once per bound kernel), so it is collected once per run_id and joins cleanly to the per-config CSV rows and the per-run identity record.

What it does
- helion/autotuner/ir_features.py with extract_ir_graph(device_ir, *, run_id, kernel_id, kernel_name, input_shapes) -> IrGraphRecord: a plain-JSON, networkx-node-link-compatible dump (parse with nx.node_link_graph(record, edges="links"); on networkx < 3.4 the parameter is link="links").
- data edges from node.args/kwargs (with arg_positions, via fx map_arg to cover all container types).
- region edges only from live control-flow call nodes (_for_loop/_for_loop_step/_if/_while_loop), mapping their live argument lists onto the child graph's placeholders.
- No fabricated reduction edges: reduction loops are captured as typed graph-level metadata (rolled_reductions) instead.
- Per-node fields (best-effort, None when absent): op_kind, target (address-free), lowering_class, dtype/shape/concrete_shape/stride/concrete_stride/device/value, operand input_dtypes/input_shapes, reduction_type/reduction_ranges, pointwise_ranges, block_ids, graph_id, region_kind, source_loc. Symbolic dims are stringified with best-effort concrete resolution, and large values are length-capped.
- Schema declared as TypedDicts (IrNode composed from ValFeatures + LoweringFeatures + node-core; IrEdge, IrGraphMeta, IrGraphRecord) plus a schema_version field (IR_SCHEMA_VERSION = 1) and a documented consumer gate (record.get("schema_version", 0)).
- AutotuneLogSink appends one .ir.jsonl record per run. BaseSearch._prepare extracts the device IR once, gated on autotune_log and best-effort (a missing/odd host_function or non-Triton backend degrades to "no IR artifact" and never breaks autotuning).
- Unknown GraphInfo region kinds and malformed control-flow nodes log a one-time warning (never raise). A unit test asserts the known-region-kinds set matches the concrete GraphInfo subclasses, so upstream IR changes fail CI loudly.
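On the consumer side, the schema_version gate mentioned above might look like the sketch below. The function name iter_supported_records and the constant SUPPORTED_SCHEMA_VERSION are hypothetical. The record.get("schema_version", 0) gate and the version value 1 come from the description, and the choice to skip rather than raise on other versions is an assumption.

```python
import json
from typing import Iterable, Iterator

SUPPORTED_SCHEMA_VERSION = 1  # matches IR_SCHEMA_VERSION = 1 in the description


def iter_supported_records(lines: Iterable[str]) -> Iterator[dict]:
    """Yield parsed .ir.jsonl records whose schema_version is supported.

    Records without the field are treated as version 0 and skipped,
    as are records written by a newer, unknown schema.
    """
    for line in lines:
        if not line.strip():
            continue  # tolerate blank lines in the sidecar
        record = json.loads(line)
        if record.get("schema_version", 0) != SUPPORTED_SCHEMA_VERSION:
            continue
        yield record
```

A reader would typically pass an open file object for <autotune_log>.ir.jsonl as lines, then hand each yielded record to networkx or to its own feature extraction.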
run_id)Join to
.meta.jsonland.csvonrun_id; group bykernel_id.Testing
- test/test_ir_features.py: pointwise/reduction/control-flow extraction on real kernels, networkx round-trip, control-flow region-edge resolution, rolled-reduction metadata (no fake edges), and negative/edge cases (missing/scalar val, no lowering, address-free target, unconvertible dim, malformed/non-control-flow region-spec warn behavior, empty device IR, bool graph-id rejection, value/source_loc capping, schema_version gate, region-kind drift guard).
- test/test_kernel_metadata.py: .ir.jsonl is appended and joins .meta.jsonl + CSV rows on run_id; metadata-without-ir-graph writes no .ir.jsonl.

Validation
- helion/ (CI excludes test/).
- test_ir_features + test_kernel_metadata + test_autotuner all pass (no autotuner regressions).
- add/softmax/attention: all three sidecars produced, networkx rebuilds every record, run_id joins .ir.jsonl ↔ .meta.jsonl ↔ .csv, with correct per-kernel structure (pointwise = data edges only; softmax/layer_norm = rolled_reductions, no fake edges; bmm/attention = resolved _for_loop region edges).
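The run_id join across the three sidecars could be reproduced with a sketch like the one below. It rests on assumptions not confirmed by the description: that the files are named <base>.csv, <base>.meta.jsonl, and <base>.ir.jsonl, that the CSV has a column literally named run_id, and that each JSONL record carries a top-level run_id. The function names are hypothetical.

```python
import csv
import json
import os


def _index_jsonl(path: str) -> dict:
    """Map run_id -> record for one JSONL sidecar; empty if the file is absent."""
    index = {}
    if not os.path.exists(path):
        return index  # e.g. no IR artifact was written for this log
    with open(path, encoding="utf-8") as f:
        for line in f:
            if line.strip():
                record = json.loads(line)
                index[record["run_id"]] = record
    return index


def join_sidecars(base: str) -> list:
    """Attach the per-run meta and IR records to every per-config CSV row."""
    meta = _index_jsonl(f"{base}.meta.jsonl")
    ir = _index_jsonl(f"{base}.ir.jsonl")
    joined = []
    with open(f"{base}.csv", newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            run_id = row["run_id"]  # assumed column name
            joined.append(
                {"row": row, "meta": meta.get(run_id), "ir": ir.get(run_id)}
            )
    return joined
```

Rows whose run has no IR record get ir set to None, which matches the best-effort contract: the IR artifact may be absent without invalidating the CSV or metadata.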