Skip to content

amdgpu: accept raw-HSACO kernels with a partial implicit kernarg suffix#59

Draft
zjgarvey wants to merge 2 commits into
mainfrom
users/zjgarvey/fix/raw-hsaco-implicit-kernarg-suffix
Draft

amdgpu: accept raw-HSACO kernels with a partial implicit kernarg suffix#59
zjgarvey wants to merge 2 commits into
mainfrom
users/zjgarvey/fix/raw-hsaco-implicit-kernarg-suffix

Conversation

@zjgarvey

@zjgarvey zjgarvey commented Jun 9, 2026

Copy link
Copy Markdown
Contributor

Summary

A PR#12 over-strict check rejected raw-HSACO kernels (notably MIOpen hand-written asm conv kernels) whose declared kernarg segment is smaller than the full implicit-args block the loader expects. MIOpen ignores the load failure and unloads the uninitialized handle -> SIGSEGV (conv2d). Relax the validation: the reservation is already max(kernarg_size, explicit, implicit_offset + IMPLICIT_ARGS_SIZE) and the dispatch path over-reserves + zero-fills, so a partial suffix is safe.

Review

Agent review: clean-to-merge. The OOB hypothesis was adversarially disproven — the kernarg reservation is the correct MAX, computed independently of the removed check, so the full implicit-args write is always in-bounds. Rebased cleanly onto current main.

Test

Paired hip-cts: iree-org/hip-cts users/zjgarvey/miopen-asmconv-kernarg-note — a hand-assembled gfx942 partial-kernarg kernel + load/dispatch harness (runs on MI300X), plus a note. The hand-written kernel does not yet reproduce the exact pre-fix MIOpen-unload SIGSEGV (brittle metadata); practical repro stays conv2d_min.py.

🤖 Generated with Claude Code

zjgarvey and others added 2 commits June 9, 2026 11:58
Workaround for a regression in the MERGED form of PR#12 (raw HSACO custom
kernargs): the new "truncated implicit kernarg suffix" check in
iree_hal_amdgpu_executable_raw_hsaco_custom_kernarg_layout() rejects MIOpen's
hand-written assembly conv kernels (e.g. miopenSp3AsmConv*), which declare a
hidden arg but size kernarg_segment to only the partial suffix they use. The
first GPU conv2d then fails hipModuleLoadDataEx; MIOpen unloads a garbage module
handle and the binding segfaults in module_release.

The check is redundant: the custom-direct dispatch path already reserves
max(kernarg_size, explicit, implicit_offset + IMPLICIT_SIZE) and zero-fills the
remainder, so a smaller kernarg_size is safe (the squashed form of #12 accepted
these kernels). Drop the rejection; keep the overflow guard.

Verified: full native-vs-HRX A/B 58/58 (record + compare); repro/conv2d_min.py
passes. Local workaround pending an upstream fix in #12 — see
hrx-pytorch-smoke-test/repro/ISSUE_conv2d_truncated_implicit_kernarg.md.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The layout change deliberately accepts hand-written kernels that declare a
partial hidden suffix; dispatch reserves max(kernarg_size, implicit offset +
implicit size). Update the stale expectation that failed the AMDGPU CI lanes.
Verified locally: executable_test 8/8 pass.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant