amdgpu: make the AQL queue doorbell kind-aware #56
Open
zjgarvey wants to merge 4 commits into
Nightly ROCm on MI300X returns a USER-kind doorbell signal (not DOORBELL-kind), so the cached hardware_doorbell_ptr is a signal value rather than a writable MMIO address; the raw atomic store to it faulted, SEGV-ing every H2D copy and kernel dispatch. Keep the raw-MMIO fast path for DOORBELL-kind doorbells; fall back to hsa_signal_store_screlease for USER/interrupt-kind.

Peeled from the nightly-bring-up branch for isolated review.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
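The kind-aware dispatch described above can be sketched in plain C. This is a minimal, self-contained sketch under stated assumptions, not the PR's code: `aql_ring_t`, `aql_ring_init`, `aql_ring_doorbell`, `fake_signal_t`, and `fake_signal_store_screlease` are hypothetical names, and the fake store stands in for `hsa_signal_store_screlease` so the example runs without a ROCm install. The kind values are assumed to mirror ROCr's signal kinds (USER = 1, DOORBELL = -1).

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of the ROCr signal kinds (values are an assumption). */
typedef enum {
  DOORBELL_KIND_USER = 1,      /* union slot holds a signal value */
  DOORBELL_KIND_DOORBELL = -1, /* union slot is a writable MMIO pointer */
} doorbell_kind_t;

/* Hypothetical stand-in for an HSA signal handle. */
typedef struct {
  uint64_t handle;
} fake_signal_t;

/* Hypothetical stand-in for hsa_signal_store_screlease: records the value. */
static int64_t g_last_signal_store = -1;
static void fake_signal_store_screlease(fake_signal_t signal, int64_t value) {
  (void)signal;
  g_last_signal_store = value;
}

typedef struct {
  doorbell_kind_t kind;
  /* Non-NULL only for DOORBELL-kind; NULL for USER/interrupt-kind. */
  _Atomic uint64_t* hardware_doorbell_ptr;
  fake_signal_t doorbell_signal;
} aql_ring_t;

/* Decide once, at init time, whether the raw MMIO fast path is legal. */
static void aql_ring_init(aql_ring_t* ring, doorbell_kind_t kind,
                          _Atomic uint64_t* mmio, fake_signal_t signal) {
  ring->kind = kind;
  ring->hardware_doorbell_ptr =
      (kind == DOORBELL_KIND_DOORBELL) ? mmio : NULL;
  ring->doorbell_signal = signal;
}

/* Ring the doorbell with the new write index. */
static void aql_ring_doorbell(aql_ring_t* ring, uint64_t write_index) {
  if (ring->hardware_doorbell_ptr) {
    /* Fast path: release store straight to the (here simulated) MMIO word. */
    atomic_store_explicit(ring->hardware_doorbell_ptr, write_index,
                          memory_order_release);
  } else {
    /* Slow path: let the signal API handle the store and its ordering. */
    fake_signal_store_screlease(ring->doorbell_signal, (int64_t)write_index);
  }
}
```

Because `aql_ring_init` clears the pointer for non-DOORBELL kinds, the hot path is a single NULL check, and a USER-kind signal value is never dereferenced as an address.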
benvanik requested changes Jun 9, 2026
…ibhsa to struct top

Addresses benvanik's review comments. Also clang-format and generalize the doorbell-kind comments away from local bring-up context.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Adds runtime/src/iree/hal/drivers/amdgpu/util/aql_doorbell.md: what the AQL doorbell is, why nightly ROCm/MI300X hands back a USER-kind doorbell signal (the union slot is a signal value, not an MMIO pointer) so the raw store faulted, and how the kind-aware ring fixes it. Every claim cites the code.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Signed-off-by: zjgarvey <zjgarvey@gmail.com>
This file was a local markdown file I was using to gain context, and not meant to be committed.
Summary
Nightly ROCm on MI300X returns a USER-kind queue doorbell signal (not DOORBELL-kind), so the cached `hardware_doorbell_ptr` union slot holds a signal value, not a writable MMIO address; the raw atomic store to it faulted, and every H2D copy / kernel dispatch SEGV'd. Keep the raw-MMIO fast path for DOORBELL-kind; fall back to `hsa_signal_store_screlease` (system-scope release) for USER/interrupt-kind.

Review
Agent review confirmed that the HSA release ordering is correct and that the fast path is preserved byte for byte, and caught a [blocker]: the `aql_ring_initialize` signature change left a second caller (`pm4_dispatch_live_test.cc`) on the old form. Fixed, so the tests-enabled build compiles. Also corrected the doc comment on the now-conditionally-NULL doorbell field.

Test
Paired hip-cts branch: iree-org/hip-cts `users/zjgarvey/aql-doorbell-kind-test` (32 H2D/D2H round trips with data-integrity checks, which ring the doorbell repeatedly). Verified on MI300X (98 assertions).

🤖 Generated with Claude Code