SDMA engine selection#429
Conversation
There was a problem hiding this comment.
Pull request overview
Updates the Anvil SDMA transport to select SDMA engines using KFD’s per-link recommended_sdma_engine_id_mask (via topology io_link properties), spreading channels across the recommended engine(s) and falling back to the existing MI300X static OAM map when no recommendation is available.
Changes:
- Adds KFD-driven engine mask query (
getRecommendedEngineMask) and uses it inconnect()to round-robin channels across recommended engines. - Adjusts SDMA queue creation parameters to
QueuePercentage=100andHSA_QUEUE_PRIORITY_MAXIMUM. - Hardens
SdmaQueueteardown by converting the destructor to a function-try-block that logs and terminates on teardown failure.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| src/application/transport/sdma/anvil.cpp | Implements KFD mask lookup, channel-to-engine spreading, updated queue creation params, and destructor try/catch teardown handling. |
| include/mori/application/transport/sdma/anvil.hpp | Declares new helper APIs and introduces a SdmaQueueVector alias for the channel map value type. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
jhchouuu
left a comment
There was a problem hiding this comment.
LGTM
Thanks for your work, the direction is right and this is good to merge. only some comments.
| CHECK_HSAKMT_SUCCESS(hsaKmtCreateQueueExt(localNodeId, HSA_QUEUE_SDMA_BY_ENG_ID, | ||
| DEFAULT_QUEUE_PERCENTAGE, DEFAULT_PRIORITY, engineId, | ||
| queueBuffer_, SDMA_QUEUE_SIZE, nullptr, &queue_), | ||
| "Failed"); | ||
| CHECK_HSAKMT_SUCCESS( | ||
| hsaKmtCreateQueueExt(localNodeId, HSA_QUEUE_SDMA_BY_ENG_ID, 100, HSA_QUEUE_PRIORITY_MAXIMUM, | ||
| engineId, queueBuffer_, SDMA_QUEUE_SIZE, nullptr, &queue_), | ||
| "Failed"); |
There was a problem hiding this comment.
what's the motivation for bumping the SDMA queue priority from NORMAL (0) to MAXIMUM (3) here? Just want to make sure we understand the impact on other queues (compute / other SDMA) sharing the GPU. Was this something you saw help in practice?
There was a problem hiding this comment.
Yes, this only affects SDMA queues: I took it from ROCR runtime which also uses max priority.
QueuePercentage = 100
This is the max share of the engine's scheduling quantum the queue may consume; 100 means "no artificial throttle." It's passed straight through to KFD. In practice it does not give you fine-grained SDMA bandwidth control — partial values just cap the queue's scheduling slice, and every real user (ROCr included) passes 100. So 100 is the right value; anything lower would only throttle yourself.
Priority = HSA_QUEUE_PRIORITY_MAXIMUM
The thunk remaps the enum (-3..3) into KFD's 0–15 range:
static uint32_t priority_map[] = {0, 3, 5, 7, 9, 11, 15};
So MAXIMUM (3) → KFD priority 15, vs NORMAL (0) → 7. The kernel hardware scheduler uses this only for arbitration ordering among queues contending for the same SDMA engine.
Two things worth keeping in mind:
Priority is relative. It only helps you win against lower-priority queues. Since ROCr's internal copy queues are also MAX, you won't out-prioritize them — you'll be on par (which is the intended behavior).
| SdmaQueue::~SdmaQueue() { | ||
| // TODO catch exception? | ||
| SdmaQueue::~SdmaQueue() try { | ||
| CHECK_HSAKMT_SUCCESS(hsaKmtDestroyQueue(queue_.QueueId), "Failed to destroy queue."); | ||
| CHECK_HIP_ERROR(hipFree(deviceHandle_)); | ||
| CHECK_HIP_ERROR(hipFree(cachedWptr_)); | ||
| CHECK_HIP_ERROR(hipFree(committedWptr_)); | ||
| CHECK_HSAKMT_SUCCESS(hsaKmtUnmapMemoryToGPU(queueBuffer_), "Failed"); | ||
| CHECK_HSAKMT_SUCCESS(hsaKmtFreeMemory(queueBuffer_, SDMA_QUEUE_SIZE), "Failed"); | ||
| } catch (...) { | ||
| std::cerr << "Exception in ~SdmaQueue()" << std::endl; | ||
| std::exit(EXIT_FAILURE); | ||
| } |
There was a problem hiding this comment.
since both CHECK_HSAKMT_SUCCESS and CHECK_HIP_ERROR call exit() directly rather than throw, nothing in ~SdmaQueue() actually throws — so the try/catch(...) never gets hit. Was that intentional, or should the macros throw for it to be useful?
There was a problem hiding this comment.
huh, indeed, I was confused with CHECK_HSA_ERROR but these two macros never throw, so I have removed this block
Summary
Replaces the hard-coded static SDMA-engine assignment in the Anvil SDMA
transport with the engine recommendation that KFD already exposes via topology,
and uses it to spread connection channels across the recommended engine(s).
The previous static
mi300xOamMaplookup is kept as a fallback for platforms orkernels that don't report a recommendation.
Motivation
For a peer (xGMI) link, KFD publishes a per-link
recommended_sdma_engine_id_mask(sysfs), computed bykfd_set_recommended_sdma_engines()in the kernel. It is a load-balancingassignment that gives each ordered GPU pair a dedicated xGMI SDMA engine so that
concurrent peer pairs land on different engines.
Until now Anvil duplicated this mapping by hand (
mi300xOamMap, with engine id=
map[src][dst] * 2). Reading the mask directly from KFD:reshiftcase,non-8-GPU systems, partitioned/APU configs, future parts),
recommend more than one per link (e.g. MI350), while still collapsing to a
single engine where only one is recommended (e.g. 8x MI300X).
The decoded engine id from the mask is bit-for-bit equivalent to the old
mi300xOamMap[src][dst] * 2on MI300X, so behavior is unchanged on thatplatform while becoming correct-by-construction elsewhere.
Changes
getRecommendedEngineMask): querieshsaKmtGetNodeProperties/hsaKmtGetNodeIoLinkPropertiesfor the src node,finds the io_link to the dst node, and returns its
RecSdmaEngIdMask. Returns0 when no link/recommendation is available.
connect()spreads channels across engines: decodes the mask into a listof engine ids and round-robins the requested channels over them
(
engines[c % numEngines]).src == dsthas no xGMI io_link (so KFD reportsnothing); it now directly uses a general (non-xGMI) SDMA engine instead of
going through the lookup/fallback.
back to
getSdmaEngineId()(the existingmi300xOamMap).QueuePercentage = 100andHSA_QUEUE_PRIORITY_MAXIMUM, matching ROCr's ownamd_blit_sdmaSDMA copy queues.SdmaQueue::~SdmaQueue()is now a function-try-blockthat logs and exits on teardown failure instead of letting an exception escape
a destructor.
getNodeId()helper,SdmaQueueVectortype alias, removedstale commented-out debug prints.