Skip to content

SDMA engine selection#429

Open
pemeliya wants to merge 6 commits into
mainfrom
pemeliya/sdma_queue_enum
Open

SDMA engine selection#429
pemeliya wants to merge 6 commits into
mainfrom
pemeliya/sdma_queue_enum

Conversation

@pemeliya

Copy link
Copy Markdown
Contributor

Summary

Replaces the hard-coded static SDMA-engine assignment in the Anvil SDMA
transport with the engine recommendation that KFD already exposes via topology,
and uses it to spread connection channels across the recommended engine(s).
The previous static mi300xOamMap lookup is kept as a fallback for platforms or
kernels that don't report a recommendation.

Motivation

For a peer (xGMI) link, KFD publishes a per-link
recommended_sdma_engine_id_mask (sysfs), computed by
kfd_set_recommended_sdma_engines() in the kernel. It is a load-balancing
assignment that gives each ordered GPU pair a dedicated xGMI SDMA engine so that
concurrent peer pairs land on different engines.

Until now Anvil duplicated this mapping by hand (mi300xOamMap, with engine id
= map[src][dst] * 2). Reading the mask directly from KFD:

  • removes a hand-maintained table that can silently drift from the kernel,
  • automatically adapts to topology/kernel differences (e.g. the reshift case,
    non-8-GPU systems, partitioned/APU configs, future parts),
  • lets the transport spread channels across multiple engines on platforms that
    recommend more than one per link (e.g. MI350), while still collapsing to a
    single engine where only one is recommended (e.g. 8x MI300X).

The decoded engine id from the mask is bit-for-bit equivalent to the old
mi300xOamMap[src][dst] * 2 on MI300X, so behavior is unchanged on that
platform while becoming correct-by-construction elsewhere.

Changes

  • KFD-driven engine selection (getRecommendedEngineMask): queries
    hsaKmtGetNodeProperties / hsaKmtGetNodeIoLinkProperties for the src node,
    finds the io_link to the dst node, and returns its RecSdmaEngIdMask. Returns
    0 when no link/recommendation is available.
  • connect() spreads channels across engines: decodes the mask into a list
    of engine ids and round-robins the requested channels over them
    (engines[c % numEngines]).
  • Self-copy short-circuit: src == dst has no xGMI io_link (so KFD reports
    nothing); it now directly uses a general (non-xGMI) SDMA engine instead of
    going through the lookup/fallback.
  • Static-map fallback retained: if KFD reports no mask for a peer pair, fall
    back to getSdmaEngineId() (the existing mi300xOamMap).
  • Queue creation parameters: SDMA queues are now created with
    QueuePercentage = 100 and HSA_QUEUE_PRIORITY_MAXIMUM, matching ROCr's own
    amd_blit_sdma SDMA copy queues.
  • Destructor hardening: SdmaQueue::~SdmaQueue() is now a function-try-block
    that logs and exits on teardown failure instead of letting an exception escape
    a destructor.
  • Minor cleanups: getNodeId() helper, SdmaQueueVector type alias, removed
    stale commented-out debug prints.

@pemeliya pemeliya requested review from Copilot, jhchouuu and wuyl1 and removed request for jhchouuu and wuyl1 June 26, 2026 00:21

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Updates the Anvil SDMA transport to select SDMA engines using KFD’s per-link recommended_sdma_engine_id_mask (via topology io_link properties), spreading channels across the recommended engine(s) and falling back to the existing MI300X static OAM map when no recommendation is available.

Changes:

  • Adds KFD-driven engine mask query (getRecommendedEngineMask) and uses it in connect() to round-robin channels across recommended engines.
  • Adjusts SDMA queue creation parameters to QueuePercentage=100 and HSA_QUEUE_PRIORITY_MAXIMUM.
  • Hardens SdmaQueue teardown by converting the destructor to a function-try-block that logs and terminates on teardown failure.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File Description
src/application/transport/sdma/anvil.cpp Implements KFD mask lookup, channel-to-engine spreading, updated queue creation params, and destructor try/catch teardown handling.
include/mori/application/transport/sdma/anvil.hpp Declares new helper APIs and introduces a SdmaQueueVector alias for the channel map value type.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/application/transport/sdma/anvil.cpp
Comment thread src/application/transport/sdma/anvil.cpp Outdated
pemeliya and others added 4 commits June 25, 2026 17:24
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

@jhchouuu jhchouuu left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM
Thanks for your work, the direction is right and this is good to merge. only some comments.

Comment on lines -190 to +188
CHECK_HSAKMT_SUCCESS(hsaKmtCreateQueueExt(localNodeId, HSA_QUEUE_SDMA_BY_ENG_ID,
DEFAULT_QUEUE_PERCENTAGE, DEFAULT_PRIORITY, engineId,
queueBuffer_, SDMA_QUEUE_SIZE, nullptr, &queue_),
"Failed");
CHECK_HSAKMT_SUCCESS(
hsaKmtCreateQueueExt(localNodeId, HSA_QUEUE_SDMA_BY_ENG_ID, 100, HSA_QUEUE_PRIORITY_MAXIMUM,
engineId, queueBuffer_, SDMA_QUEUE_SIZE, nullptr, &queue_),
"Failed");

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what's the motivation for bumping the SDMA queue priority from NORMAL (0) to MAXIMUM (3) here? Just want to make sure we understand the impact on other queues (compute / other SDMA) sharing the GPU. Was this something you saw help in practice?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this only affects SDMA queues: I took it from ROCR runtime which also uses max priority.

QueuePercentage = 100
This is the max share of the engine's scheduling quantum the queue may consume; 100 means "no artificial throttle." It's passed straight through to KFD. In practice it does not give you fine-grained SDMA bandwidth control — partial values just cap the queue's scheduling slice, and every real user (ROCr included) passes 100. So 100 is the right value; anything lower would only throttle yourself.

Priority = HSA_QUEUE_PRIORITY_MAXIMUM
The thunk remaps the enum (-3..3) into KFD's 0–15 range:

static uint32_t priority_map[] = {0, 3, 5, 7, 9, 11, 15};
So MAXIMUM (3) → KFD priority 15, vs NORMAL (0) → 7. The kernel hardware scheduler uses this only for arbitration ordering among queues contending for the same SDMA engine.

Two things worth keeping in mind:

Priority is relative. It only helps you win against lower-priority queues. Since ROCr's internal copy queues are also MAX, you won't out-prioritize them — you'll be on par (which is the intended behavior).

Comment on lines -222 to 227
SdmaQueue::~SdmaQueue() {
// TODO catch exception?
SdmaQueue::~SdmaQueue() try {
CHECK_HSAKMT_SUCCESS(hsaKmtDestroyQueue(queue_.QueueId), "Failed to destroy queue.");
CHECK_HIP_ERROR(hipFree(deviceHandle_));
CHECK_HIP_ERROR(hipFree(cachedWptr_));
CHECK_HIP_ERROR(hipFree(committedWptr_));
CHECK_HSAKMT_SUCCESS(hsaKmtUnmapMemoryToGPU(queueBuffer_), "Failed");
CHECK_HSAKMT_SUCCESS(hsaKmtFreeMemory(queueBuffer_, SDMA_QUEUE_SIZE), "Failed");
} catch (...) {
std::cerr << "Exception in ~SdmaQueue()" << std::endl;
std::exit(EXIT_FAILURE);
}

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since both CHECK_HSAKMT_SUCCESS and CHECK_HIP_ERROR call exit() directly rather than throw, nothing in ~SdmaQueue() actually throws — so the try/catch(...) never gets hit. Was that intentional, or should the macros throw for it to be useful?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

huh, indeed, I was confused with CHECK_HSA_ERROR but these two macros never throw, so I have removed this block

@jhchouuu

Copy link
Copy Markdown
Collaborator

cc @wuyl1 @zhangfei829

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants