obj: add standard-protocol GPU-direct S3-over-RDMA (accelerated=true)#1741
obj: add standard-protocol GPU-direct S3-over-RDMA (accelerated=true)#1741harshavardhana wants to merge 1 commit into
Conversation
|
👋 Hi harshavardhana! Thank you for contributing to ai-dynamo/nixl. Your PR reviewers will review your contribution then trigger the CI to test your changes. 🚀 |
|
👀 Investigating |
|
🤖 CI Triage Agent — The file is new (introduced in this PR). The log has given us everything we need. No further investigation required. Summary: Root cause: The newly introduced file All other hooks (case conflicts, merge conflicts, end-of-files, trailing whitespace, line endings) passed cleanly. The Python linting hooks (mypy, isort, black, flake8) were skipped as there are no Python files in the changeset. Implicated commit: File: Suggested fix: In - unparseable
+ unparsableThen commit and push the fix. Alternatively, run Related: None found — this is a straightforward typo in a new file introduced by PR #1741.
|
2aefadc to
2fa4f8d
Compare
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdds a vendor-neutral S3-over-RDMA wire protocol and integrates an opt-in GPU-direct RDMA fast path ( ChangesGPU-Direct RDMA Feature
Sequence Diagram(s)sequenceDiagram
participant awsS3Client
participant SharedCuObjClient
participant S3RdmaControlPlane
participant S3Server
awsS3Client->>SharedCuObjClient: getToken(ptr,size,offset,op)
SharedCuObjClient-->>awsS3Client: token
awsS3Client->>S3RdmaControlPlane: rdmaPut/rdmaGet(ctx, token, buf_addr, size)
S3RdmaControlPlane->>S3Server: signed HTTP PUT/GET (UNSIGNED-PAYLOAD) + x-amz-rdma-token
S3Server-->>S3RdmaControlPlane: x-amz-rdma-reply (+etag / x-amz-rdma-bytes-transferred)
S3RdmaControlPlane-->>awsS3Client: parsed result (etag/bytes)
awsS3Client->>SharedCuObjClient: putToken(token)
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Suggested reviewers
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 9
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
src/plugins/obj/s3_crt/engine_impl.h (1)
6-7:⚠️ Potential issue | 🟠 Major | ⚡ Quick winUse repository-standard header guard naming.
The guard should follow the
NIXL_+ full repo-relative path mapping convention (e.g.,NIXL_SRC_PLUGINS_OBJ_S3_CRT_ENGINE_IMPL_H) instead ofOBJ_PLUGIN_S3_CRT_ENGINE_IMPL_H.As per coding guidelines, header guards must be
NIXL_+ upper-snake-case path mapping with#ifndef/#define; and based on learnings, this repo enforces full repository-relative guard names.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/plugins/obj/s3_crt/engine_impl.h` around lines 6 - 7, Replace the non-conforming header guard OBJ_PLUGIN_S3_CRT_ENGINE_IMPL_H with the repository-standard guard name by changing both the `#ifndef` and `#define` to NIXL_SRC_PLUGINS_OBJ_S3_CRT_ENGINE_IMPL_H (and update the trailing `#endif` comment if present to match); ensure the new macro is used consistently in this header (engine_impl.h) so the guard follows the NIXL_ + repo-relative path upper-snake-case convention.Sources: Coding guidelines, Learnings
src/plugins/obj/s3/engine_impl.cpp (1)
142-177:⚠️ Potential issue | 🔴 Critical | 🏗️ Heavy liftDo not accept
VRAM_SEGunless the client’s full RDMA fast path is ready.This registration path only checks cuObject connectivity, so VRAM can be accepted as soon as
SharedCuObjClient::instance()andregisterBuffer()succeed. ButawsS3Clientstill falls back to the plain HTTP path wheneverrdmaCp_is null orexecutor_is missing. In that state a later PUT/GET can hand a GPU pointer toPreallocatedStreamBuf, which is not a valid CPU HTTP buffer.The readiness predicate for VRAM needs to match the client’s actual transfer predicate, not just buffer-registration capability. Please gate VRAM registration/advertisement on a single shared “RDMA fast path ready” check that includes the control plane and executor as well.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/plugins/obj/s3/engine_impl.cpp` around lines 142 - 177, DefaultObjEngineImpl::registerMem currently accepts VRAM_SEG based only on cuObject buffer registration; change it to require the client's full RDMA fast-path readiness (control-plane and executor) before advertising/accepting VRAM. Update registerMem to consult a single predicate (e.g., isRdmaFastPathReady() or an inline check of rdmaCp_ and executor_ alongside gds_enabled_ and SharedCuObjClient::instance() / registerBuffer()) and only allow VRAM_SEG when that predicate is true; otherwise return NIXL_ERR_NOT_SUPPORTED. Ensure the predicate is used where devIdToObjKey_/nixlObjMetadata for VRAM is created and where out is set for registered buffers so the advertised capability matches awsS3Client’s actual rdmaCp_/executor_ transfer readiness.src/plugins/obj/s3/client.cpp (1)
36-78:⚠️ Potential issue | 🟠 Major | ⚡ Quick winKeep
gds=truedistinguishable from “RDMA unavailable” and fail instead of falling through to HTTP.Right now the code only enters the RDMA path when
rdma_ && rdmaCp_ && executor_. Ifgds=truewas requested but the fabric is absent, control-plane init fails, or no executor was provided, both async methods quietly run the normal HTTP path. That breaks the documented GDS contract: explicit RDMA opt-in is supposed to surface RDMA unavailability as an error, not silently downgrade.Please persist a separate “GDS was requested” state and use it here to fail the transfer (or fail construction) whenever the fast path is unavailable, instead of treating that case as if RDMA had never been enabled.
Also applies to: 98-123, 152-170
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/plugins/obj/s3/client.cpp` around lines 36 - 78, The constructor awsS3Client currently treats enable_rdma_fast_path as a best-effort and silently falls back to HTTP when RDMA is unavailable; instead persist the explicit request by storing a member like gds_requested_ = enable_rdma_fast_path and use it to fail early when RDMA cannot be used (rdma_ is null, rdmaCp_ is null/invalid, or executor_ is missing). Modify the awsS3Client constructor (and any analogous blocks at the other locations noted) to set gds_requested_ and, if gds_requested_ is true but rdma_ == nullptr or rdmaCp_ is invalid or executor_ == nullptr, either throw/return an error from construction or set an error state that makes async methods (which reference rdma_, rdmaCp_, executor_) fail fast instead of falling back to HTTP; ensure checks reference the members rdma_, rdmaCp_, executor_, and the new gds_requested_ flag so the GDS opt-in surfaces unavailability as an error.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/plugins/obj/RDMA_PROTOCOL.md`:
- Around line 45-46: Update the `x-amz-rdma-reply` table row to remove the
statement that the client auto-falls back to HTTP and instead state that a `501`
(or absence of the header) indicates the server declined RDMA and clients will
treat the request as declined but will NOT automatically retry over HTTP; keep
the note that `x-amz-rdma-reply` is required on success and clarify which status
codes (200/204/206) indicate successful RDMA completion while 501 indicates
explicit decline so operators know to configure any non-automatic fallback
behavior themselves.
- Around line 51-53: The fenced code block containing the token format string
"x-amz-rdma-token: <descriptor>:<start_addr_hex>:<size_hex>" is missing a
language tag; update that block to include a language identifier (e.g., change
the opening fence to ```text) so the markdown linter recognizes the block type
and the snippet is annotated properly.
In `@src/plugins/obj/README.md`:
- Around line 301-304: Update the sentence that currently reads "negotiates with
proper fallback" in the README paragraph to remove or clarify that claim and
explicitly state the current behavior under gds=true (hard-error, no automatic
fallback). Locate the sentence in the same paragraph that begins "The generic
path above fixes all four:" and replace or append a clarifying note that under
gds=true the client does not perform automatic fallback and will fail hard,
keeping the rest of the claim about per-transfer capability and no per-vendor
code intact.
In `@src/plugins/obj/s3/rdma_protocol.h`:
- Line 76: Fix the spelling in the comment inside
src/plugins/obj/s3/rdma_protocol.h: replace the word "unparseable" with
"unparsable" in the comment line that reads "0 unparseable non-empty value:
treat as failure (-1) by the caller" so the codespell pre-commit hook passes;
update only the comment text (no code changes) near the rdma_protocol.h
enum/description.
- Around line 6-7: The header guard OBJ_PLUGIN_S3_RDMA_PROTOCOL_H in
src/plugins/obj/s3/rdma_protocol.h must be renamed to follow the project
convention: replace the existing `#ifndef/`#define guard with
NIXL_SRC_PLUGINS_OBJ_S3_RDMA_PROTOCOL_H and update the matching closing `#endif`
comment to reference NIXL_SRC_PLUGINS_OBJ_S3_RDMA_PROTOCOL_H so the guard uses
the NIXL_ prefix and the full repo-relative path in upper-snake-case.
- Around line 25-49: The constants in rdma_protocol.h use camelCase with a k
prefix; rename them to snake_case (keeping the k prefix) and update all call
sites—e.g., kAmzRdmaToken -> k_amz_rdma_token, kAmzRdmaReply ->
k_amz_rdma_reply, kAmzRdmaBytesTransferred -> k_amz_rdma_bytes_transferred,
kUnsignedPayload -> k_unsigned_payload, kRdmaReplySuccess ->
k_rdma_reply_success, kRdmaReplyNoContent -> k_rdma_reply_no_content,
kRdmaReplyPartialContent -> k_rdma_reply_partial_content,
kRdmaReplyNotImplemented -> k_rdma_reply_not_implemented, kRdmaNotSupported ->
k_rdma_not_supported, kRdmaMaxAttempts -> k_rdma_max_attempts,
kRdmaConnectTimeoutSecs -> k_rdma_connect_timeout_secs, and kRdmaTimeoutSecs ->
k_rdma_timeout_secs; update every reference in rdma.cpp, rdma.h, client.cpp and
all tests to use the new names to keep the code consistent with CodeStyle.md.
In `@src/plugins/obj/s3/rdma.cpp`:
- Around line 200-229: The control-plane retry logic mints a new token per
attempt but still uses the prebuilt impl_->http client, so rdmaPut/rdmaGet never
honor the NIC encoded in parseClientNicFromToken(), preventing failover to the
replacement HCA; update S3RdmaControlPlane::rdmaPut and
S3RdmaControlPlane::rdmaGet to extract the desired local NIC/address from the
token via parseClientNicFromToken(token) for each attempt and either (a)
recreate or reconfigure the HTTP transport/client bound to that local address
before calling impl_->http->MakeRequest, or (b) add a per-request binding option
to the HTTP call if the HTTP client supports it; ensure the NIC-aware client is
used for signing/MakeRequest for every retry attempt (look for usages of
impl_->http, formatRdmaToken, and parseClientNicFromToken in these methods) so
control-plane traffic actually egresses the chosen NIC.
In `@src/plugins/obj/s3/rdma.h`:
- Around line 6-7: The header guard macro in rdma.h currently uses
OBJ_PLUGIN_S3_RDMA_H; change it to the repo-relative convention
NIXL_SRC_PLUGINS_OBJ_S3_RDMA_H by replacing both occurrences in the `#ifndef` and
`#define` (and update any matching `#endif` comment if present) so the guard matches
the required NIXL_ + upper-snake-case path format.
In `@test/unit/plugins/object/nixl_object_test.cpp`:
- Around line 362-363: The continuation string literal after the std::cerr <<
statement is over-indented; adjust the second line of the split string so it is
indented four spaces past the base indentation of the std::cerr << statement
(i.e., match base indent + 4 spaces) rather than aligning under the opening
quote. Locate the std::cerr << usage in the
test/unit/plugins/object/nixl_object_test.cpp where the error message is split
across two lines and reduce the leading spaces on the continuation line (the
"--accelerated): VRAM_SEG is only supported on the RDMA path\n"; part) to follow
the 4-space continuation guideline.
---
Outside diff comments:
In `@src/plugins/obj/s3_crt/engine_impl.h`:
- Around line 6-7: Replace the non-conforming header guard
OBJ_PLUGIN_S3_CRT_ENGINE_IMPL_H with the repository-standard guard name by
changing both the `#ifndef` and `#define` to
NIXL_SRC_PLUGINS_OBJ_S3_CRT_ENGINE_IMPL_H (and update the trailing `#endif`
comment if present to match); ensure the new macro is used consistently in this
header (engine_impl.h) so the guard follows the NIXL_ + repo-relative path
upper-snake-case convention.
In `@src/plugins/obj/s3/client.cpp`:
- Around line 36-78: The constructor awsS3Client currently treats
enable_rdma_fast_path as a best-effort and silently falls back to HTTP when RDMA
is unavailable; instead persist the explicit request by storing a member like
gds_requested_ = enable_rdma_fast_path and use it to fail early when RDMA cannot
be used (rdma_ is null, rdmaCp_ is null/invalid, or executor_ is missing).
Modify the awsS3Client constructor (and any analogous blocks at the other
locations noted) to set gds_requested_ and, if gds_requested_ is true but rdma_
== nullptr or rdmaCp_ is invalid or executor_ == nullptr, either throw/return an
error from construction or set an error state that makes async methods (which
reference rdma_, rdmaCp_, executor_) fail fast instead of falling back to HTTP;
ensure checks reference the members rdma_, rdmaCp_, executor_, and the new
gds_requested_ flag so the GDS opt-in surfaces unavailability as an error.
In `@src/plugins/obj/s3/engine_impl.cpp`:
- Around line 142-177: DefaultObjEngineImpl::registerMem currently accepts
VRAM_SEG based only on cuObject buffer registration; change it to require the
client's full RDMA fast-path readiness (control-plane and executor) before
advertising/accepting VRAM. Update registerMem to consult a single predicate
(e.g., isRdmaFastPathReady() or an inline check of rdmaCp_ and executor_
alongside gds_enabled_ and SharedCuObjClient::instance() / registerBuffer()) and
only allow VRAM_SEG when that predicate is true; otherwise return
NIXL_ERR_NOT_SUPPORTED. Ensure the predicate is used where
devIdToObjKey_/nixlObjMetadata for VRAM is created and where out is set for
registered buffers so the advertised capability matches awsS3Client’s actual
rdmaCp_/executor_ transfer readiness.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Enterprise
Run ID: 8bd880f1-9d00-4b87-8b72-5679a5c39552
📒 Files selected for processing (19)
src/plugins/obj/RDMA_PROTOCOL.mdsrc/plugins/obj/README.mdsrc/plugins/obj/meson.buildsrc/plugins/obj/obj_backend.cppsrc/plugins/obj/s3/client.cppsrc/plugins/obj/s3/client.hsrc/plugins/obj/s3/engine_impl.cppsrc/plugins/obj/s3/engine_impl.hsrc/plugins/obj/s3/rdma.cppsrc/plugins/obj/s3/rdma.hsrc/plugins/obj/s3/rdma_protocol.hsrc/plugins/obj/s3_accel/TODO.mdsrc/plugins/obj/s3_accel/client.cppsrc/plugins/obj/s3_accel/dell/README.mdsrc/plugins/obj/s3_crt/engine_impl.hsrc/utils/object/engine_utils.htest/gtest/unit/obj/meson.buildtest/gtest/unit/obj/rdma_protocol.cpptest/unit/plugins/object/nixl_object_test.cpp
There was a problem hiding this comment.
Actionable comments posted: 5
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
test/unit/plugins/object/nixl_object_test.cpp (1)
359-385:⚠️ Potential issue | 🟠 Major | ⚡ Quick winReject
--gdstogether with--accelerated.These flags are not independent anymore. This parser accepts both, but the stack contract for this PR says the legacy accelerated clients opt out of the generic RDMA fast path, so the CLI can claim GDS is enabled while actually driving the deprecated vendor path. Make the combination an error, or pick one mode explicitly before building
params.Proposed fix
+ if (use_gds && use_accelerated) { + std::cerr << "Error: --gds and --accelerated are mutually exclusive\n"; + return 1; + } + // Warn if --type is specified without --accelerated (type is ignored without it) if (!accel_type.empty() && !use_accelerated) { std::cerr << "Warning: --type has no effect without --accelerated\n";🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/unit/plugins/object/nixl_object_test.cpp` around lines 359 - 385, Reject simultaneous use of the legacy accelerated and generic RDMA flags by checking the boolean flags use_gds and use_accelerated early (e.g., before VRAM validation and before building params) and returning an error if both are true; update any error message to state that --gds and --accelerated are mutually exclusive and ensure the code picks exactly one mode (or fails) before constructing params so downstream logic (and accel_type handling) cannot proceed with both set.src/plugins/obj/s3/engine_impl.cpp (1)
275-295:⚠️ Potential issue | 🟠 Major | 🏗️ Heavy liftThe timeout path can return while callbacks still hold stack references.
After Line 277 times out, this loop only force-completes the promise when it wins
completed->exchange(true). If the callback already executed that exchange and then gets preempted before writingresp[i]/has_erroror fulfilling the promise,queryMem()can return here and that late callback will still touch destroyed stack state. Please move the per-call result state onto shared heap storage captured by the callback, or add a post-timeout drain that guarantees those writes are finished before returning.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/plugins/obj/s3/engine_impl.cpp` around lines 275 - 295, The timeout branch can return while callbacks still access stack state; fix by ensuring per-call result storage lives on the heap and that callbacks capture shared ownership, or by draining/waiting for callbacks after the timeout: move resp[i], has_error and the per-call "states[i]" contents (completed, promise, and any result containers) into shared heap-backed objects (e.g., shared_ptr<CallState>) captured by the async callback so late callbacks write into heap-owned memory, and/or after you set the fallback promise in the timeout path ensure you join/drain the associated future (e.g., call futures[i].wait() or another synchronization on the shared CallState) before returning from queryMem()/checkObjectExistsAsync so no callback can touch destroyed stack variables.
♻️ Duplicate comments (3)
src/plugins/obj/RDMA_PROTOCOL.md (2)
45-46:⚠️ Potential issue | 🟠 Major | ⚡ Quick winResolve fallback behavior contradiction in the header table.
This row still says declined RDMA falls back to HTTP, which contradicts the “does not auto-fall-back” contract in this same document.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/plugins/obj/RDMA_PROTOCOL.md` around lines 45 - 46, The table row for the `x-amz-rdma-reply` header contradicts the doc's "does not auto-fall-back" contract; update the `x-amz-rdma-reply` row to remove any language that says a 501 (or absent) result "falls back to HTTP" and instead state that `200`/`204`/`206` indicate RDMA success while `501` (or absence) indicates the peer declined RDMA and that clients MUST NOT auto-fall-back — the application or client library decides fallback behavior. Ensure the updated wording is applied to the `x-amz-rdma-reply` table cell and aligns with the rest of the document's non-auto-fallback semantics.
51-53:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winAdd a language tag to the fenced token-format block.
Use a tagged fence (for example,
text) to satisfy MD040.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/plugins/obj/RDMA_PROTOCOL.md` around lines 51 - 53, The fenced code block for the x-amz-rdma-token token format is missing a language tag which triggers MD040; update the fenced block in RDMA_PROTOCOL.md (the block containing "x-amz-rdma-token: <descriptor>:<start_addr_hex>:<size_hex>") to use a tagged fence such as ```text so the block is recognized as plain text and MD040 is satisfied.Source: Linters/SAST tools
src/plugins/obj/README.md (1)
301-304:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winRemove the “negotiates with proper fallback” claim to match current behavior.
This still conflicts with the documented hard-error behavior under
gds=trueand can mislead operators.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/plugins/obj/README.md` around lines 301 - 304, Update the sentence that claims "negotiates with proper fallback" so it no longer asserts negotiation behavior; remove that phrase (or reword the sentence) and add a clarifying note that the client does not perform negotiation/fallback when gds=true and will hard-error as documented under the gds=true behavior; target the paragraph containing the phrase "negotiates with proper fallback" and the surrounding statement about per-transfer capability to ensure the README aligns with actual gds=true behavior.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/plugins/obj/README.md`:
- Around line 271-279: The blockquote contains an unintended blank line that
violates MD028; remove the empty line between the two quoted paragraphs so the
blockquote lines are contiguous (i.e., keep the line starting "**Fail-safe
PUT:** ..." immediately followed by the line starting "**FUTURE — transparent
fallback.**" with no blank line between them) to satisfy the markdownlint
blockquote rule.
In `@src/plugins/obj/s3/client.h`:
- Around line 47-60: The protected constructor awsS3Client(nixl_b_params_t
*custom_params, std::shared_ptr<Aws::Utils::Threading::Executor> executor, bool
enable_rdma_fast_path) was placed before the public section, breaking the
required section order; move this protected constructor declaration so the class
layout is public then protected then private (i.e., relocate the
awsS3Client(...) declaration into the protected section that follows the
existing public members) and ensure no other public members remain after it to
restore the public->protected->private order mandated by docs/CodeStyle.md.
- Line 29: In src/plugins/obj/s3/client.h, move the `#include` "rdma.h" to remain
(it is safe for non-cuObject builds because rdma_protocol.h is standard-only)
and fix the class member ordering: ensure all public: members (the public API
methods declared later) appear first, then protected: (the internal
constructor/ctor currently placed between publics), then private: as needed;
specifically relocate the internal
ctor/SharedCuObjClient/S3RdmaControlPlane/RDMA helper-related protected
declarations out of the public section so the class follows public → protected →
private ordering (adjust declarations for functions/methods shown in client.h to
preserve signatures).
In `@src/plugins/obj/s3/engine_impl.h`:
- Line 69: Rename the new member gds_enabled_ to follow the project's
lowerCamelCase member convention: change it to gdsEnabled_ in
src/plugins/obj/s3/engine_impl.h and update all corresponding references/uses in
src/plugins/obj/s3/engine_impl.cpp (search for gds_enabled_ occurrences)
including any reads/writes, constructor initializers, and methods that reference
it so the symbol name is consistent throughout.
In `@src/plugins/obj/s3/rdma_protocol.h`:
- Around line 89-91: parseRdmaReply currently uses std::stoi(reply) which
accepts trailing junk (e.g., "200junk") and causes rdmaPut/rdmaGet to treat
malformed headers as success; change parseRdmaReply to use a strict integer
parse (e.g., std::from_chars) or validate that the whole string is a pure
integer and that parsing consumed all characters, returning an error sentinel if
not; update parseRdmaReply (and include <charconv> if using std::from_chars) so
it only returns the integer when parsing succeeds with no leftover characters,
preserving existing return semantics used by rdmaPut/rdmaGet which check
kRdmaReplySuccess/kRdmaReplyNoContent/kRdmaReplyPartialContent.
---
Outside diff comments:
In `@src/plugins/obj/s3/engine_impl.cpp`:
- Around line 275-295: The timeout branch can return while callbacks still
access stack state; fix by ensuring per-call result storage lives on the heap
and that callbacks capture shared ownership, or by draining/waiting for
callbacks after the timeout: move resp[i], has_error and the per-call
"states[i]" contents (completed, promise, and any result containers) into shared
heap-backed objects (e.g., shared_ptr<CallState>) captured by the async callback
so late callbacks write into heap-owned memory, and/or after you set the
fallback promise in the timeout path ensure you join/drain the associated future
(e.g., call futures[i].wait() or another synchronization on the shared
CallState) before returning from queryMem()/checkObjectExistsAsync so no
callback can touch destroyed stack variables.
In `@test/unit/plugins/object/nixl_object_test.cpp`:
- Around line 359-385: Reject simultaneous use of the legacy accelerated and
generic RDMA flags by checking the boolean flags use_gds and use_accelerated
early (e.g., before VRAM validation and before building params) and returning an
error if both are true; update any error message to state that --gds and
--accelerated are mutually exclusive and ensure the code picks exactly one mode
(or fails) before constructing params so downstream logic (and accel_type
handling) cannot proceed with both set.
---
Duplicate comments:
In `@src/plugins/obj/RDMA_PROTOCOL.md`:
- Around line 45-46: The table row for the `x-amz-rdma-reply` header contradicts
the doc's "does not auto-fall-back" contract; update the `x-amz-rdma-reply` row
to remove any language that says a 501 (or absent) result "falls back to HTTP"
and instead state that `200`/`204`/`206` indicate RDMA success while `501` (or
absence) indicates the peer declined RDMA and that clients MUST NOT
auto-fall-back — the application or client library decides fallback behavior.
Ensure the updated wording is applied to the `x-amz-rdma-reply` table cell and
aligns with the rest of the document's non-auto-fallback semantics.
- Around line 51-53: The fenced code block for the x-amz-rdma-token token format
is missing a language tag which triggers MD040; update the fenced block in
RDMA_PROTOCOL.md (the block containing "x-amz-rdma-token:
<descriptor>:<start_addr_hex>:<size_hex>") to use a tagged fence such as ```text
so the block is recognized as plain text and MD040 is satisfied.
In `@src/plugins/obj/README.md`:
- Around line 301-304: Update the sentence that claims "negotiates with proper
fallback" so it no longer asserts negotiation behavior; remove that phrase (or
reword the sentence) and add a clarifying note that the client does not perform
negotiation/fallback when gds=true and will hard-error as documented under the
gds=true behavior; target the paragraph containing the phrase "negotiates with
proper fallback" and the surrounding statement about per-transfer capability to
ensure the README aligns with actual gds=true behavior.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Enterprise
Run ID: b38269c3-f931-4a43-9ce6-b10e1860fe78
📒 Files selected for processing (19)
src/plugins/obj/RDMA_PROTOCOL.mdsrc/plugins/obj/README.mdsrc/plugins/obj/meson.buildsrc/plugins/obj/obj_backend.cppsrc/plugins/obj/s3/client.cppsrc/plugins/obj/s3/client.hsrc/plugins/obj/s3/engine_impl.cppsrc/plugins/obj/s3/engine_impl.hsrc/plugins/obj/s3/rdma.cppsrc/plugins/obj/s3/rdma.hsrc/plugins/obj/s3/rdma_protocol.hsrc/plugins/obj/s3_accel/TODO.mdsrc/plugins/obj/s3_accel/client.cppsrc/plugins/obj/s3_accel/dell/README.mdsrc/plugins/obj/s3_crt/engine_impl.hsrc/utils/object/engine_utils.htest/gtest/unit/obj/meson.buildtest/gtest/unit/obj/rdma_protocol.cpptest/unit/plugins/object/nixl_object_test.cpp
2fa4f8d to
3d6665c
Compare
This comment was marked as resolved.
This comment was marked as resolved.
There was a problem hiding this comment.
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
src/plugins/obj/s3_crt/engine_impl.h (1)
6-7:⚠️ Potential issue | 🟠 Major | ⚡ Quick winCorrect the header guard to include the full repository path.
The header guard
OBJ_PLUGIN_S3_CRT_ENGINE_IMPL_Hdoes not follow the project's convention. As per coding guidelines, header guards must be prefixed withNIXL_and derived from the full repository-relative path in upper snake-case.🔧 Proposed fix
-#ifndef OBJ_PLUGIN_S3_CRT_ENGINE_IMPL_H -#define OBJ_PLUGIN_S3_CRT_ENGINE_IMPL_H +#ifndef NIXL_SRC_PLUGINS_OBJ_S3_CRT_ENGINE_IMPL_H +#define NIXL_SRC_PLUGINS_OBJ_S3_CRT_ENGINE_IMPL_HAnd update the closing
#endifcomment at line 35:-#endif // OBJ_PLUGIN_S3_CRT_ENGINE_IMPL_H +#endif // NIXL_SRC_PLUGINS_OBJ_S3_CRT_ENGINE_IMPL_H🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/plugins/obj/s3_crt/engine_impl.h` around lines 6 - 7, Replace the nonconforming header guard OBJ_PLUGIN_S3_CRT_ENGINE_IMPL_H with the project-prefixed, repo-path-based macro NIXL_SRC_PLUGINS_OBJ_S3_CRT_ENGINE_IMPL_H in the file src/plugins/obj/s3_crt/engine_impl.h (update both the `#ifndef/`#define and the matching `#endif` comment), ensuring the macro is upper snake-case derived from the repository-relative path.Source: Coding guidelines
src/plugins/obj/s3/engine_impl.h (1)
6-7:⚠️ Potential issue | 🟠 Major | ⚡ Quick winCorrect the header guard to include the full repository path.
The header guard
OBJ_PLUGIN_S3_ENGINE_IMPL_Hdoes not follow the project's convention. As per coding guidelines, header guards must be prefixed withNIXL_and derived from the full repository-relative path in upper snake-case.🔧 Proposed fix
-#ifndef OBJ_PLUGIN_S3_ENGINE_IMPL_H -#define OBJ_PLUGIN_S3_ENGINE_IMPL_H +#ifndef NIXL_SRC_PLUGINS_OBJ_S3_ENGINE_IMPL_H +#define NIXL_SRC_PLUGINS_OBJ_S3_ENGINE_IMPL_HAnd update the closing
#endifcomment at line 70:-#endif // OBJ_PLUGIN_S3_ENGINE_IMPL_H +#endif // NIXL_SRC_PLUGINS_OBJ_S3_ENGINE_IMPL_H🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/plugins/obj/s3/engine_impl.h` around lines 6 - 7, The header guard OBJ_PLUGIN_S3_ENGINE_IMPL_H must follow the project's convention: replace it with a guard derived from the full repository-relative path in upper snake-case and prefixed with NIXL_ (use NIXL_SRC_PLUGINS_OBJ_S3_ENGINE_IMPL_H), update both the opening `#ifndef/`#define and the corresponding closing `#endif` comment to reference NIXL_SRC_PLUGINS_OBJ_S3_ENGINE_IMPL_H so the guard and comment match exactly; ensure no other macros conflict with that name and keep the rest of engine_impl.h unchanged.Source: Coding guidelines
♻️ Duplicate comments (1)
src/plugins/obj/s3/rdma_protocol.h (1)
85-96:⚠️ Potential issue | 🟠 Major | ⚡ Quick winThe partial-parse issue in
parseRdmaReplyremains unaddressed.
std::stoi(reply)accepts numeric prefixes:"200junk"parses as200, whichrdmaPut/rdmaGetthen treat asrdma_reply_success. A malformedx-amz-rdma-replyheader could be accepted as a valid RDMA transfer completion.🔒 Proposed fix
inline int parseRdmaReply(const std::string &reply) { if (reply.empty() || reply == "501") { return static_cast<int>(rdma_not_supported); } try { - return std::stoi(reply); + size_t parsed = 0; + const int code = std::stoi(reply, &parsed); + if (parsed != reply.size()) { + return 0; + } + return code; } catch (const std::exception &) { return 0; } }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/plugins/obj/s3/rdma_protocol.h` around lines 85 - 96, parseRdmaReply currently allows partial numeric parses (e.g. "200junk") because std::stoi accepts numeric prefixes; change the function to require the entire reply string be a valid integer (no trailing characters) by parsing with a strict method (e.g. std::from_chars or std::stoi with the idx parameter and verifying idx == reply.size()), return static_cast<int>(rdma_not_supported) for empty or "501", and return 0 on any parse failure; reference parseRdmaReply and the rdma_not_supported/rdma_reply_success handling so callers (rdmaPut/rdmaGet) only accept fully-formed numeric replies.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/plugins/obj/s3/client.cpp`:
- Around line 36-46: Check for and reject construction of awsS3Client when gds
(enable_rdma_fast_path) is requested but no executor is provided: in the
awsS3Client constructor(s) (the ctor that sets executor_, gds_requested_ and the
other overload around the same area), if enable_rdma_fast_path is true and
executor is nullptr, immediately fail fast by throwing a std::invalid_argument
(or similar) with a clear message (e.g., "RDMA/gds requested but no executor
provided"), or assert, so rdmaReady() cannot later hard-fail; ensure the same
check is applied to the other constructor overload mentioned (lines 81-85) and
update any callers/tests accordingly.
In `@src/plugins/obj/s3/engine_impl.cpp`:
- Around line 84-88: Constructor nixlObjMetadata currently takes obj_key by
value causing an extra copy; change the parameter type to const std::string&
obj_key and update the constructor signature accordingly so the initializer list
can initialize objKey(obj_key) without the extra copy (leave nixl_mem_t
nixl_mem, uint64_t dev_id unchanged); ensure any call sites still match the new
const-ref signature.
In `@src/plugins/obj/s3/rdma.cpp`:
- Around line 343-346: The code currently returns immediately when
rdma.getToken(buf, size, 0, CUOBJ_PUT) (and the analogous CUOBJ_GET call)
returns nullptr, bypassing the existing rdma_max_attempts retry logic; change
the flow so that a nullptr token is treated as a transient failure and retried
inside the same retry loop (use the same attempt counter/backoff logic used for
other transient errors) instead of returning -1 immediately, breaking only when
getToken() returns a non-null pointer or when rdma_max_attempts are exhausted,
and ensure any per-attempt cleanup/pause is preserved between retries.
In `@test/gtest/unit/obj/rdma_protocol.cpp`:
- Around line 30-38: Add a test asserting that partial numeric parsing with
trailing junk is treated as failure: inside the TEST named RdmaProtocol,
ParseReplyDeclinedAndAbsentAreFailSafe add an expectation like
EXPECT_EQ(parseRdmaReply("200junk"), 0) so parseRdmaReply is validated against
inputs where std::stoi could partially parse a number and must instead return
failure.
---
Outside diff comments:
In `@src/plugins/obj/s3_crt/engine_impl.h`:
- Around line 6-7: Replace the nonconforming header guard
OBJ_PLUGIN_S3_CRT_ENGINE_IMPL_H with the project-prefixed, repo-path-based macro
NIXL_SRC_PLUGINS_OBJ_S3_CRT_ENGINE_IMPL_H in the file
src/plugins/obj/s3_crt/engine_impl.h (update both the `#ifndef/`#define and the
matching `#endif` comment), ensuring the macro is upper snake-case derived from
the repository-relative path.
In `@src/plugins/obj/s3/engine_impl.h`:
- Around line 6-7: The header guard OBJ_PLUGIN_S3_ENGINE_IMPL_H must follow the
project's convention: replace it with a guard derived from the full
repository-relative path in upper snake-case and prefixed with NIXL_ (use
NIXL_SRC_PLUGINS_OBJ_S3_ENGINE_IMPL_H), update both the opening `#ifndef/`#define
and the corresponding closing `#endif` comment to reference
NIXL_SRC_PLUGINS_OBJ_S3_ENGINE_IMPL_H so the guard and comment match exactly;
ensure no other macros conflict with that name and keep the rest of
engine_impl.h unchanged.
---
Duplicate comments:
In `@src/plugins/obj/s3/rdma_protocol.h`:
- Around line 85-96: parseRdmaReply currently allows partial numeric parses
(e.g. "200junk") because std::stoi accepts numeric prefixes; change the function
to require the entire reply string be a valid integer (no trailing characters)
by parsing with a strict method (e.g. std::from_chars or std::stoi with the idx
parameter and verifying idx == reply.size()), return
static_cast<int>(rdma_not_supported) for empty or "501", and return 0 on any
parse failure; reference parseRdmaReply and the
rdma_not_supported/rdma_reply_success handling so callers (rdmaPut/rdmaGet) only
accept fully-formed numeric replies.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Enterprise
Run ID: 09f1fcbc-bd52-4cc2-9110-0249c4b8edb0
📒 Files selected for processing (20)
src/plugins/obj/RDMA_PROTOCOL.mdsrc/plugins/obj/README.mdsrc/plugins/obj/meson.buildsrc/plugins/obj/obj_backend.cppsrc/plugins/obj/obj_backend.hsrc/plugins/obj/s3/client.cppsrc/plugins/obj/s3/client.hsrc/plugins/obj/s3/engine_impl.cppsrc/plugins/obj/s3/engine_impl.hsrc/plugins/obj/s3/rdma.cppsrc/plugins/obj/s3/rdma.hsrc/plugins/obj/s3/rdma_protocol.hsrc/plugins/obj/s3_accel/TODO.mdsrc/plugins/obj/s3_accel/client.cppsrc/plugins/obj/s3_accel/dell/README.mdsrc/plugins/obj/s3_crt/engine_impl.hsrc/utils/object/engine_utils.htest/gtest/unit/obj/meson.buildtest/gtest/unit/obj/rdma_protocol.cpptest/unit/plugins/object/nixl_object_test.cpp
3d6665c to
e0cd444
Compare
This comment was marked as outdated.
This comment was marked as outdated.
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/plugins/obj/obj_backend.cpp`:
- Around line 35-46: The multi-line comment in src/plugins/obj/obj_backend.cpp
exceeds the 100-char line limit; reflow/wrap each sentence so no line is longer
than 100 characters while preserving the original wording and intent (the block
that mentions gds=true, s3/rdma.h, x-amz-rdma-*, objAccelEngineRegistry,
accelerated/type and vendor engines). Edit the comment block text (the header
block above the retained legacy-note) to insert line breaks at sensible word
boundaries, maintain existing comment markers and indentation, and remove any
trailing spaces so the C++ source conforms to the 100-char-per-line guideline.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Enterprise
Run ID: f77abb1e-e4c7-4493-ac98-98e394f73c3c
📒 Files selected for processing (20)
src/plugins/obj/RDMA_PROTOCOL.mdsrc/plugins/obj/README.mdsrc/plugins/obj/meson.buildsrc/plugins/obj/obj_backend.cppsrc/plugins/obj/obj_backend.hsrc/plugins/obj/s3/client.cppsrc/plugins/obj/s3/client.hsrc/plugins/obj/s3/engine_impl.cppsrc/plugins/obj/s3/engine_impl.hsrc/plugins/obj/s3/rdma.cppsrc/plugins/obj/s3/rdma.hsrc/plugins/obj/s3/rdma_protocol.hsrc/plugins/obj/s3_accel/TODO.mdsrc/plugins/obj/s3_accel/client.cppsrc/plugins/obj/s3_accel/dell/README.mdsrc/plugins/obj/s3_crt/engine_impl.hsrc/utils/object/engine_utils.htest/gtest/unit/obj/meson.buildtest/gtest/unit/obj/rdma_protocol.cpptest/unit/plugins/object/nixl_object_test.cpp
e0cd444 to
07af831
Compare
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/plugins/obj/README.md`:
- Around line 235-245: The blockquote under the "Explicit opt-in; no automatic
fallback (yet)"/ "Why opt-in and not auto-detected?" section contains a blank
line breaking MD028; edit the blockquote so all lines are contiguous and each
line begins with "> " (i.e., remove the empty line between the quoted lines in
the paragraph that starts with "> **Why opt-in and not auto-detected?**" and
continues into the "GPUDirect topology note:" text), ensuring the entire quoted
block has no blank lines.
In `@src/plugins/obj/s3/engine_impl.cpp`:
- Line 172: The log statement using NIXL_ERROR in engine_impl.cpp is over 100
chars; split the long message string into two (or more) stream insertions so no
source line exceeds 100 chars — e.g., break the message "VRAM registration
requires gds=true and a ready RDMA fast path (cuObject)" into two shorter quoted
pieces passed to NIXL_ERROR (using multiple << operators) to keep each source
line under the limit while preserving the exact message when combined.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Enterprise
Run ID: 14bd7be4-a7c0-4ddc-9194-c7f6139ba22e
📒 Files selected for processing (20)
src/plugins/obj/RDMA_PROTOCOL.mdsrc/plugins/obj/README.mdsrc/plugins/obj/meson.buildsrc/plugins/obj/obj_backend.cppsrc/plugins/obj/obj_backend.hsrc/plugins/obj/s3/client.cppsrc/plugins/obj/s3/client.hsrc/plugins/obj/s3/engine_impl.cppsrc/plugins/obj/s3/engine_impl.hsrc/plugins/obj/s3/rdma.cppsrc/plugins/obj/s3/rdma.hsrc/plugins/obj/s3/rdma_protocol.hsrc/plugins/obj/s3_accel/TODO.mdsrc/plugins/obj/s3_accel/client.cppsrc/plugins/obj/s3_accel/dell/README.mdsrc/plugins/obj/s3_crt/engine_impl.hsrc/utils/object/engine_utils.htest/gtest/unit/obj/meson.buildtest/gtest/unit/obj/rdma_protocol.cpptest/unit/plugins/object/nixl_object_test.cpp
310491b to
00921a3
Compare
|
@aranadive @vvenkates27 — reworked per our discussion:
Validated end-to-end against MinIO AIStor over RoCEv2 — Ready for review — could you |
00921a3 to
af72422
Compare
44877f3 to
b09227d
Compare
b09227d to
337e024
Compare
|
Thanks @ofer — your comments have been addressed: the |
jgoldsch12
left a comment
There was a problem hiding this comment.
Hi @harshavardhana Thanks for putting this PR together. Having a generic implementation which is cross vendor will be very helpful. I imagine other vendors may want to test with this to ensure that it is truly generic and not designed to work with only one specific endpoint. Given that there is no open wire protocol specification, implementors have had little to work with and thus chances of interoperability are low. Thanks for putting this stake in the ground. I'm interested in seeing this become part of NIXL.
| Aws::Http::HttpMethod::HTTP_PUT, | ||
| Aws::Utils::Stream::DefaultResponseStreamFactoryMethod); | ||
| req->SetHeaderValue("x-amz-content-sha256", unsigned_payload); | ||
| req->SetHeaderValue(amz_rdma_token, formatRdmaToken(token, buf_addr, size).c_str()); |
There was a problem hiding this comment.
This isn't passing the token, as cuobjserver expects. This is padding endpoint specific information to the descriptor string. These values should be moved to a different header for compatibility.
There was a problem hiding this comment.
Thanks for the careful read.
The header names and token layout aren't AIStor inventions — they come straight from NVIDIA's published cuObject material, and the wire protocol around them was written up as a spec: KiranModukuri/aws-c-s3 RDMA_PROTOCOL_SPEC.md. This is not yet GA'ed by AWS, however AWS refined this inline with what they could eventually implement — which is why it's a reasonable vendor-neutral base, and why pinning one concrete format down here (so other vendors can test against it) is useful.
- Headers:
x-amz-rdma-token/x-amz-rdma-reply/x-amz-rdma-bytes-transferredare taken directly from the cuObject docs (the client-side API + the RDMA-enabled GET/PUT workflow), and that spec definesx-amz-rdma-tokenas an opaque, provider-specific identifier that "encodes buffer location, size, … and context." - The
:start_addr:sizesuffix is cuObject's own convention, not endpoint padding. cuObject's RDMA descriptor (RDMA_DESC_STR) is the fixed-length opaque handle — base address, buffer max-size, rkey, LID, DCTN, GID. cuObject then forms its IO descriptor by appending the per-operation transfer parameters to that descriptor: NVIDIA's own reference server (sock_server.cc) reads…<RDMA_DESC_STR><REM_BUF_START><LOCAL_BUF_OFFSET><REM_MSG_SIZE><LOCAL_FILE_OFFSET>, andcuMemObjGetRDMAToken(ptr, size, buffer_offset, op, …)takes the size + offset for exactly this. The token here carries the two essential fields (start addr + size). - The server splits off the fixed-length descriptor and forwards only those bytes to its RDMA provider — cuObjServer never sees the suffix — and consumes addr/size itself to set up the transfer. Since the HTTP body is empty (
Content-Length: 0), the token is the only place to convey the request's start offset and size; moving them to a separate header would diverge from the proposed spec and from cuObject's own IO-descriptor layout.
Keen to keep refining as the spec firms up and other vendors test against it.
There was a problem hiding this comment.
Thank you for the detailed response. Two things of notes: 1. given we are talking about a document in private fork of a repo, it is further reason we should not refer to this as "agreed upon" or "preferred" or "standard". I appreciate the changes you made to the comments and text. 2. The x-amz-rdma-* header names are well understood from the documentation and from the proposed wire protocol spec from Kiran. However, neither specify the format of the descriptor. The implementation of aws-c-s3 fork does, however, pass the V1 descriptor produced by the cuobject library in x-amz-rdma-token header with only the 7 colon delimited fields, not 9. I've found no reference to the additional two fields. And while you make a point about the interface accepting additional fields, it does not change the token produced by the interface. Adding the two fields is a reasonable implementation decision for an endpoint developer, but I'm missing where the actual header value is documented to include these extra parameters. This is why I refer to this as being an endpoint specific implementation decision, and not an interoperable use of the proposed wire protocol. It is very possible I'm missing something here. I do know that ObjectScale interoperates with the aws-c-s3 fork, so this I why I'm raising this as a concern and a potential issue for NIXL interoperability.
599de2c to
c63d72d
Compare
Add a vendor-neutral S3-over-RDMA data path to the object plugin, selected via
the accelerated=true backend param with no type (or type=s3). It implements the
published x-amz-rdma-* protocol (the NVIDIA cuObject / aws-c-s3 convention) and
needs no per-vendor code; MinIO AIStor implements it today, and AWS S3 would
work transparently if it adopts it.
- New s3/rdma_protocol.h: dependency-free wire helpers (token formatting, reply
parsing), covered by a unit test.
- New s3/rdma.{h,cpp}: the cuObject RDMA token API plus a SigV4 UNSIGNED-PAYLOAD
control plane that carries x-amz-rdma-token and reads x-amz-rdma-reply /
x-amz-rdma-bytes-transferred. The object payload moves out-of-band over RDMA
(server RDMA_READ on PUT, RDMA_WRITE on GET); the HTTP body is empty.
- New s3_accel/generic/: the standard-S3 engine, registered in the accel
registry under type "s3". accelerated=true with no type resolves to it; RDMA
is a capability of the standard S3 client, enabled for this engine. It is the
preferred GPU-direct path.
- Explicit opt-in: an RDMA decline/failure is a hard error, with no automatic
HTTP fallback.
- RDMA_PROTOCOL.md documents the server-side wire contract and a vendor
compliance checklist; the README gains a GPU-Direct section.
- nixl_object_test gains -a/--accelerated (standard-S3) and keeps -T/--type for
vendor engines.
The Dell engine (accelerated=true, type=dell) is a vendor-specific engine for
servers that speak Dell's x-rdma-info header.
Validated end-to-end against MinIO AIStor over RoCEv2: host DRAM and GPU-direct
VRAM PUT/GET with data verification.
Signed-off-by: Harshavardhana <harsha@minio.io>
c63d72d to
96b0846
Compare
|
Rebased onto latest main (no conflicts) and all review comments are addressed; the lightweight CI checks are green. @aranadive @vvenkates27 — could you kick off |
|
@harshavardhana thanks for submitting the PR. We had further discussions internally on this and a also few other S3 PRs for alignment. We think that splitting this into two PRs would be beneficial for review and making progress for other PRs. It also looks like there is a lot of http header / cuobject-related code that should not included directly in the plugin and it would be helpful to have a refactor of that. Here is a more detailed review / recommended split for this PR: Architectural FeedbackRDMA awareness is wired into the base layer (
Minor Issues
Recommended SplitPR A: RDMA Transport and AWS Utilities (land first, no plugin changes)Extract reusable infrastructure into shared utilities. These become the building blocks that both
PR B: Wire RDMA into the Accel Layer (depends on PR A)Integrate the utilities through
|
What
Adds a vendor-neutral S3-over-RDMA data path to the object plugin, selected via
accelerated=truewith notype(ortype=s3). It implements the publishedx-amz-rdma-*protocol (the NVIDIA cuObject / aws-c-s3 convention) and needs no per-vendor code — any conformant endpoint works (MinIO AIStor today; AWS S3 if it adopts it). The control plane stays standard HTTP + SigV4 while the object payload moves out-of-band over RDMA (serverRDMA_READon PUT,RDMA_WRITEon GET; the HTTP body is empty).This is the preferred GPU-direct path.
accelerated=true, type=dellselects a vendor-specific engine for servers that speak Dell'sx-rdma-infoheader.Changes
s3/rdma_protocol.h— dependency-free wire helpers (token formatting, reply parsing), covered by a unit test.s3/rdma.{h,cpp}— the cuObject RDMA token API plus a SigV4UNSIGNED-PAYLOADcontrol plane that carriesx-amz-rdma-tokenand readsx-amz-rdma-reply/x-amz-rdma-bytes-transferred.s3_accel/generic/— the standard-S3 engine, registered in the accel registry undertype=s3;accelerated=truewith notyperesolves to it. RDMA is a capability of the standard S3 client, enabled for this engine.RDMA_PROTOCOL.mddocuments the server-side wire contract and a vendor compliance checklist; the README gains a GPU-Direct section.nixl_object_test— gains-a/--accelerated(standard-S3); keeps-T/--typefor vendor engines.Verification
Validated end-to-end against MinIO AIStor over RoCEv2:
accelerated=true(no type) andtype=s3both complete host-DRAM and GPU-direct VRAM PUT/GET with data verification. The protocol layer is covered by a unit test.