Skip to content

obj: add standard-protocol GPU-direct S3-over-RDMA (accelerated=true)#1741

Draft
harshavardhana wants to merge 1 commit into
ai-dynamo:mainfrom
harshavardhana:feat/obj-s3-over-rdma-gds
Draft

obj: add standard-protocol GPU-direct S3-over-RDMA (accelerated=true)#1741
harshavardhana wants to merge 1 commit into
ai-dynamo:mainfrom
harshavardhana:feat/obj-s3-over-rdma-gds

Conversation

@harshavardhana

@harshavardhana harshavardhana commented Jun 7, 2026

Copy link
Copy Markdown

What

Adds a vendor-neutral S3-over-RDMA data path to the object plugin, selected via accelerated=true with no type (or type=s3). It implements the published x-amz-rdma-* protocol (the NVIDIA cuObject / aws-c-s3 convention) and needs no per-vendor code — any conformant endpoint works (MinIO AIStor today; AWS S3 if it adopts it). The control plane stays standard HTTP + SigV4 while the object payload moves out-of-band over RDMA (server RDMA_READ on PUT, RDMA_WRITE on GET; the HTTP body is empty).

This is the preferred GPU-direct path. accelerated=true, type=dell selects a vendor-specific engine for servers that speak Dell's x-rdma-info header.

Changes

  • s3/rdma_protocol.h — dependency-free wire helpers (token formatting, reply parsing), covered by a unit test.
  • s3/rdma.{h,cpp} — the cuObject RDMA token API plus a SigV4 UNSIGNED-PAYLOAD control plane that carries x-amz-rdma-token and reads x-amz-rdma-reply / x-amz-rdma-bytes-transferred.
  • s3_accel/generic/ — the standard-S3 engine, registered in the accel registry under type=s3; accelerated=true with no type resolves to it. RDMA is a capability of the standard S3 client, enabled for this engine.
  • Explicit opt-in — an RDMA decline/failure is a hard error; no automatic HTTP fallback.
  • DocsRDMA_PROTOCOL.md documents the server-side wire contract and a vendor compliance checklist; the README gains a GPU-Direct section.
  • nixl_object_test — gains -a/--accelerated (standard-S3); keeps -T/--type for vendor engines.

Verification

Validated end-to-end against MinIO AIStor over RoCEv2: accelerated=true (no type) and type=s3 both complete host-DRAM and GPU-direct VRAM PUT/GET with data verification. The protocol layer is covered by a unit test.

GPU-direct into VRAM requires the GPU and NIC to share a PCIe path that permits peer DMA (see the README's PCIe-ACS note) — a host/fabric property, independent of NIXL.

@copy-pr-bot

copy-pr-bot Bot commented Jun 7, 2026

Copy link
Copy Markdown

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@github-actions

github-actions Bot commented Jun 7, 2026

Copy link
Copy Markdown

👋 Hi harshavardhana! Thank you for contributing to ai-dynamo/nixl.

Your PR reviewers will review your contribution then trigger the CI to test your changes.

🚀

@dpressle

dpressle commented Jun 7, 2026

Copy link
Copy Markdown
Contributor

👀 Investigating Run Pre-Commit Hooks

@dpressle

dpressle commented Jun 7, 2026

Copy link
Copy Markdown
Contributor

🤖 CI Triage AgentRun Pre-Commit Hooks · commit 2aefadc1

The file is new (introduced in this PR). The log has given us everything we need. No further investigation required.


Summary: pre-commit codespell hook fails due to a misspelling in src/plugins/obj/s3/rdma_protocol.h — the word unparseable should be unparsable.

Root cause: The newly introduced file src/plugins/obj/s3/rdma_protocol.h (added on branch feat/obj-s3-over-rdma-gds) contains the misspelling unparseable at line 76. The codespell pre-commit hook (exit code 65) caught it and flagged it as a spelling error:

src/plugins/obj/s3/rdma_protocol.h:76: unparseable ==> unparsable

All other hooks (case conflicts, merge conflicts, end-of-files, trailing whitespace, line endings) passed cleanly. The Python linting hooks (mypy, isort, black, flake8) were skipped as there are no Python files in the changeset.

Implicated commit: [REDACTED:Hex High Entropy String] (PR #1741, branch feat/obj-s3-over-rdma-gds)

File: src/plugins/obj/s3/rdma_protocol.h:76

Suggested fix: In src/plugins/obj/s3/rdma_protocol.h at line 76, replace the misspelled word:

-  unparseable
+  unparsable

Then commit and push the fix. Alternatively, run pre-commit run codespell --files src/plugins/obj/s3/rdma_protocol.h locally to auto-verify the correction before pushing.

Related: None found — this is a straightforward typo in a new file introduced by PR #1741.

🛡️ This comment had 1 potential secret(s) redacted (Hex High Entropy String). See request_id 661b6714-f55b-4162-88dc-f530d2c5ba84 in the triage console for the audit trail.

@harshavardhana harshavardhana force-pushed the feat/obj-s3-over-rdma-gds branch from 2aefadc to 2fa4f8d Compare June 7, 2026 22:50
@coderabbitai

coderabbitai Bot commented Jun 7, 2026

Copy link
Copy Markdown

Review Change Stack

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Adds a vendor-neutral S3-over-RDMA wire protocol and integrates an opt-in GPU-direct RDMA fast path (gds=true): protocol helpers and tests, cuObj-backed RDMA control plane, S3 client wiring to use RDMA-only transfers, engine VRAM/pinned-memory support, CLI/build/tests updates, and legacy-vendor deprecation notes.

Changes

GPU-Direct RDMA Feature

Layer / File(s) Summary
RDMA wire-protocol & tests
src/plugins/obj/s3/rdma_protocol.h, src/plugins/obj/RDMA_PROTOCOL.md, test/gtest/unit/obj/rdma_protocol.cpp
Defines x-amz-rdma-* constants, formatRdmaToken/parseRdmaReply, full protocol spec, and unit tests validating token and reply parsing/robustness.
Shared cuObj client & control plane
src/plugins/obj/s3/rdma.h, src/plugins/obj/s3/rdma.cpp
Implements SharedCuObjClient (buffer registration, token mint/return) and S3RdmaControlPlane (SigV4-signed, body-less PUT/GET with x-amz-rdma-*, parsing of replies/bytes, retry wrappers).
S3 client gds opt-in wiring
src/plugins/obj/s3/client.h, src/plugins/obj/s3/client.cpp, src/plugins/obj/obj_backend.h, src/utils/object/engine_utils.h
Adds gds opt-in detection, supportsRdma() capability, executor retention, constructor fail-fast when gds=true but RDMA unavailable, and RDMA-only async put/get paths (no HTTP fallback).
Engine memory advertisement & registration
src/plugins/obj/s3/engine_impl.h, src/plugins/obj/s3/engine_impl.cpp, src/plugins/obj/s3_crt/engine_impl.h
Advertises VRAM_SEG only when client supports RDMA; nixlObjMetadata supports pinned-memory; registerMem/deregisterMem handle RDMA registration/deregistration with DRAM best-effort and VRAM strictness.
CLI, build and tests wiring
src/plugins/obj/meson.build, test/gtest/unit/obj/meson.build, test/*, test/unit/plugins/object/nixl_object_test.cpp
Meson includes RDMA sources and tests; --gds CLI option added and printed; VRAM gating requires RDMA; backend params include gds=true when requested.
Docs and legacy accel positioning
src/plugins/obj/README.md, src/plugins/obj/RDMA_PROTOCOL.md, src/plugins/obj/s3_accel/TODO.md, src/plugins/obj/s3_accel/dell/README.md
Documents gds=true behavior and protocol, marks vendor engines as legacy, and provides a staged removal checklist for s3_accel/.

Sequence Diagram(s)

sequenceDiagram
  participant awsS3Client
  participant SharedCuObjClient
  participant S3RdmaControlPlane
  participant S3Server
  awsS3Client->>SharedCuObjClient: getToken(ptr,size,offset,op)
  SharedCuObjClient-->>awsS3Client: token
  awsS3Client->>S3RdmaControlPlane: rdmaPut/rdmaGet(ctx, token, buf_addr, size)
  S3RdmaControlPlane->>S3Server: signed HTTP PUT/GET (UNSIGNED-PAYLOAD) + x-amz-rdma-token
  S3Server-->>S3RdmaControlPlane: x-amz-rdma-reply (+etag / x-amz-rdma-bytes-transferred)
  S3RdmaControlPlane-->>awsS3Client: parsed result (etag/bytes)
  awsS3Client->>SharedCuObjClient: putToken(token)
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

  • ai-dynamo/nixl#1630: Unit-test and implementation work around RDMA transfer and memory registration paths that relate to the RDMA engine logic added here.

Suggested reviewers

  • w1ldptr
  • vvenkates27
  • brminich
  • ofer
  • ovidiusm

"🐇 I minted tokens in moonlit code,
I pinned the bytes where GPUs go,
No silent fallbacks, clear reply,
Fast paths hum and bytes fly high,
Hop, hop—RDMA makes data flow."

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 32.65% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Title check ✅ Passed The title accurately describes the main change: adding vendor-neutral, standard-protocol GPU-direct S3-over-RDMA support to the object plugin, selected via accelerated=true with no type parameter.
Description check ✅ Passed The description covers all required template sections (What/Why/How) with comprehensive detail about changes, design rationale, and verification approach.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 9

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
src/plugins/obj/s3_crt/engine_impl.h (1)

6-7: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Use repository-standard header guard naming.

The guard should follow the NIXL_ + full repo-relative path mapping convention (e.g., NIXL_SRC_PLUGINS_OBJ_S3_CRT_ENGINE_IMPL_H) instead of OBJ_PLUGIN_S3_CRT_ENGINE_IMPL_H.

As per coding guidelines, header guards must be NIXL_ + upper-snake-case path mapping with #ifndef/#define; and based on learnings, this repo enforces full repository-relative guard names.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/plugins/obj/s3_crt/engine_impl.h` around lines 6 - 7, Replace the
non-conforming header guard OBJ_PLUGIN_S3_CRT_ENGINE_IMPL_H with the
repository-standard guard name by changing both the `#ifndef` and `#define` to
NIXL_SRC_PLUGINS_OBJ_S3_CRT_ENGINE_IMPL_H (and update the trailing `#endif`
comment if present to match); ensure the new macro is used consistently in this
header (engine_impl.h) so the guard follows the NIXL_ + repo-relative path
upper-snake-case convention.

Sources: Coding guidelines, Learnings

src/plugins/obj/s3/engine_impl.cpp (1)

142-177: ⚠️ Potential issue | 🔴 Critical | 🏗️ Heavy lift

Do not accept VRAM_SEG unless the client’s full RDMA fast path is ready.

This registration path only checks cuObject connectivity, so VRAM can be accepted as soon as SharedCuObjClient::instance() and registerBuffer() succeed. But awsS3Client still falls back to the plain HTTP path whenever rdmaCp_ is null or executor_ is missing. In that state a later PUT/GET can hand a GPU pointer to PreallocatedStreamBuf, which is not a valid CPU HTTP buffer.

The readiness predicate for VRAM needs to match the client’s actual transfer predicate, not just buffer-registration capability. Please gate VRAM registration/advertisement on a single shared “RDMA fast path ready” check that includes the control plane and executor as well.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/plugins/obj/s3/engine_impl.cpp` around lines 142 - 177,
DefaultObjEngineImpl::registerMem currently accepts VRAM_SEG based only on
cuObject buffer registration; change it to require the client's full RDMA
fast-path readiness (control-plane and executor) before advertising/accepting
VRAM. Update registerMem to consult a single predicate (e.g.,
isRdmaFastPathReady() or an inline check of rdmaCp_ and executor_ alongside
gds_enabled_ and SharedCuObjClient::instance() / registerBuffer()) and only
allow VRAM_SEG when that predicate is true; otherwise return
NIXL_ERR_NOT_SUPPORTED. Ensure the predicate is used where
devIdToObjKey_/nixlObjMetadata for VRAM is created and where out is set for
registered buffers so the advertised capability matches awsS3Client’s actual
rdmaCp_/executor_ transfer readiness.
src/plugins/obj/s3/client.cpp (1)

36-78: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Keep gds=true distinguishable from “RDMA unavailable” and fail instead of falling through to HTTP.

Right now the code only enters the RDMA path when rdma_ && rdmaCp_ && executor_. If gds=true was requested but the fabric is absent, control-plane init fails, or no executor was provided, both async methods quietly run the normal HTTP path. That breaks the documented GDS contract: explicit RDMA opt-in is supposed to surface RDMA unavailability as an error, not silently downgrade.

Please persist a separate “GDS was requested” state and use it here to fail the transfer (or fail construction) whenever the fast path is unavailable, instead of treating that case as if RDMA had never been enabled.

Also applies to: 98-123, 152-170

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/plugins/obj/s3/client.cpp` around lines 36 - 78, The constructor
awsS3Client currently treats enable_rdma_fast_path as a best-effort and silently
falls back to HTTP when RDMA is unavailable; instead persist the explicit
request by storing a member like gds_requested_ = enable_rdma_fast_path and use
it to fail early when RDMA cannot be used (rdma_ is null, rdmaCp_ is
null/invalid, or executor_ is missing). Modify the awsS3Client constructor (and
any analogous blocks at the other locations noted) to set gds_requested_ and, if
gds_requested_ is true but rdma_ == nullptr or rdmaCp_ is invalid or executor_
== nullptr, either throw/return an error from construction or set an error state
that makes async methods (which reference rdma_, rdmaCp_, executor_) fail fast
instead of falling back to HTTP; ensure checks reference the members rdma_,
rdmaCp_, executor_, and the new gds_requested_ flag so the GDS opt-in surfaces
unavailability as an error.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/plugins/obj/RDMA_PROTOCOL.md`:
- Around line 45-46: Update the `x-amz-rdma-reply` table row to remove the
statement that the client auto-falls back to HTTP and instead state that a `501`
(or absence of the header) indicates the server declined RDMA and clients will
treat the request as declined but will NOT automatically retry over HTTP; keep
the note that `x-amz-rdma-reply` is required on success and clarify which status
codes (200/204/206) indicate successful RDMA completion while 501 indicates
explicit decline so operators know to configure any non-automatic fallback
behavior themselves.
- Around line 51-53: The fenced code block containing the token format string
"x-amz-rdma-token: <descriptor>:<start_addr_hex>:<size_hex>" is missing a
language tag; update that block to include a language identifier (e.g., change
the opening fence to ```text) so the markdown linter recognizes the block type
and the snippet is annotated properly.

In `@src/plugins/obj/README.md`:
- Around line 301-304: Update the sentence that currently reads "negotiates with
proper fallback" in the README paragraph to remove or clarify that claim and
explicitly state the current behavior under gds=true (hard-error, no automatic
fallback). Locate the sentence in the same paragraph that begins "The generic
path above fixes all four:" and replace or append a clarifying note that under
gds=true the client does not perform automatic fallback and will fail hard,
keeping the rest of the claim about per-transfer capability and no per-vendor
code intact.

In `@src/plugins/obj/s3/rdma_protocol.h`:
- Line 76: Fix the spelling in the comment inside
src/plugins/obj/s3/rdma_protocol.h: replace the word "unparseable" with
"unparsable" in the comment line that reads "0  unparseable non-empty value:
treat as failure (-1) by the caller" so the codespell pre-commit hook passes;
update only the comment text (no code changes) near the rdma_protocol.h
enum/description.
- Around line 6-7: The header guard OBJ_PLUGIN_S3_RDMA_PROTOCOL_H in
src/plugins/obj/s3/rdma_protocol.h must be renamed to follow the project
convention: replace the existing `#ifndef/`#define guard with
NIXL_SRC_PLUGINS_OBJ_S3_RDMA_PROTOCOL_H and update the matching closing `#endif`
comment to reference NIXL_SRC_PLUGINS_OBJ_S3_RDMA_PROTOCOL_H so the guard uses
the NIXL_ prefix and the full repo-relative path in upper-snake-case.
- Around line 25-49: The constants in rdma_protocol.h use camelCase with a k
prefix; rename them to snake_case (keeping the k prefix) and update all call
sites—e.g., kAmzRdmaToken -> k_amz_rdma_token, kAmzRdmaReply ->
k_amz_rdma_reply, kAmzRdmaBytesTransferred -> k_amz_rdma_bytes_transferred,
kUnsignedPayload -> k_unsigned_payload, kRdmaReplySuccess ->
k_rdma_reply_success, kRdmaReplyNoContent -> k_rdma_reply_no_content,
kRdmaReplyPartialContent -> k_rdma_reply_partial_content,
kRdmaReplyNotImplemented -> k_rdma_reply_not_implemented, kRdmaNotSupported ->
k_rdma_not_supported, kRdmaMaxAttempts -> k_rdma_max_attempts,
kRdmaConnectTimeoutSecs -> k_rdma_connect_timeout_secs, and kRdmaTimeoutSecs ->
k_rdma_timeout_secs; update every reference in rdma.cpp, rdma.h, client.cpp and
all tests to use the new names to keep the code consistent with CodeStyle.md.

In `@src/plugins/obj/s3/rdma.cpp`:
- Around line 200-229: The control-plane retry logic mints a new token per
attempt but still uses the prebuilt impl_->http client, so rdmaPut/rdmaGet never
honor the NIC encoded in parseClientNicFromToken(), preventing failover to the
replacement HCA; update S3RdmaControlPlane::rdmaPut and
S3RdmaControlPlane::rdmaGet to extract the desired local NIC/address from the
token via parseClientNicFromToken(token) for each attempt and either (a)
recreate or reconfigure the HTTP transport/client bound to that local address
before calling impl_->http->MakeRequest, or (b) add a per-request binding option
to the HTTP call if the HTTP client supports it; ensure the NIC-aware client is
used for signing/MakeRequest for every retry attempt (look for usages of
impl_->http, formatRdmaToken, and parseClientNicFromToken in these methods) so
control-plane traffic actually egresses the chosen NIC.

In `@src/plugins/obj/s3/rdma.h`:
- Around line 6-7: The header guard macro in rdma.h currently uses
OBJ_PLUGIN_S3_RDMA_H; change it to the repo-relative convention
NIXL_SRC_PLUGINS_OBJ_S3_RDMA_H by replacing both occurrences in the `#ifndef` and
`#define` (and update any matching `#endif` comment if present) so the guard matches
the required NIXL_ + upper-snake-case path format.

In `@test/unit/plugins/object/nixl_object_test.cpp`:
- Around line 362-363: The continuation string literal after the std::cerr <<
statement is over-indented; adjust the second line of the split string so it is
indented four spaces past the base indentation of the std::cerr << statement
(i.e., match base indent + 4 spaces) rather than aligning under the opening
quote. Locate the std::cerr << usage in the
test/unit/plugins/object/nixl_object_test.cpp where the error message is split
across two lines and reduce the leading spaces on the continuation line (the
"--accelerated): VRAM_SEG is only supported on the RDMA path\n"; part) to follow
the 4-space continuation guideline.

---

Outside diff comments:
In `@src/plugins/obj/s3_crt/engine_impl.h`:
- Around line 6-7: Replace the non-conforming header guard
OBJ_PLUGIN_S3_CRT_ENGINE_IMPL_H with the repository-standard guard name by
changing both the `#ifndef` and `#define` to
NIXL_SRC_PLUGINS_OBJ_S3_CRT_ENGINE_IMPL_H (and update the trailing `#endif`
comment if present to match); ensure the new macro is used consistently in this
header (engine_impl.h) so the guard follows the NIXL_ + repo-relative path
upper-snake-case convention.

In `@src/plugins/obj/s3/client.cpp`:
- Around line 36-78: The constructor awsS3Client currently treats
enable_rdma_fast_path as a best-effort and silently falls back to HTTP when RDMA
is unavailable; instead persist the explicit request by storing a member like
gds_requested_ = enable_rdma_fast_path and use it to fail early when RDMA cannot
be used (rdma_ is null, rdmaCp_ is null/invalid, or executor_ is missing).
Modify the awsS3Client constructor (and any analogous blocks at the other
locations noted) to set gds_requested_ and, if gds_requested_ is true but rdma_
== nullptr or rdmaCp_ is invalid or executor_ == nullptr, either throw/return an
error from construction or set an error state that makes async methods (which
reference rdma_, rdmaCp_, executor_) fail fast instead of falling back to HTTP;
ensure checks reference the members rdma_, rdmaCp_, executor_, and the new
gds_requested_ flag so the GDS opt-in surfaces unavailability as an error.

In `@src/plugins/obj/s3/engine_impl.cpp`:
- Around line 142-177: DefaultObjEngineImpl::registerMem currently accepts
VRAM_SEG based only on cuObject buffer registration; change it to require the
client's full RDMA fast-path readiness (control-plane and executor) before
advertising/accepting VRAM. Update registerMem to consult a single predicate
(e.g., isRdmaFastPathReady() or an inline check of rdmaCp_ and executor_
alongside gds_enabled_ and SharedCuObjClient::instance() / registerBuffer()) and
only allow VRAM_SEG when that predicate is true; otherwise return
NIXL_ERR_NOT_SUPPORTED. Ensure the predicate is used where
devIdToObjKey_/nixlObjMetadata for VRAM is created and where out is set for
registered buffers so the advertised capability matches awsS3Client’s actual
rdmaCp_/executor_ transfer readiness.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Enterprise

Run ID: 8bd880f1-9d00-4b87-8b72-5679a5c39552

📥 Commits

Reviewing files that changed from the base of the PR and between 77d79d8 and 2aefadc.

📒 Files selected for processing (19)
  • src/plugins/obj/RDMA_PROTOCOL.md
  • src/plugins/obj/README.md
  • src/plugins/obj/meson.build
  • src/plugins/obj/obj_backend.cpp
  • src/plugins/obj/s3/client.cpp
  • src/plugins/obj/s3/client.h
  • src/plugins/obj/s3/engine_impl.cpp
  • src/plugins/obj/s3/engine_impl.h
  • src/plugins/obj/s3/rdma.cpp
  • src/plugins/obj/s3/rdma.h
  • src/plugins/obj/s3/rdma_protocol.h
  • src/plugins/obj/s3_accel/TODO.md
  • src/plugins/obj/s3_accel/client.cpp
  • src/plugins/obj/s3_accel/dell/README.md
  • src/plugins/obj/s3_crt/engine_impl.h
  • src/utils/object/engine_utils.h
  • test/gtest/unit/obj/meson.build
  • test/gtest/unit/obj/rdma_protocol.cpp
  • test/unit/plugins/object/nixl_object_test.cpp

Comment thread src/plugins/obj/RDMA_PROTOCOL.md Outdated
Comment thread src/plugins/obj/RDMA_PROTOCOL.md Outdated
Comment thread src/plugins/obj/README.md Outdated
Comment thread src/plugins/obj/s3/rdma_protocol.h Outdated
Comment thread src/plugins/obj/s3/rdma_protocol.h Outdated
Comment thread src/plugins/obj/s3/rdma_protocol.h Outdated
Comment thread src/plugins/obj/s3/rdma.cpp
Comment thread src/plugins/obj/s3/rdma.h Outdated
Comment thread test/unit/plugins/object/nixl_object_test.cpp Outdated

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
test/unit/plugins/object/nixl_object_test.cpp (1)

359-385: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Reject --gds together with --accelerated.

These flags are not independent anymore. This parser accepts both, but the stack contract for this PR says the legacy accelerated clients opt out of the generic RDMA fast path, so the CLI can claim GDS is enabled while actually driving the deprecated vendor path. Make the combination an error, or pick one mode explicitly before building params.

Proposed fix
+    if (use_gds && use_accelerated) {
+        std::cerr << "Error: --gds and --accelerated are mutually exclusive\n";
+        return 1;
+    }
+
     // Warn if --type is specified without --accelerated (type is ignored without it)
     if (!accel_type.empty() && !use_accelerated) {
         std::cerr << "Warning: --type has no effect without --accelerated\n";
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/unit/plugins/object/nixl_object_test.cpp` around lines 359 - 385, Reject
simultaneous use of the legacy accelerated and generic RDMA flags by checking
the boolean flags use_gds and use_accelerated early (e.g., before VRAM
validation and before building params) and returning an error if both are true;
update any error message to state that --gds and --accelerated are mutually
exclusive and ensure the code picks exactly one mode (or fails) before
constructing params so downstream logic (and accel_type handling) cannot proceed
with both set.
src/plugins/obj/s3/engine_impl.cpp (1)

275-295: ⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

The timeout path can return while callbacks still hold stack references.

After Line 277 times out, this loop only force-completes the promise when it wins completed->exchange(true). If the callback already executed that exchange and then gets preempted before writing resp[i] / has_error or fulfilling the promise, queryMem() can return here and that late callback will still touch destroyed stack state. Please move the per-call result state onto shared heap storage captured by the callback, or add a post-timeout drain that guarantees those writes are finished before returning.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/plugins/obj/s3/engine_impl.cpp` around lines 275 - 295, The timeout
branch can return while callbacks still access stack state; fix by ensuring
per-call result storage lives on the heap and that callbacks capture shared
ownership, or by draining/waiting for callbacks after the timeout: move resp[i],
has_error and the per-call "states[i]" contents (completed, promise, and any
result containers) into shared heap-backed objects (e.g., shared_ptr<CallState>)
captured by the async callback so late callbacks write into heap-owned memory,
and/or after you set the fallback promise in the timeout path ensure you
join/drain the associated future (e.g., call futures[i].wait() or another
synchronization on the shared CallState) before returning from
queryMem()/checkObjectExistsAsync so no callback can touch destroyed stack
variables.
♻️ Duplicate comments (3)
src/plugins/obj/RDMA_PROTOCOL.md (2)

45-46: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Resolve fallback behavior contradiction in the header table.

This row still says declined RDMA falls back to HTTP, which contradicts the “does not auto-fall-back” contract in this same document.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/plugins/obj/RDMA_PROTOCOL.md` around lines 45 - 46, The table row for the
`x-amz-rdma-reply` header contradicts the doc's "does not auto-fall-back"
contract; update the `x-amz-rdma-reply` row to remove any language that says a
501 (or absent) result "falls back to HTTP" and instead state that
`200`/`204`/`206` indicate RDMA success while `501` (or absence) indicates the
peer declined RDMA and that clients MUST NOT auto-fall-back — the application or
client library decides fallback behavior. Ensure the updated wording is applied
to the `x-amz-rdma-reply` table cell and aligns with the rest of the document's
non-auto-fallback semantics.

51-53: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Add a language tag to the fenced token-format block.

Use a tagged fence (for example, text) to satisfy MD040.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/plugins/obj/RDMA_PROTOCOL.md` around lines 51 - 53, The fenced code block
for the x-amz-rdma-token token format is missing a language tag which triggers
MD040; update the fenced block in RDMA_PROTOCOL.md (the block containing
"x-amz-rdma-token: <descriptor>:<start_addr_hex>:<size_hex>") to use a tagged
fence such as ```text so the block is recognized as plain text and MD040 is
satisfied.

Source: Linters/SAST tools

src/plugins/obj/README.md (1)

301-304: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Remove the “negotiates with proper fallback” claim to match current behavior.

This still conflicts with the documented hard-error behavior under gds=true and can mislead operators.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/plugins/obj/README.md` around lines 301 - 304, Update the sentence that
claims "negotiates with proper fallback" so it no longer asserts negotiation
behavior; remove that phrase (or reword the sentence) and add a clarifying note
that the client does not perform negotiation/fallback when gds=true and will
hard-error as documented under the gds=true behavior; target the paragraph
containing the phrase "negotiates with proper fallback" and the surrounding
statement about per-transfer capability to ensure the README aligns with actual
gds=true behavior.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/plugins/obj/README.md`:
- Around line 271-279: The blockquote contains an unintended blank line that
violates MD028; remove the empty line between the two quoted paragraphs so the
blockquote lines are contiguous (i.e., keep the line starting "**Fail-safe
PUT:** ..." immediately followed by the line starting "**FUTURE — transparent
fallback.**" with no blank line between them) to satisfy the markdownlint
blockquote rule.

In `@src/plugins/obj/s3/client.h`:
- Around line 47-60: The protected constructor awsS3Client(nixl_b_params_t
*custom_params, std::shared_ptr<Aws::Utils::Threading::Executor> executor, bool
enable_rdma_fast_path) was placed before the public section, breaking the
required section order; move this protected constructor declaration so the class
layout is public then protected then private (i.e., relocate the
awsS3Client(...) declaration into the protected section that follows the
existing public members) and ensure no other public members remain after it to
restore the public->protected->private order mandated by docs/CodeStyle.md.
- Line 29: In src/plugins/obj/s3/client.h, move the `#include` "rdma.h" to remain
(it is safe for non-cuObject builds because rdma_protocol.h is standard-only)
and fix the class member ordering: ensure all public: members (the public API
methods declared later) appear first, then protected: (the internal
constructor/ctor currently placed between publics), then private: as needed;
specifically relocate the internal
ctor/SharedCuObjClient/S3RdmaControlPlane/RDMA helper-related protected
declarations out of the public section so the class follows public → protected →
private ordering (adjust declarations for functions/methods shown in client.h to
preserve signatures).

In `@src/plugins/obj/s3/engine_impl.h`:
- Line 69: Rename the new member gds_enabled_ to follow the project's
lowerCamelCase member convention: change it to gdsEnabled_ in
src/plugins/obj/s3/engine_impl.h and update all corresponding references/uses in
src/plugins/obj/s3/engine_impl.cpp (search for gds_enabled_ occurrences)
including any reads/writes, constructor initializers, and methods that reference
it so the symbol name is consistent throughout.

In `@src/plugins/obj/s3/rdma_protocol.h`:
- Around line 89-91: parseRdmaReply currently uses std::stoi(reply) which
accepts trailing junk (e.g., "200junk") and causes rdmaPut/rdmaGet to treat
malformed headers as success; change parseRdmaReply to use a strict integer
parse (e.g., std::from_chars) or validate that the whole string is a pure
integer and that parsing consumed all characters, returning an error sentinel if
not; update parseRdmaReply (and include <charconv> if using std::from_chars) so
it only returns the integer when parsing succeeds with no leftover characters,
preserving existing return semantics used by rdmaPut/rdmaGet which check
kRdmaReplySuccess/kRdmaReplyNoContent/kRdmaReplyPartialContent.

---

Outside diff comments:
In `@src/plugins/obj/s3/engine_impl.cpp`:
- Around line 275-295: The timeout branch can return while callbacks still
access stack state; fix by ensuring per-call result storage lives on the heap
and that callbacks capture shared ownership, or by draining/waiting for
callbacks after the timeout: move resp[i], has_error and the per-call
"states[i]" contents (completed, promise, and any result containers) into shared
heap-backed objects (e.g., shared_ptr<CallState>) captured by the async callback
so late callbacks write into heap-owned memory, and/or after you set the
fallback promise in the timeout path ensure you join/drain the associated future
(e.g., call futures[i].wait() or another synchronization on the shared
CallState) before returning from queryMem()/checkObjectExistsAsync so no
callback can touch destroyed stack variables.

In `@test/unit/plugins/object/nixl_object_test.cpp`:
- Around line 359-385: Reject simultaneous use of the legacy accelerated and
generic RDMA flags by checking the boolean flags use_gds and use_accelerated
early (e.g., before VRAM validation and before building params) and returning an
error if both are true; update any error message to state that --gds and
--accelerated are mutually exclusive and ensure the code picks exactly one mode
(or fails) before constructing params so downstream logic (and accel_type
handling) cannot proceed with both set.

---

Duplicate comments:
In `@src/plugins/obj/RDMA_PROTOCOL.md`:
- Around line 45-46: The table row for the `x-amz-rdma-reply` header contradicts
the doc's "does not auto-fall-back" contract; update the `x-amz-rdma-reply` row
to remove any language that says a 501 (or absent) result "falls back to HTTP"
and instead state that `200`/`204`/`206` indicate RDMA success while `501` (or
absence) indicates the peer declined RDMA and that clients MUST NOT
auto-fall-back — the application or client library decides fallback behavior.
Ensure the updated wording is applied to the `x-amz-rdma-reply` table cell and
aligns with the rest of the document's non-auto-fallback semantics.
- Around line 51-53: The fenced code block for the x-amz-rdma-token token format
is missing a language tag which triggers MD040; update the fenced block in
RDMA_PROTOCOL.md (the block containing "x-amz-rdma-token:
<descriptor>:<start_addr_hex>:<size_hex>") to use a tagged fence such as ```text
so the block is recognized as plain text and MD040 is satisfied.

In `@src/plugins/obj/README.md`:
- Around line 301-304: Update the sentence that claims "negotiates with proper
fallback" so it no longer asserts negotiation behavior; remove that phrase (or
reword the sentence) and add a clarifying note that the client does not perform
negotiation/fallback when gds=true and will hard-error as documented under the
gds=true behavior; target the paragraph containing the phrase "negotiates with
proper fallback" and the surrounding statement about per-transfer capability to
ensure the README aligns with actual gds=true behavior.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Enterprise

Run ID: b38269c3-f931-4a43-9ce6-b10e1860fe78

📥 Commits

Reviewing files that changed from the base of the PR and between 2aefadc and 2fa4f8d.

📒 Files selected for processing (19)
  • src/plugins/obj/RDMA_PROTOCOL.md
  • src/plugins/obj/README.md
  • src/plugins/obj/meson.build
  • src/plugins/obj/obj_backend.cpp
  • src/plugins/obj/s3/client.cpp
  • src/plugins/obj/s3/client.h
  • src/plugins/obj/s3/engine_impl.cpp
  • src/plugins/obj/s3/engine_impl.h
  • src/plugins/obj/s3/rdma.cpp
  • src/plugins/obj/s3/rdma.h
  • src/plugins/obj/s3/rdma_protocol.h
  • src/plugins/obj/s3_accel/TODO.md
  • src/plugins/obj/s3_accel/client.cpp
  • src/plugins/obj/s3_accel/dell/README.md
  • src/plugins/obj/s3_crt/engine_impl.h
  • src/utils/object/engine_utils.h
  • test/gtest/unit/obj/meson.build
  • test/gtest/unit/obj/rdma_protocol.cpp
  • test/unit/plugins/object/nixl_object_test.cpp

Comment thread src/plugins/obj/README.md Outdated
Comment thread src/plugins/obj/s3/client.h
Comment thread src/plugins/obj/s3/client.h Outdated
Comment thread src/plugins/obj/s3/engine_impl.h Outdated
Comment thread src/plugins/obj/s3/rdma_protocol.h Outdated
@harshavardhana harshavardhana force-pushed the feat/obj-s3-over-rdma-gds branch from 2fa4f8d to 3d6665c Compare June 7, 2026 23:09
@harshavardhana

This comment was marked as resolved.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
src/plugins/obj/s3_crt/engine_impl.h (1)

6-7: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Correct the header guard to include the full repository path.

The header guard OBJ_PLUGIN_S3_CRT_ENGINE_IMPL_H does not follow the project's convention. As per coding guidelines, header guards must be prefixed with NIXL_ and derived from the full repository-relative path in upper snake-case.

🔧 Proposed fix
-#ifndef OBJ_PLUGIN_S3_CRT_ENGINE_IMPL_H
-#define OBJ_PLUGIN_S3_CRT_ENGINE_IMPL_H
+#ifndef NIXL_SRC_PLUGINS_OBJ_S3_CRT_ENGINE_IMPL_H
+#define NIXL_SRC_PLUGINS_OBJ_S3_CRT_ENGINE_IMPL_H

And update the closing #endif comment at line 35:

-#endif // OBJ_PLUGIN_S3_CRT_ENGINE_IMPL_H
+#endif // NIXL_SRC_PLUGINS_OBJ_S3_CRT_ENGINE_IMPL_H
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/plugins/obj/s3_crt/engine_impl.h` around lines 6 - 7, Replace the
nonconforming header guard OBJ_PLUGIN_S3_CRT_ENGINE_IMPL_H with the
project-prefixed, repo-path-based macro
NIXL_SRC_PLUGINS_OBJ_S3_CRT_ENGINE_IMPL_H in the file
src/plugins/obj/s3_crt/engine_impl.h (update both the `#ifndef/`#define and the
matching `#endif` comment), ensuring the macro is upper snake-case derived from
the repository-relative path.

Source: Coding guidelines

src/plugins/obj/s3/engine_impl.h (1)

6-7: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Correct the header guard to include the full repository path.

The header guard OBJ_PLUGIN_S3_ENGINE_IMPL_H does not follow the project's convention. As per coding guidelines, header guards must be prefixed with NIXL_ and derived from the full repository-relative path in upper snake-case.

🔧 Proposed fix
-#ifndef OBJ_PLUGIN_S3_ENGINE_IMPL_H
-#define OBJ_PLUGIN_S3_ENGINE_IMPL_H
+#ifndef NIXL_SRC_PLUGINS_OBJ_S3_ENGINE_IMPL_H
+#define NIXL_SRC_PLUGINS_OBJ_S3_ENGINE_IMPL_H

And update the closing #endif comment at line 70:

-#endif // OBJ_PLUGIN_S3_ENGINE_IMPL_H
+#endif // NIXL_SRC_PLUGINS_OBJ_S3_ENGINE_IMPL_H
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/plugins/obj/s3/engine_impl.h` around lines 6 - 7, The header guard
OBJ_PLUGIN_S3_ENGINE_IMPL_H must follow the project's convention: replace it
with a guard derived from the full repository-relative path in upper snake-case
and prefixed with NIXL_ (use NIXL_SRC_PLUGINS_OBJ_S3_ENGINE_IMPL_H), update both
the opening `#ifndef/`#define and the corresponding closing `#endif` comment to
reference NIXL_SRC_PLUGINS_OBJ_S3_ENGINE_IMPL_H so the guard and comment match
exactly; ensure no other macros conflict with that name and keep the rest of
engine_impl.h unchanged.

Source: Coding guidelines

♻️ Duplicate comments (1)
src/plugins/obj/s3/rdma_protocol.h (1)

85-96: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

The partial-parse issue in parseRdmaReply remains unaddressed.

std::stoi(reply) accepts numeric prefixes: "200junk" parses as 200, which rdmaPut/rdmaGet then treat as rdma_reply_success. A malformed x-amz-rdma-reply header could be accepted as a valid RDMA transfer completion.

🔒 Proposed fix
 inline int
 parseRdmaReply(const std::string &reply) {
     if (reply.empty() || reply == "501") {
         return static_cast<int>(rdma_not_supported);
     }
     try {
-        return std::stoi(reply);
+        size_t parsed = 0;
+        const int code = std::stoi(reply, &parsed);
+        if (parsed != reply.size()) {
+            return 0;
+        }
+        return code;
     }
     catch (const std::exception &) {
         return 0;
     }
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/plugins/obj/s3/rdma_protocol.h` around lines 85 - 96, parseRdmaReply
currently allows partial numeric parses (e.g. "200junk") because std::stoi
accepts numeric prefixes; change the function to require the entire reply string
be a valid integer (no trailing characters) by parsing with a strict method
(e.g. std::from_chars or std::stoi with the idx parameter and verifying idx ==
reply.size()), return static_cast<int>(rdma_not_supported) for empty or "501",
and return 0 on any parse failure; reference parseRdmaReply and the
rdma_not_supported/rdma_reply_success handling so callers (rdmaPut/rdmaGet) only
accept fully-formed numeric replies.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/plugins/obj/s3/client.cpp`:
- Around line 36-46: Check for and reject construction of awsS3Client when gds
(enable_rdma_fast_path) is requested but no executor is provided: in the
awsS3Client constructor(s) (the ctor that sets executor_, gds_requested_ and the
other overload around the same area), if enable_rdma_fast_path is true and
executor is nullptr, immediately fail fast by throwing a std::invalid_argument
(or similar) with a clear message (e.g., "RDMA/gds requested but no executor
provided"), or assert, so rdmaReady() cannot later hard-fail; ensure the same
check is applied to the other constructor overload mentioned (lines 81-85) and
update any callers/tests accordingly.

In `@src/plugins/obj/s3/engine_impl.cpp`:
- Around line 84-88: Constructor nixlObjMetadata currently takes obj_key by
value causing an extra copy; change the parameter type to const std::string&
obj_key and update the constructor signature accordingly so the initializer list
can initialize objKey(obj_key) without the extra copy (leave nixl_mem_t
nixl_mem, uint64_t dev_id unchanged); ensure any call sites still match the new
const-ref signature.

In `@src/plugins/obj/s3/rdma.cpp`:
- Around line 343-346: The code currently returns immediately when
rdma.getToken(buf, size, 0, CUOBJ_PUT) (and the analogous CUOBJ_GET call)
returns nullptr, bypassing the existing rdma_max_attempts retry logic; change
the flow so that a nullptr token is treated as a transient failure and retried
inside the same retry loop (use the same attempt counter/backoff logic used for
other transient errors) instead of returning -1 immediately, breaking only when
getToken() returns a non-null pointer or when rdma_max_attempts are exhausted,
and ensure any per-attempt cleanup/pause is preserved between retries.

In `@test/gtest/unit/obj/rdma_protocol.cpp`:
- Around line 30-38: Add a test asserting that partial numeric parsing with
trailing junk is treated as failure: inside the TEST named RdmaProtocol,
ParseReplyDeclinedAndAbsentAreFailSafe add an expectation like
EXPECT_EQ(parseRdmaReply("200junk"), 0) so parseRdmaReply is validated against
inputs where std::stoi could partially parse a number and must instead return
failure.

---

Outside diff comments:
In `@src/plugins/obj/s3_crt/engine_impl.h`:
- Around line 6-7: Replace the nonconforming header guard
OBJ_PLUGIN_S3_CRT_ENGINE_IMPL_H with the project-prefixed, repo-path-based macro
NIXL_SRC_PLUGINS_OBJ_S3_CRT_ENGINE_IMPL_H in the file
src/plugins/obj/s3_crt/engine_impl.h (update both the `#ifndef/`#define and the
matching `#endif` comment), ensuring the macro is upper snake-case derived from
the repository-relative path.

In `@src/plugins/obj/s3/engine_impl.h`:
- Around line 6-7: The header guard OBJ_PLUGIN_S3_ENGINE_IMPL_H must follow the
project's convention: replace it with a guard derived from the full
repository-relative path in upper snake-case and prefixed with NIXL_ (use
NIXL_SRC_PLUGINS_OBJ_S3_ENGINE_IMPL_H), update both the opening `#ifndef/`#define
and the corresponding closing `#endif` comment to reference
NIXL_SRC_PLUGINS_OBJ_S3_ENGINE_IMPL_H so the guard and comment match exactly;
ensure no other macros conflict with that name and keep the rest of
engine_impl.h unchanged.

---

Duplicate comments:
In `@src/plugins/obj/s3/rdma_protocol.h`:
- Around line 85-96: parseRdmaReply currently allows partial numeric parses
(e.g. "200junk") because std::stoi accepts numeric prefixes; change the function
to require the entire reply string be a valid integer (no trailing characters)
by parsing with a strict method (e.g. std::from_chars or std::stoi with the idx
parameter and verifying idx == reply.size()), return
static_cast<int>(rdma_not_supported) for empty or "501", and return 0 on any
parse failure; reference parseRdmaReply and the
rdma_not_supported/rdma_reply_success handling so callers (rdmaPut/rdmaGet) only
accept fully-formed numeric replies.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Enterprise

Run ID: 09f1fcbc-bd52-4cc2-9110-0249c4b8edb0

📥 Commits

Reviewing files that changed from the base of the PR and between 2fa4f8d and 3d6665c.

📒 Files selected for processing (20)
  • src/plugins/obj/RDMA_PROTOCOL.md
  • src/plugins/obj/README.md
  • src/plugins/obj/meson.build
  • src/plugins/obj/obj_backend.cpp
  • src/plugins/obj/obj_backend.h
  • src/plugins/obj/s3/client.cpp
  • src/plugins/obj/s3/client.h
  • src/plugins/obj/s3/engine_impl.cpp
  • src/plugins/obj/s3/engine_impl.h
  • src/plugins/obj/s3/rdma.cpp
  • src/plugins/obj/s3/rdma.h
  • src/plugins/obj/s3/rdma_protocol.h
  • src/plugins/obj/s3_accel/TODO.md
  • src/plugins/obj/s3_accel/client.cpp
  • src/plugins/obj/s3_accel/dell/README.md
  • src/plugins/obj/s3_crt/engine_impl.h
  • src/utils/object/engine_utils.h
  • test/gtest/unit/obj/meson.build
  • test/gtest/unit/obj/rdma_protocol.cpp
  • test/unit/plugins/object/nixl_object_test.cpp

Comment thread src/plugins/obj/s3/client.cpp Outdated
Comment thread src/plugins/obj/s3/engine_impl.cpp Outdated
Comment thread src/plugins/obj/s3/rdma.cpp
Comment thread test/gtest/unit/obj/rdma_protocol.cpp Outdated
@harshavardhana harshavardhana force-pushed the feat/obj-s3-over-rdma-gds branch from 3d6665c to e0cd444 Compare June 7, 2026 23:57
@harshavardhana

This comment was marked as outdated.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/plugins/obj/obj_backend.cpp`:
- Around line 35-46: The multi-line comment in src/plugins/obj/obj_backend.cpp
exceeds the 100-char line limit; reflow/wrap each sentence so no line is longer
than 100 characters while preserving the original wording and intent (the block
that mentions gds=true, s3/rdma.h, x-amz-rdma-*, objAccelEngineRegistry,
accelerated/type and vendor engines). Edit the comment block text (the header
block above the retained legacy-note) to insert line breaks at sensible word
boundaries, maintain existing comment markers and indentation, and remove any
trailing spaces so the C++ source conforms to the 100-char-per-line guideline.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Enterprise

Run ID: f77abb1e-e4c7-4493-ac98-98e394f73c3c

📥 Commits

Reviewing files that changed from the base of the PR and between 3d6665c and e0cd444.

📒 Files selected for processing (20)
  • src/plugins/obj/RDMA_PROTOCOL.md
  • src/plugins/obj/README.md
  • src/plugins/obj/meson.build
  • src/plugins/obj/obj_backend.cpp
  • src/plugins/obj/obj_backend.h
  • src/plugins/obj/s3/client.cpp
  • src/plugins/obj/s3/client.h
  • src/plugins/obj/s3/engine_impl.cpp
  • src/plugins/obj/s3/engine_impl.h
  • src/plugins/obj/s3/rdma.cpp
  • src/plugins/obj/s3/rdma.h
  • src/plugins/obj/s3/rdma_protocol.h
  • src/plugins/obj/s3_accel/TODO.md
  • src/plugins/obj/s3_accel/client.cpp
  • src/plugins/obj/s3_accel/dell/README.md
  • src/plugins/obj/s3_crt/engine_impl.h
  • src/utils/object/engine_utils.h
  • test/gtest/unit/obj/meson.build
  • test/gtest/unit/obj/rdma_protocol.cpp
  • test/unit/plugins/object/nixl_object_test.cpp

Comment thread src/plugins/obj/obj_backend.cpp Outdated
@harshavardhana harshavardhana force-pushed the feat/obj-s3-over-rdma-gds branch from e0cd444 to 07af831 Compare June 8, 2026 02:19

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/plugins/obj/README.md`:
- Around line 235-245: The blockquote under the "Explicit opt-in; no automatic
fallback (yet)"/ "Why opt-in and not auto-detected?" section contains a blank
line breaking MD028; edit the blockquote so all lines are contiguous and each
line begins with "> " (i.e., remove the empty line between the quoted lines in
the paragraph that starts with "> **Why opt-in and not auto-detected?**" and
continues into the "GPUDirect topology note:" text), ensuring the entire quoted
block has no blank lines.

In `@src/plugins/obj/s3/engine_impl.cpp`:
- Line 172: The log statement using NIXL_ERROR in engine_impl.cpp is over 100
chars; split the long message string into two (or more) stream insertions so no
source line exceeds 100 chars — e.g., break the message "VRAM registration
requires gds=true and a ready RDMA fast path (cuObject)" into two shorter quoted
pieces passed to NIXL_ERROR (using multiple << operators) to keep each source
line under the limit while preserving the exact message when combined.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Enterprise

Run ID: 14bd7be4-a7c0-4ddc-9194-c7f6139ba22e

📥 Commits

Reviewing files that changed from the base of the PR and between e0cd444 and 07af831.

📒 Files selected for processing (20)
  • src/plugins/obj/RDMA_PROTOCOL.md
  • src/plugins/obj/README.md
  • src/plugins/obj/meson.build
  • src/plugins/obj/obj_backend.cpp
  • src/plugins/obj/obj_backend.h
  • src/plugins/obj/s3/client.cpp
  • src/plugins/obj/s3/client.h
  • src/plugins/obj/s3/engine_impl.cpp
  • src/plugins/obj/s3/engine_impl.h
  • src/plugins/obj/s3/rdma.cpp
  • src/plugins/obj/s3/rdma.h
  • src/plugins/obj/s3/rdma_protocol.h
  • src/plugins/obj/s3_accel/TODO.md
  • src/plugins/obj/s3_accel/client.cpp
  • src/plugins/obj/s3_accel/dell/README.md
  • src/plugins/obj/s3_crt/engine_impl.h
  • src/utils/object/engine_utils.h
  • test/gtest/unit/obj/meson.build
  • test/gtest/unit/obj/rdma_protocol.cpp
  • test/unit/plugins/object/nixl_object_test.cpp

Comment thread src/plugins/obj/README.md
Comment thread src/plugins/obj/s3/engine_impl.cpp Outdated
@harshavardhana harshavardhana force-pushed the feat/obj-s3-over-rdma-gds branch 4 times, most recently from 310491b to 00921a3 Compare June 9, 2026 00:09
@harshavardhana harshavardhana changed the title obj: add generic GPU-direct S3-over-RDMA path (gds=true) obj: add standard-protocol GPU-direct S3-over-RDMA (accelerated=true) Jun 9, 2026
@harshavardhana

harshavardhana commented Jun 9, 2026

Copy link
Copy Markdown
Author

@aranadive @vvenkates27 — reworked per our discussion:

  • Dropped the separate gds flag. The standard-protocol S3-over-RDMA path is now selected by accelerated=true with no type (or type=s3); accelerated=true, type=dell selects the Dell vendor engine.
  • The standard-S3 engine uses the standard x-amz-rdma-* headers and needs no per-vendor code.

Validated end-to-end against MinIO AIStor over RoCEv2 — accelerated=true (no type) and type=s3, host DRAM and GPU-direct VRAM PUT/GET with data verification.

Ready for review — could you /ok to test + /build when you get a chance? Thanks!

@harshavardhana harshavardhana force-pushed the feat/obj-s3-over-rdma-gds branch from 00921a3 to af72422 Compare June 9, 2026 00:29
@harshavardhana harshavardhana force-pushed the feat/obj-s3-over-rdma-gds branch 3 times, most recently from 44877f3 to b09227d Compare June 10, 2026 19:52
Comment thread src/plugins/obj/s3/rdma.cpp Outdated
Comment thread src/plugins/obj/s3/rdma.cpp Outdated
@harshavardhana harshavardhana force-pushed the feat/obj-s3-over-rdma-gds branch from b09227d to 337e024 Compare June 11, 2026 20:46
@harshavardhana

Copy link
Copy Markdown
Author

Thanks @ofer — your comments have been addressed: the -1 returns are now a named rdma_error constant, and the failure paths log reply_code before returning.

@jgoldsch12 jgoldsch12 left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @harshavardhana Thanks for putting this PR together. Having a generic implementation which is cross vendor will be very helpful. I imagine other vendors may want to test with this to ensure that it is truly generic and not designed to work with only one specific endpoint. Given that there is no open wire protocol specification, implementors have had little to work with and thus chances of interoperability are low. Thanks for putting this stake in the ground. I'm interested in seeing this become part of NIXL.

Comment thread src/plugins/obj/RDMA_PROTOCOL.md
Comment thread src/plugins/obj/s3_accel/dell/README.md Outdated
Comment thread src/plugins/obj/s3_accel/generic/engine_impl.cpp
Comment thread src/plugins/obj/s3/engine_impl.h
Comment thread src/plugins/obj/s3/rdma.cpp
Aws::Http::HttpMethod::HTTP_PUT,
Aws::Utils::Stream::DefaultResponseStreamFactoryMethod);
req->SetHeaderValue("x-amz-content-sha256", unsigned_payload);
req->SetHeaderValue(amz_rdma_token, formatRdmaToken(token, buf_addr, size).c_str());

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't passing the token, as cuobjserver expects. This is padding endpoint specific information to the descriptor string. These values should be moved to a different header for compatibility.

@harshavardhana harshavardhana Jun 12, 2026

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the careful read.

The header names and token layout aren't AIStor inventions — they come straight from NVIDIA's published cuObject material, and the wire protocol around them was written up as a spec: KiranModukuri/aws-c-s3 RDMA_PROTOCOL_SPEC.md. This is not yet GA'ed by AWS, however AWS refined this inline with what they could eventually implement — which is why it's a reasonable vendor-neutral base, and why pinning one concrete format down here (so other vendors can test against it) is useful.

  • Headers: x-amz-rdma-token / x-amz-rdma-reply / x-amz-rdma-bytes-transferred are taken directly from the cuObject docs (the client-side API + the RDMA-enabled GET/PUT workflow), and that spec defines x-amz-rdma-token as an opaque, provider-specific identifier that "encodes buffer location, size, … and context."
  • The :start_addr:size suffix is cuObject's own convention, not endpoint padding. cuObject's RDMA descriptor (RDMA_DESC_STR) is the fixed-length opaque handle — base address, buffer max-size, rkey, LID, DCTN, GID. cuObject then forms its IO descriptor by appending the per-operation transfer parameters to that descriptor: NVIDIA's own reference server (sock_server.cc) reads …<RDMA_DESC_STR><REM_BUF_START><LOCAL_BUF_OFFSET><REM_MSG_SIZE><LOCAL_FILE_OFFSET>, and cuMemObjGetRDMAToken(ptr, size, buffer_offset, op, …) takes the size + offset for exactly this. The token here carries the two essential fields (start addr + size).
  • The server splits off the fixed-length descriptor and forwards only those bytes to its RDMA provider — cuObjServer never sees the suffix — and consumes addr/size itself to set up the transfer. Since the HTTP body is empty (Content-Length: 0), the token is the only place to convey the request's start offset and size; moving them to a separate header would diverge from the proposed spec and from cuObject's own IO-descriptor layout.

Keen to keep refining as the spec firms up and other vendors test against it.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the detailed response. Two things of notes: 1. given we are talking about a document in private fork of a repo, it is further reason we should not refer to this as "agreed upon" or "preferred" or "standard". I appreciate the changes you made to the comments and text. 2. The x-amz-rdma-* header names are well understood from the documentation and from the proposed wire protocol spec from Kiran. However, neither specify the format of the descriptor. The implementation of aws-c-s3 fork does, however, pass the V1 descriptor produced by the cuobject library in x-amz-rdma-token header with only the 7 colon delimited fields, not 9. I've found no reference to the additional two fields. And while you make a point about the interface accepting additional fields, it does not change the token produced by the interface. Adding the two fields is a reasonable implementation decision for an endpoint developer, but I'm missing where the actual header value is documented to include these extra parameters. This is why I refer to this as being an endpoint specific implementation decision, and not an interoperable use of the proposed wire protocol. It is very possible I'm missing something here. I do know that ObjectScale interoperates with the aws-c-s3 fork, so this I why I'm raising this as a concern and a potential issue for NIXL interoperability.

@harshavardhana harshavardhana force-pushed the feat/obj-s3-over-rdma-gds branch 2 times, most recently from 599de2c to c63d72d Compare June 13, 2026 08:40
Add a vendor-neutral S3-over-RDMA data path to the object plugin, selected via
the accelerated=true backend param with no type (or type=s3). It implements the
published x-amz-rdma-* protocol (the NVIDIA cuObject / aws-c-s3 convention) and
needs no per-vendor code; MinIO AIStor implements it today, and AWS S3 would
work transparently if it adopts it.

- New s3/rdma_protocol.h: dependency-free wire helpers (token formatting, reply
  parsing), covered by a unit test.
- New s3/rdma.{h,cpp}: the cuObject RDMA token API plus a SigV4 UNSIGNED-PAYLOAD
  control plane that carries x-amz-rdma-token and reads x-amz-rdma-reply /
  x-amz-rdma-bytes-transferred. The object payload moves out-of-band over RDMA
  (server RDMA_READ on PUT, RDMA_WRITE on GET); the HTTP body is empty.
- New s3_accel/generic/: the standard-S3 engine, registered in the accel
  registry under type "s3". accelerated=true with no type resolves to it; RDMA
  is a capability of the standard S3 client, enabled for this engine. It is the
  preferred GPU-direct path.
- Explicit opt-in: an RDMA decline/failure is a hard error, with no automatic
  HTTP fallback.
- RDMA_PROTOCOL.md documents the server-side wire contract and a vendor
  compliance checklist; the README gains a GPU-Direct section.
- nixl_object_test gains -a/--accelerated (standard-S3) and keeps -T/--type for
  vendor engines.

The Dell engine (accelerated=true, type=dell) is a vendor-specific engine for
servers that speak Dell's x-rdma-info header.

Validated end-to-end against MinIO AIStor over RoCEv2: host DRAM and GPU-direct
VRAM PUT/GET with data verification.

Signed-off-by: Harshavardhana <harsha@minio.io>
@harshavardhana harshavardhana force-pushed the feat/obj-s3-over-rdma-gds branch from c63d72d to 96b0846 Compare June 23, 2026 23:07
@harshavardhana

Copy link
Copy Markdown
Author

Rebased onto latest main (no conflicts) and all review comments are addressed; the lightweight CI checks are green. @aranadive @vvenkates27 — could you kick off /ok to test + /build and take a review pass when you get a chance? Thanks!

@aranadive

Copy link
Copy Markdown
Contributor

@harshavardhana thanks for submitting the PR. We had further discussions internally on this and a also few other S3 PRs for alignment. We think that splitting this into two PRs would be beneficial for review and making progress for other PRs. It also looks like there is a lot of http header / cuobject-related code that should not included directly in the plugin and it would be helpful to have a refactor of that. Here is a more detailed review / recommended split for this PR:

Architectural Feedback

RDMA awareness is wired into the base layer (awsS3Client, DefaultObjEngineImpl, iS3Client). Per the PR 1247 layering, it should live entirely in the accel layer (awsS3AccelClient, S3AccelObjEngineImpl). Specifically:

  • awsS3Client should not have RDMA members, constructors, or dispatch branches — it's the plain HTTP client.
  • iS3Client should not have supportsRdma() — that's an accelerator concern on the universal interface.
  • DefaultObjEngineImpl should not have #ifdef HAVE_CUOBJ_CLIENT guards or dynamic VRAM_SEG — it forces S3CrtObjEngineImpl to add a defensive override.
  • GenericObjEngineImpl is unnecessaryS3AccelObjEngineImpl is the generic standard-protocol engine and should register under "s3" directly.
  • ~370 lines of reusable infrastructure (SharedCuObjClient, S3RdmaControlPlane, signV4) are embedded in the S3 plugin but have no plugin-specific dependency. These should be shared utilities so vendor engines can also compose them.

Minor Issues

  1. formatRdmaToken uses %lx instead of PRIx64 — not portable.
  2. S3RdmaControlPlane resolves credentials once at construction — will fail for long-lived backends with rotating IAM session tokens. Document as known limitation.
  3. rdmaGet with size==0 would register a 0-byte RDMA region. Add a defensive guard.

Recommended Split

PR A: RDMA Transport and AWS Utilities (land first, no plugin changes)

Extract reusable infrastructure into shared utilities. These become the building blocks that both
S3AccelObjEngineImpl (the generic standard-protocol engine) and vendor engines (e.g. Dell) can
compose to build their RDMA implementations.

  1. Move SharedCuObjClient (~72 lines) to src/utils/rdma/ or src/utils/cuobj/. Process-wide RDMA plumbing with no S3 dependency.
  2. Move S3RdmaControlPlane (~247 lines — SigV4, URI construction, HTTP dispatch, rdmaPut, rdmaGet) to src/utils/object/ or src/utils/aws/.
  3. Simplify signV4 (~88 lines) or document why the manual implementation is required.
  4. Fix formatRdmaToken to use PRIx64.
  5. Move rdma_protocol.h (~109 lines) to src/utils/object/.
  6. Move retry wrappers (rdmaPutWithRetry/rdmaGetWithRetry, ~51 lines) alongside the control plane.
  7. Include rdma_protocol.cpp unit tests (~80 lines).
  8. Document credential expiry as a known limitation.

PR B: Wire RDMA into the Accel Layer (depends on PR A)

Integrate the utilities through awsS3AccelClient / S3AccelObjEngineImpl with zero changes to
base classes. S3AccelObjEngineImpl is the generic standard-protocol engine — there is no need
for a separate GenericObjEngineImpl.

  1. Move RDMA ownership into awsS3AccelClient (s3_accel/client.*). Construct SharedCuObjClient and S3RdmaControlPlane, override putObjectAsync/getObjectAsync for RDMA. Dell overrides header format only. No new client subclass needed.
  2. Move RDMA buffer pin/unpin into S3AccelObjEngineImpl (s3_accel/engine_impl.*). Override registerMem/deregisterMem. Advertise VRAM_SEG in getSupportedMems().
  3. Add VRAM_SEG validation in S3AccelObjEngineImpl::prepXfer, not in isValidPrepXferParams.
  4. Remove GenericObjEngineImpl. Register S3AccelObjEngineImpl under "s3" directly. obj_backend.cpp normalizes missing type to "s3".
  5. Shouldnt be a need to modify awsS3Client, DefaultObjEngineImpl, iS3Client, or S3CrtObjEngineImpl.
  6. Update documentation and tests to reflect accel-layer placement.

@harshavardhana

Copy link
Copy Markdown
Author

Converting to draft — splitting this per the review. The shared S3-over-RDMA utilities are in #1840; the accel-layer wiring will follow here on top of it once #1840 is merged.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants