Skip to content

utils: add shared S3-over-RDMA transport and protocol helpers#1840

Open
harshavardhana wants to merge 1 commit into
ai-dynamo:mainfrom
harshavardhana:feat/obj-rdma-utils
Open

utils: add shared S3-over-RDMA transport and protocol helpers#1840
harshavardhana wants to merge 1 commit into
ai-dynamo:mainfrom
harshavardhana:feat/obj-rdma-utils

Conversation

@harshavardhana

@harshavardhana harshavardhana commented Jun 26, 2026

Copy link
Copy Markdown

What

Adds the shared GPU-direct S3-over-RDMA infrastructure as reusable utilities under src/utils/object/, so object engines compose it rather than each embedding its own copy. This keeps the transport/protocol layer in one place and easy to maintain as more engines adopt it. Additive only — no existing code paths change.

Contents

  • rdma_protocol.h — dependency-free wire helpers (token formatting, reply parsing, constants), unit-tested.
  • rdma.{h,cpp}SharedCuObjClient (process-wide cuObject RDMA plumbing) and S3RdmaControlPlane (SigV4 UNSIGNED-PAYLOAD control plane carrying x-amz-rdma-token, reading x-amz-rdma-reply / x-amz-rdma-bytes-transferred) plus retry wrappers. Guarded by HAVE_CUOBJ_CLIENT; compiles to nothing without cuObjClient.
  • Built into the obj_utils library; RdmaProtocol unit test under test/gtest/unit/obj.

The header names and token layout follow NVIDIA's published cuObject docs and the protocol proposed to AWS (aws-c-s3 RDMA_PROTOCOL_SPEC.md).

Notes

  • formatRdmaToken uses PRIx64; rdmaGet rejects size==0; credentials are resolved once at construction (documented limitation: rotating IAM session tokens need a fresh backend).
  • SharedCuObjClient has no S3 dependency, so it can move under src/utils/cuobj/ if preferred — happy to do that.

Test

Builds into obj_utils; RdmaProtocol.* unit tests pass (7/7).

Types of changes

  • New feature (non-breaking — additive utilities)

Summary by CodeRabbit

  • New Features

    • Added accelerated S3 transfers over RDMA for compatible setups.
    • Enabled efficient multipart uploads and range-based reads for RDMA operations.
  • Bug Fixes

    • Improved handling of unsupported/declined RDMA requests and response parsing.
    • Refined retry behavior to recover from transient RDMA token/connectivity issues.
  • Tests

    • Added unit tests covering RDMA token formatting and RDMA response header parsing.

@harshavardhana harshavardhana requested a review from a team as a code owner June 26, 2026 17:08
@copy-pr-bot

copy-pr-bot Bot commented Jun 26, 2026

Copy link
Copy Markdown

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@github-actions

Copy link
Copy Markdown

👋 Hi harshavardhana! Thank you for contributing to ai-dynamo/nixl.

Your PR reviewers will review your contribution then trigger the CI to test your changes.

🚀

@coderabbitai

coderabbitai Bot commented Jun 26, 2026

Copy link
Copy Markdown

Review Change Stack

📝 Walkthrough

Walkthrough

Adds S3-over-RDMA protocol helpers, public declarations, implementation wiring, signed PUT/GET handling with token retries, build updates, and unit tests for wire-format behavior.

Changes

S3 RDMA object backend

Layer / File(s) Summary
Protocol contract and tests
src/utils/object/rdma_protocol.h, test/gtest/unit/obj/meson.build, test/gtest/unit/obj/rdma_protocol.cpp
Adds RDMA header constants, token formatting, reply parsing, and unit coverage for exact and malformed wire values.
Public RDMA API
src/utils/object/rdma.h
Declares the cuObjClient wrapper, request context, control-plane class, and retry helpers behind HAVE_CUOBJ_CLIENT.
Build wiring and shared client
src/utils/object/meson.build, src/utils/object/rdma.cpp
Switches obj_utils to shared source/dependency lists and implements the shared client singleton used for connectivity checks, buffer registration, and token minting.
Control-plane signing and setup
src/utils/object/rdma.cpp
Builds RDMA URIs, signs requests with AWS SigV4, and initializes the HTTP client, region, endpoint, and credentials state.
PUT/GET execution and retries
src/utils/object/rdma.cpp
Sends signed RDMA PUT and GET requests, interprets x-amz-rdma-reply and transfer metadata, and retries token minting around both paths.

Sequence Diagram(s)

sequenceDiagram
  participant rdmaPutWithRetry
  participant SharedCuObjClient
  participant S3RdmaControlPlane
  participant S3

  rdmaPutWithRetry->>SharedCuObjClient: getToken()
  SharedCuObjClient-->>rdmaPutWithRetry: token or null
  alt token available
    rdmaPutWithRetry->>S3RdmaControlPlane: rdmaPut(ctx, token, buf_addr, size)
    S3RdmaControlPlane->>S3: signed HTTP PUT with x-amz-rdma-token
    S3-->>S3RdmaControlPlane: HTTP 200 / x-amz-rdma-reply
    S3RdmaControlPlane-->>rdmaPutWithRetry: status, ETag, checksum
    rdmaPutWithRetry->>SharedCuObjClient: putToken(token)
  else token unavailable
    Note over rdmaPutWithRetry: retry up to rdma_max_attempts
  end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

🐇 I hopped through S3 with tokens in tow,
Signed little requests and watched them glow.
RDMA carrots zipped through the night,
And unit tests kept the wire path right.
Thump thump — the build went snip-snap-bright!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 29.03% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly summarizes the main additive change: shared S3-over-RDMA transport and protocol helpers.
Description check ✅ Passed The description covers What, contents, notes, and testing, and provides enough context even though an explicit Why/How structure is light.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 7

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/utils/object/rdma_protocol.h`:
- Around line 90-105: The empty-reply handling in parseRdmaReply is too broad:
it currently treats a missing x-amz-rdma-reply the same as an explicit RDMA
decline, which causes rdmaPutWithRetry in rdma.cpp to skip transient retries on
non-200 PUT failures. Update parseRdmaReply in rdma_protocol.h so the
missing-header behavior is caller-controlled (or otherwise split GET vs PUT
semantics), and make rdmaGet use the decline-on-empty path while rdmaPut treats
an absent reply as malformed/error instead of rdma_not_supported.
- Around line 66-75: The RDMA token formatting in formatRdmaToken is truncating
opaque descriptors by writing into a fixed 512-byte stack buffer and ignoring
the snprintf result. Update formatRdmaToken in rdma_protocol.h to build the
token with a dynamically sized string or otherwise ensure the full descriptor is
preserved before it is used as x-amz-rdma-token, and handle the formatted length
instead of discarding snprintf’s return value.

In `@src/utils/object/rdma.cpp`:
- Line 309: The control plane validity check is too permissive because `valid_`
in `rdma.cpp` only requires `impl_->http` and a non-empty `access_key`. Update
the validity logic in the control-plane setup path so it also requires a
non-empty `secret_key`, matching what `signV4()` needs for request signing. Keep
the change localized around the `valid_` assignment and the related credential
fields on `impl_`.
- Around line 507-513: The retry loop in rdmaGet() is minting a cuObject token
before validating the request size, so zero-length GETs still reach
rdma.getToken() even though S3RdmaControlPlane::rdmaGet() rejects them; add an
early size == 0 check at the start of rdmaGet() (before the attempt loop) and
return the same error path immediately so no token minting or cp.rdmaGet() call
happens for empty requests.
- Around line 454-458: Reject out-of-range x-amz-rdma-bytes-transferred values
in the RDMA response parsing path so callers never accept more bytes than were
requested. Update the logic around the response-header handling in the rdma
transfer code to validate the parsed header against the requested transfer size
before returning it, and treat any negative or oversized value as rdma_error.
Keep the existing std::stoll parsing and error handling, but add the bound check
in the same block where amz_rdma_bytes_transferred is read.
- Around line 416-420: The byte-range construction in the RDMA request header
can overflow when computing the end offset as offset + size - 1, which may
produce an invalid range. Update the logic in the range-setting block to compute
and validate the end value safely before calling SetHeaderValue, using the
relevant offset and size variables in this code path. If the range cannot be
represented without overflow, handle it explicitly rather than emitting a
malformed bytes= header.

In `@src/utils/object/rdma.h`:
- Around line 113-116: Add Doxygen-style documentation blocks for the remaining
public header APIs in rdma.h, specifically the valid() accessor and
rdmaGetWithRetry() function, matching the surrounding public declarations. Place
the comments immediately above valid() and rdmaGetWithRetry() using the same /**
... */ style as the other documented symbols so the public API is consistently
documented.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Enterprise

Run ID: 91c32fec-931c-4672-bb22-1dd6134ddee9

📥 Commits

Reviewing files that changed from the base of the PR and between 32f3104 and 765e429.

📒 Files selected for processing (6)
  • src/utils/object/meson.build
  • src/utils/object/rdma.cpp
  • src/utils/object/rdma.h
  • src/utils/object/rdma_protocol.h
  • test/gtest/unit/obj/meson.build
  • test/gtest/unit/obj/rdma_protocol.cpp

Comment thread src/utils/object/rdma_protocol.h Outdated
Comment thread src/utils/object/rdma_protocol.h
Comment thread src/utils/object/rdma.cpp Outdated
Comment thread src/utils/object/rdma.cpp
Comment thread src/utils/object/rdma.cpp
Comment thread src/utils/object/rdma.cpp
Comment thread src/utils/object/rdma.h
Comment on lines +113 to +116
bool
valid() const {
return valid_;
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

📐 Maintainability & Code Quality | 🟡 Minor | ⚡ Quick win

Document the remaining public API entries.

valid() and rdmaGetWithRetry() are public header APIs, but unlike the surrounding declarations they have no Doxygen comment block.

Proposed fix
     bool
+    /** Return true when the control-plane client was initialized successfully. */
     valid() const {
         return valid_;
     }
@@
+/**
+ * Mint a token, run rdmaGet, release the token, with one transient retry
+ * (covering token-mint and control-plane hiccups). The buffer must already be
+ * registered via SharedCuObjClient::registerBuffer().
+ *
+ * `@return` >0 bytes transferred (success), rdma_not_supported (server declined),
+ *         or rdma_error (failure). The caller treats anything < 0 as an error.
+ */
 ssize_t
 rdmaGetWithRetry(SharedCuObjClient &rdma,
                  S3RdmaControlPlane &cp,
                  S3RdmaClientCtx &ctx,

As per path instructions, "Comments: use Doxygen-style /** ... */ for documenting public APIs/classes/functions".

Also applies to: 161-167

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/utils/object/rdma.h` around lines 113 - 116, Add Doxygen-style
documentation blocks for the remaining public header APIs in rdma.h,
specifically the valid() accessor and rdmaGetWithRetry() function, matching the
surrounding public declarations. Place the comments immediately above valid()
and rdmaGetWithRetry() using the same /** ... */ style as the other documented
symbols so the public API is consistently documented.

Source: Path instructions

Extract the reusable GPU-direct S3-over-RDMA infrastructure into
src/utils/object/ so the object plugin's accel engines (and future vendor
engines) can compose it, rather than embedding it in the plugin. No plugin
wiring yet — that lands in a follow-up PR.

- rdma_protocol.h: dependency-free wire helpers (token formatting, reply
  parsing, constants), covered by a unit test.
- rdma.{h,cpp}: SharedCuObjClient (process-wide cuObject RDMA plumbing) and
  S3RdmaControlPlane (SigV4 UNSIGNED-PAYLOAD control plane carrying
  x-amz-rdma-token and reading x-amz-rdma-reply / x-amz-rdma-bytes-transferred),
  plus the retry wrappers. Guarded by HAVE_CUOBJ_CLIENT; compiles to nothing
  without cuObjClient.
- Built into the obj_utils library; protocol unit test in test/gtest/unit/obj.

The header names and token layout come from NVIDIA's published cuObject docs and
the protocol proposed to AWS (aws-c-s3 RDMA_PROTOCOL_SPEC.md).

Review fixes:
- formatRdmaToken uses PRIx64 (portable).
- rdmaGet rejects size==0 up front.
- Credentials are resolved once at construction (documented limitation:
  rotating IAM session tokens require a fresh backend).

Signed-off-by: Harshavardhana <harsha@minio.io>

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/utils/object/meson.build (1)

33-49: 🗄️ Data Integrity & Integration | 🟠 Major

Propagate cuobj_dep through obj_utils_interface. rdma.h is now part of the public object-utils API and includes <cuobjclient.h> behind HAVE_CUOBJ_CLIENT; link_with does not carry that dependency transitively, so consumers of obj_utils_interface still need cuobj_dep in dependencies: when it is available.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/utils/object/meson.build` around lines 33 - 49, Propagate the optional
cuobj dependency through obj_utils_interface so consumers of the public
object-utils API can compile rdma.h when HAVE_CUOBJ_CLIENT is enabled. Update
the declare_dependency block for obj_utils_interface in meson.build to include
cuobj_dep in dependencies alongside the existing partial_aws_* deps, while
keeping it conditional via the same cuobj_dep.found() check used for
obj_utils_lib.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/utils/object/rdma_protocol.h`:
- Around line 17-23: The RDMA protocol header relies on ssize_t for its
return-code sentinels, but that type is not declared by the current includes.
Update rdma_protocol.h so ssize_t is properly available by adding the
appropriate POSIX type declaration include, or alternatively replace the
sentinels with a fixed-width signed type defined within this header. Keep the
change localized near the existing rdma_protocol definitions so the return-code
constants remain consistent.

In `@src/utils/object/rdma.cpp`:
- Around line 454-468: Move the ctx.etag assignment in rdmaGet so it only
happens after the response has been fully validated and the transfer size has
been accepted, because ctx.etag is currently set before checking
x-amz-rdma-bytes-transferred. Keep the validation and return-path logic in
rdmaGet consistent with the rdma.h contract that etag is populated only on
success, and ensure malformed or out-of-range transferred-byte headers return
rdma_error without leaving ctx.etag populated. Use the existing rdmaGet and
ctx.etag symbols to place the fix in the GET success path.

In `@src/utils/object/rdma.h`:
- Around line 90-97: The new public S3RdmaClientCtx API currently exposes
snake_case member names, which conflicts with the repo’s lower-camel-case member
convention. Rename the fields upload_id, part_number, and checksum_crc64nvme to
lower camel case in S3RdmaClientCtx, and update all uses of those members
accordingly so downstream code only sees the corrected public names.

---

Outside diff comments:
In `@src/utils/object/meson.build`:
- Around line 33-49: Propagate the optional cuobj dependency through
obj_utils_interface so consumers of the public object-utils API can compile
rdma.h when HAVE_CUOBJ_CLIENT is enabled. Update the declare_dependency block
for obj_utils_interface in meson.build to include cuobj_dep in dependencies
alongside the existing partial_aws_* deps, while keeping it conditional via the
same cuobj_dep.found() check used for obj_utils_lib.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Enterprise

Run ID: 91d0eeaa-dd6f-4bde-9076-474a62cc28b2

📥 Commits

Reviewing files that changed from the base of the PR and between 765e429 and 58aee55.

📒 Files selected for processing (6)
  • src/utils/object/meson.build
  • src/utils/object/rdma.cpp
  • src/utils/object/rdma.h
  • src/utils/object/rdma_protocol.h
  • test/gtest/unit/obj/meson.build
  • test/gtest/unit/obj/rdma_protocol.cpp

Comment on lines +17 to +23
#include <charconv>
#include <cinttypes>
#include <cstdint>
#include <cstdio>
#include <string>
#include <system_error>

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎯 Functional Correctness | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

set -euo pipefail

# Inspect the target header with line numbers
sed -n '1,140p' src/utils/object/rdma_protocol.h | cat -n

# Check whether ssize_t is declared anywhere this header includes, and where it is used
grep -RIn --line-number --fixed-strings "ssize_t" src/utils/object/rdma_protocol.h src | head -n 50

# Map nearby declarations/usages in the same file
python3 - <<'PY'
from pathlib import Path
p = Path('src/utils/object/rdma_protocol.h')
for i, line in enumerate(p.read_text().splitlines(), 1):
    if 'ssize_t' in line or 'rdma_not_supported' in line or 'rdma_error' in line or '`#include`' in line:
        print(f"{i}: {line}")
PY

Repository: ai-dynamo/nixl

Length of output: 9433


Include <sys/types.h> for ssize_t

rdma_protocol.h uses ssize_t for the return-code sentinels, but none of the current includes declare it. Add the POSIX type declaration here or switch these sentinels to a fixed-width signed type owned by this header.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/utils/object/rdma_protocol.h` around lines 17 - 23, The RDMA protocol
header relies on ssize_t for its return-code sentinels, but that type is not
declared by the current includes. Update rdma_protocol.h so ssize_t is properly
available by adding the appropriate POSIX type declaration include, or
alternatively replace the sentinels with a fixed-width signed type defined
within this header. Keep the change localized near the existing rdma_protocol
definitions so the return-code constants remain consistent.

Comment thread src/utils/object/rdma.cpp
Comment on lines +454 to +468
if (resp->HasHeader("etag")) {
ctx.etag = stripQuotes(resp->GetHeader("etag").c_str());
}

// Trust the server's reported transferred byte count (can be < requested
// for ranged/partial GETs).
if (resp->HasHeader(amz_rdma_bytes_transferred)) {
try {
const long long n = std::stoll(resp->GetHeader(amz_rdma_bytes_transferred).c_str());
if (n < 0 || static_cast<uint64_t>(n) > size) {
NIXL_ERROR << "rdmaGet: invalid x-amz-rdma-bytes-transferred=" << n
<< " (requested " << size << ") for key=" << ctx.object;
return rdma_error;
}
return static_cast<ssize_t>(n);

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🗄️ Data Integrity & Integration | 🟠 Major | ⚡ Quick win

Only publish ctx.etag after the GET is fully accepted.

Line 96 in src/utils/object/rdma.h documents etag as populated on success, but this code writes it before validating x-amz-rdma-bytes-transferred. If that header is malformed or out of range, rdmaGet() returns rdma_error with a success-looking ETag left behind.

Proposed fix
-        if (resp->HasHeader("etag")) {
-            ctx.etag = stripQuotes(resp->GetHeader("etag").c_str());
-        }
+        const std::string etag =
+            resp->HasHeader("etag") ? stripQuotes(resp->GetHeader("etag").c_str()) : "";
@@
-                return static_cast<ssize_t>(n);
+                ctx.etag = etag;
+                return static_cast<ssize_t>(n);
             }
             catch (const std::exception &) {
                 return rdma_error;
             }
         }
+        ctx.etag = etag;
         return static_cast<ssize_t>(size);
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if (resp->HasHeader("etag")) {
ctx.etag = stripQuotes(resp->GetHeader("etag").c_str());
}
// Trust the server's reported transferred byte count (can be < requested
// for ranged/partial GETs).
if (resp->HasHeader(amz_rdma_bytes_transferred)) {
try {
const long long n = std::stoll(resp->GetHeader(amz_rdma_bytes_transferred).c_str());
if (n < 0 || static_cast<uint64_t>(n) > size) {
NIXL_ERROR << "rdmaGet: invalid x-amz-rdma-bytes-transferred=" << n
<< " (requested " << size << ") for key=" << ctx.object;
return rdma_error;
}
return static_cast<ssize_t>(n);
const std::string etag =
resp->HasHeader("etag") ? stripQuotes(resp->GetHeader("etag").c_str()) : "";
// Trust the server's reported transferred byte count (can be < requested
// for ranged/partial GETs).
if (resp->HasHeader(amz_rdma_bytes_transferred)) {
try {
const long long n = std::stoll(resp->GetHeader(amz_rdma_bytes_transferred).c_str());
if (n < 0 || static_cast<uint64_t>(n) > size) {
NIXL_ERROR << "rdmaGet: invalid x-amz-rdma-bytes-transferred=" << n
<< " (requested " << size << ") for key=" << ctx.object;
return rdma_error;
}
ctx.etag = etag;
return static_cast<ssize_t>(n);
}
catch (const std::exception &) {
return rdma_error;
}
}
ctx.etag = etag;
return static_cast<ssize_t>(size);
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/utils/object/rdma.cpp` around lines 454 - 468, Move the ctx.etag
assignment in rdmaGet so it only happens after the response has been fully
validated and the transfer size has been accepted, because ctx.etag is currently
set before checking x-amz-rdma-bytes-transferred. Keep the validation and
return-path logic in rdmaGet consistent with the rdma.h contract that etag is
populated only on success, and ensure malformed or out-of-range transferred-byte
headers return rdma_error without leaving ctx.etag populated. Use the existing
rdmaGet and ctx.etag symbols to place the fix in the GET success path.

Comment thread src/utils/object/rdma.h
Comment on lines +90 to +97
struct S3RdmaClientCtx {
std::string bucket;
std::string object;
std::string upload_id; // empty for single-shot (non-multipart)
uint32_t part_number = 0; // 1..=10000 when upload_id is set
std::string checksum_crc64nvme; // optional, in/out
std::string etag; // populated on success
};

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

📐 Maintainability & Code Quality | 🟠 Major | ⚡ Quick win

Rename the new S3RdmaClientCtx fields before this API escapes.

upload_id, part_number, and checksum_crc64nvme expose snake_case on a brand-new public struct, so every downstream engine will inherit names that already violate the repo rule for members. This is still cheap to align now.

Proposed fix
 struct S3RdmaClientCtx {
     std::string bucket;
     std::string object;
-    std::string upload_id; // empty for single-shot (non-multipart)
-    uint32_t part_number = 0; // 1..=10000 when upload_id is set
-    std::string checksum_crc64nvme; // optional, in/out
+    std::string uploadId; // empty for single-shot (non-multipart)
+    uint32_t partNumber = 0; // 1..=10000 when uploadId is set
+    std::string checksumCrc64nvme; // optional, in/out
     std::string etag; // populated on success
 };

As per path instructions, "Use lower camel case for functions/members; snake case for variables/namespace names/constants".

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
struct S3RdmaClientCtx {
std::string bucket;
std::string object;
std::string upload_id; // empty for single-shot (non-multipart)
uint32_t part_number = 0; // 1..=10000 when upload_id is set
std::string checksum_crc64nvme; // optional, in/out
std::string etag; // populated on success
};
struct S3RdmaClientCtx {
std::string bucket;
std::string object;
std::string uploadId; // empty for single-shot (non-multipart)
uint32_t partNumber = 0; // 1..=10000 when uploadId is set
std::string checksumCrc64nvme; // optional, in/out
std::string etag; // populated on success
};
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/utils/object/rdma.h` around lines 90 - 97, The new public S3RdmaClientCtx
API currently exposes snake_case member names, which conflicts with the repo’s
lower-camel-case member convention. Rename the fields upload_id, part_number,
and checksum_crc64nvme to lower camel case in S3RdmaClientCtx, and update all
uses of those members accordingly so downstream code only sees the corrected
public names.

Source: Path instructions

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/utils/object/meson.build`:
- Around line 32-42: The obj_utils_interface target is missing public
dependencies needed by the new rdma.h header, so downstream consumers can
compile-fail even though obj_utils_lib links correctly. Update
obj_utils_interface to forward the same public dependencies as obj_utils for
headers that include object/rdma.h, specifically nixl_common_dep and
conditionally cuobj_dep alongside the existing AWS deps, so any consumer of the
interface inherits the required include paths and symbols.

In `@src/utils/object/rdma.cpp`:
- Around line 276-280: The endpoint override handling in the RDMA setup only
switches the scheme for HTTP, so HTTPS overrides can still leave impl_->scheme
on http. Update the endpoint parsing block in rdma.cpp to explicitly set
impl_->scheme based on the parsed Aws::Http::URI scheme, using the existing
config.scheme as the fallback when the override is not HTTP or HTTPS, and keep
the host/port assignment tied to the same endpointOverride path.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Enterprise

Run ID: 79b69abf-81c1-4381-9f51-b108f7e8345e

📥 Commits

Reviewing files that changed from the base of the PR and between 58aee55 and 48be98d.

📒 Files selected for processing (6)
  • src/utils/object/meson.build
  • src/utils/object/rdma.cpp
  • src/utils/object/rdma.h
  • src/utils/object/rdma_protocol.h
  • test/gtest/unit/obj/meson.build
  • test/gtest/unit/obj/rdma_protocol.cpp

Comment on lines +32 to +42
obj_utils_sources = ['s3/utils.cpp', 's3/utils.h', 'rdma_protocol.h', 'rdma.cpp', 'rdma.h']
obj_utils_deps = [nixl_common_dep, partial_aws_s3, partial_aws_s3_crt, partial_aws_core]
if cuobj_dep.found()
obj_utils_deps += [cuobj_dep]
endif

# Generic object utilities library containing S3 utilities and other object storage utilities
obj_utils_lib = library('obj_utils',
's3/utils.cpp', 's3/utils.h',
obj_utils_sources,
include_directories: [nixl_inc_dirs, utils_inc_dirs],
dependencies: [nixl_common_dep, partial_aws_s3, partial_aws_s3_crt, partial_aws_core],
dependencies: obj_utils_deps,

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎯 Functional Correctness | 🟠 Major | ⚡ Quick win

Propagate the new public header dependencies through obj_utils_interface.

rdma.h is now part of obj_utils’s public surface, but the interface target still exports only the AWS deps. Consumers that include object/rdma.h will also need nixl_common_dep for nixl_types.h and, when enabled, cuobj_dep for <cuobjclient.h>. obj_utils_lib links today, but downstream builds can still fail at compile time because obj_utils_interface does not forward those dependencies.

Proposed fix
     obj_utils_interface = declare_dependency(
         link_with: obj_utils_lib,
         include_directories: [include_directories('.'), include_directories('s3')],
-        dependencies: [partial_aws_s3, partial_aws_s3_crt, partial_aws_core]
+        dependencies: obj_utils_deps
     )
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/utils/object/meson.build` around lines 32 - 42, The obj_utils_interface
target is missing public dependencies needed by the new rdma.h header, so
downstream consumers can compile-fail even though obj_utils_lib links correctly.
Update obj_utils_interface to forward the same public dependencies as obj_utils
for headers that include object/rdma.h, specifically nixl_common_dep and
conditionally cuobj_dep alongside the existing AWS deps, so any consumer of the
interface inherits the required include paths and symbols.

Comment thread src/utils/object/rdma.cpp
Comment on lines +276 to +280
if (!config.endpointOverride.empty()) {
Aws::Http::URI ep(config.endpointOverride);
impl_->scheme = (ep.GetScheme() == Aws::Http::Scheme::HTTP) ? "http" : impl_->scheme;
impl_->host = ep.GetAuthority();
impl_->port = ep.GetPort();

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎯 Functional Correctness | 🟠 Major | ⚡ Quick win

Honor https:// endpoint overrides explicitly.

Line 278 only updates impl_->scheme when the parsed override is HTTP. If config.scheme is HTTP and endpointOverride is an HTTPS URI, this leaves the control plane on "http" and builds/signs requests for the wrong transport.

Proposed fix
         if (!config.endpointOverride.empty()) {
             Aws::Http::URI ep(config.endpointOverride);
-            impl_->scheme = (ep.GetScheme() == Aws::Http::Scheme::HTTP) ? "http" : impl_->scheme;
+            if (ep.GetScheme() == Aws::Http::Scheme::HTTP) {
+                impl_->scheme = "http";
+            } else if (ep.GetScheme() == Aws::Http::Scheme::HTTPS) {
+                impl_->scheme = "https";
+            }
             impl_->host = ep.GetAuthority();
             impl_->port = ep.GetPort();
         } else {
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if (!config.endpointOverride.empty()) {
Aws::Http::URI ep(config.endpointOverride);
impl_->scheme = (ep.GetScheme() == Aws::Http::Scheme::HTTP) ? "http" : impl_->scheme;
impl_->host = ep.GetAuthority();
impl_->port = ep.GetPort();
if (!config.endpointOverride.empty()) {
Aws::Http::URI ep(config.endpointOverride);
if (ep.GetScheme() == Aws::Http::Scheme::HTTP) {
impl_->scheme = "http";
} else if (ep.GetScheme() == Aws::Http::Scheme::HTTPS) {
impl_->scheme = "https";
}
impl_->host = ep.GetAuthority();
impl_->port = ep.GetPort();
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/utils/object/rdma.cpp` around lines 276 - 280, The endpoint override
handling in the RDMA setup only switches the scheme for HTTP, so HTTPS overrides
can still leave impl_->scheme on http. Update the endpoint parsing block in
rdma.cpp to explicitly set impl_->scheme based on the parsed Aws::Http::URI
scheme, using the existing config.scheme as the fallback when the override is
not HTTP or HTTPS, and keep the host/port assignment tied to the same
endpointOverride path.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant