mooncake: fix request array leak on postXfer error path #1828

Open
EylonKrause wants to merge 1 commit into ai-dynamo:main from EylonKrause:fix/mooncake-postxfer-leak

EylonKrause commented Jun 24, 2026

What?

nixlMooncakeEngine::postXfer (src/plugins/mooncake/mooncake_backend.cpp) allocated
the transfer_request_t array with new[] and released it with a single delete[]
placed after the fill loop:

-    transfer_request_t *request = new transfer_request_t[request_count];
+    std::vector<transfer_request_t> request(request_count);
     for (size_t index = 0; index < request_count; ++index) {
         if (local[index].len != remote[index].len) return NIXL_ERR_INVALID_PARAM; // leaks request
         ...
     }
     ...
-    delete[] request;

It now uses std::vector and passes request.data() to submitTransfer /
submitTransferWithNotify.
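
The handoff can be sketched as follows. This is a minimal illustration under stated assumptions, not the plugin's code: `Desc`, `Request`, `Status`, `SubmitLog`, `submitStub`, and `postXferSketch` are hypothetical stand-ins, because the real `transfer_request_t` layout and the `submitTransfer` signature are not reproduced in this description. It shows a `std::vector` being filled, an early return that needs no cleanup, and `data()` handing the contiguous storage to a pointer-plus-count call.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins: the real Mooncake types and calls are not shown here.
struct Desc { std::size_t len; };
struct Request { std::size_t length = 0; };
enum class Status { Success, InvalidParam, BackendError };

// Records what the stub received so the handoff can be inspected afterwards.
struct SubmitLog {
    std::size_t count = 0;
    std::size_t totalBytes = 0;
};

// Stub for a C-style submit call taking a raw pointer plus an element count.
inline int submitStub(SubmitLog &log, const Request *entries, std::size_t count) {
    assert(entries != nullptr || count == 0);
    log.count = count;
    for (std::size_t i = 0; i < count; ++i) log.totalBytes += entries[i].length;
    return 0;
}

inline Status postXferSketch(const std::vector<Desc> &local,
                             const std::vector<Desc> &remote,
                             SubmitLog &log) {
    if (local.size() != remote.size()) return Status::InvalidParam;

    std::vector<Request> request(local.size());
    for (std::size_t i = 0; i < request.size(); ++i) {
        // Early return needs no cleanup: the vector's destructor frees the storage.
        if (local[i].len != remote[i].len) return Status::InvalidParam;
        request[i].length = local[i].len;
    }

    // data() points at contiguous storage that stays valid while `request` is alive.
    const int rc = submitStub(log, request.data(), request.size());
    return rc == 0 ? Status::Success : Status::BackendError;
}
```

The pointer returned by `data()` must not outlive the vector, so this shape only fits a submit call that does not keep the pointer after it returns. Whether that holds for the real Mooncake calls is an assumption here.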

Why?

The per-descriptor length check inside the fill loop returns
NIXL_ERR_INVALID_PARAM before the lone delete[], leaking the whole array on
every length-mismatched request. descCount() equality is checked before the
loop, but the per-descriptor len is not, so that early return is reachable from
caller input. std::vector frees its storage on every exit path, which makes the
manual delete[] unnecessary.

Reproduction

The Mooncake plugin requires the Mooncake Transfer Engine library, which isn't present
in my environment, so I verified the leak with an extracted reproducer of the
new[] / loop-with-early-return / delete[] shape (2 descriptors, the second with a
length mismatch), built with -fsanitize=address,leak:

buggy (new[]):       SUMMARY: AddressSanitizer: 80 byte(s) leaked in 1 allocation(s)
fixed (std::vector): no leak
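
The reproducer itself is not included above, so the following is only a hypothetical reconstruction of the same new[] / loop-with-early-return / delete[] shape. Instead of relying on AddressSanitizer it counts live elements, so the difference can be checked without sanitizer support. `Tracked`, `buggyShape`, and `fixedShape` are illustrative names, and the buggy variant exposes its pointer to the caller purely so the demo can release what the early return leaves behind.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical element type that counts how many instances are currently alive.
struct Tracked {
    static int live;
    Tracked() { ++live; }
    Tracked(const Tracked &) { ++live; }
    ~Tracked() { --live; }
};
int Tracked::live = 0;

// Buggy shape: new[], a loop with an early return, and a single delete[] at the end.
// `request` is a caller-visible reference only so the demo can clean up afterwards.
inline bool buggyShape(const std::vector<std::size_t> &local,
                       const std::vector<std::size_t> &remote,
                       Tracked *&request) {
    assert(remote.size() >= local.size());
    request = new Tracked[local.size()];
    for (std::size_t i = 0; i < local.size(); ++i) {
        if (local[i] != remote[i]) return false; // skips the delete[] below
    }
    delete[] request;
    request = nullptr;
    return true;
}

// Fixed shape: the vector's destructor runs on every exit path.
inline bool fixedShape(const std::vector<std::size_t> &local,
                       const std::vector<std::size_t> &remote) {
    assert(remote.size() >= local.size());
    std::vector<Tracked> request(local.size());
    for (std::size_t i = 0; i < local.size(); ++i) {
        if (local[i] != remote[i]) return false; // elements are destroyed here
    }
    return true;
}
```

With two descriptors and a mismatch in the second, `Tracked::live` stays at 2 after `buggyShape` returns and is back at 0 after `fixedShape` returns. If the caller drops the pointer instead of releasing it, a sanitizer build would be expected to report the array as leaked.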

std::vector is already used elsewhere in this file, so the change needs no new include.

Related Issues

None.

Summary by CodeRabbit

  • Bug Fixes
    • Improved transfer request handling by using safer, size-managed buffering during data submission, reducing the likelihood of memory-related issues and improving reliability for both notification and non-notification transfer paths.

EylonKrause requested a review from a team as a code owner June 24, 2026 15:22

copy-pr-bot Bot commented Jun 24, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@github-actions

👋 Hi EylonKrause! Thank you for contributing to ai-dynamo/nixl.

Your PR reviewers will review your contribution then trigger the CI to test your changes.

🚀

coderabbitai Bot commented Jun 24, 2026

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Enterprise

Run ID: 37591dd6-6723-4160-acf2-37ff0b91deac

📥 Commits

Reviewing files that changed from the base of the PR and between 626a569 and 3f0b91d.

📒 Files selected for processing (1)
  • src/plugins/mooncake/mooncake_backend.cpp

📝 Walkthrough

nixlMooncakeEngine::postXfer now uses an std::vector<transfer_request_t> sized by request_count instead of a raw dynamic array, and both transfer submission paths now receive request.data().

Changes

Transfer request buffer refactor in mooncake_backend

| Layer / File(s) | Summary |
| --- | --- |
| std::vector allocation and submission call sites (src/plugins/mooncake/mooncake_backend.cpp) | Allocation changed to std::vector<transfer_request_t> sized by request_count; both submitTransferWithNotify and submitTransfer calls updated to pass request.data(); explicit delete[] removed. |

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~2 minutes

Poem

A bunny hopped in postXfer with glee,
Swapping new[] for vector harmony.
With .data() in tow,
The bytes glide in a row —
Hop-hop, clean code for me! 🐇

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title clearly and accurately summarizes the main fix: a postXfer request array leak on an error path. |
| Description check | ✅ Passed | The description matches the template well, covering what changed, why it was needed, and enough implementation context. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

coderabbitai Bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/plugins/mooncake/mooncake_backend.cpp`:
- Line 288: The submitTransferWithNotify call in mooncake_backend.cpp exceeds
the style line-length limit and should be wrapped to keep lines at or under 100
characters. Reformat the call at the submitTransferWithNotify site in
mooncake_backend.cpp so the arguments are split across multiple lines,
preferably one argument per line, while preserving the existing rc assignment
and behavior.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Enterprise

Run ID: d77f288b-0585-49ee-a9f1-464a2a4e4c11

📥 Commits

Reviewing files that changed from the base of the PR and between a9f456b and 626a569.

📒 Files selected for processing (1)
  • src/plugins/mooncake/mooncake_backend.cpp

Comment thread src/plugins/mooncake/mooncake_backend.cpp Outdated
nixlMooncakeEngine::postXfer allocated the transfer_request_t array with
new[] and only delete[]'d it after the submit loop. The per-descriptor
length check inside that loop returns NIXL_ERR_INVALID_PARAM before the
delete[], leaking the whole array on every length-mismatched request
(descCount() equality is pre-checked, but per-descriptor len is not, so
that early return is reachable from caller input).

Use std::vector<transfer_request_t> so the array is freed on every exit
path, and pass request.data() to submitTransfer/submitTransferWithNotify.

Signed-off-by: Eylon Krause <eylon1909@gmail.com>
EylonKrause force-pushed the fix/mooncake-postxfer-leak branch from 626a569 to 3f0b91d Compare June 24, 2026 15:27