posix: build the io pool from the clamped size, not the raw request#1827
posix: build the io pool from the clamped size, not the raw request#1827EylonKrause wants to merge 1 commit into
Conversation
nixlPosixIOQueueImpl sized its ios_ pool and free_ios_ free-list from the raw ios_pool_size constructor argument, but the base class clamps that value to [MIN_IOS_POOL_SIZE=64, MAX_IOS_POOL_SIZE] and stores the clamped result in ios_pool_size_, which the io_uring and Linux-AIO completion handlers use as the "all ios free" sentinel (free_ios_.size() == ios_pool_size_). For a requested ios_pool_size of 1..63 (settable via the ios_pool_size backend param) the pool holds fewer than 64 entries, so free_ios_.size() can never equal ios_pool_size_ and doCheckCompleted never returns NIXL_SUCCESS -- the transfer never completes. Build the pool from the clamped ios_pool_size_ so the pool size and the completion check agree. Signed-off-by: Eylon Krause <eylon1909@gmail.com>
|
👋 Hi EylonKrause! Thank you for contributing to ai-dynamo/nixl. Your PR reviewers will review your contribution then trigger the CI to test your changes. 🚀 |
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: ASSERTIVE Plan: Enterprise Run ID: 📒 Files selected for processing (1)
📝 WalkthroughWalkthroughIn ChangesPOSIX IO Queue Pool Size Fix
Estimated code review effort🎯 1 (Trivial) | ⏱️ ~2 minutes Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
What?
nixlPosixIOQueueImpl(src/plugins/posix/io_queue.h) sized itsios_pool andfree_ios_free-list from the rawios_pool_sizeconstructor argument. The baseclass
nixlPosixIOQueueclamps that value to[MIN_IOS_POOL_SIZE (64), MAX_IOS_POOL_SIZE]and stores the clamped result inios_pool_size_. This PR buildsthe pool from
ios_pool_size_so the pool and the rest of the class agree.Why?
The io_uring and Linux-AIO completion handlers detect "all I/O finished" with
if (free_ios_.size() == ios_pool_size_) return NIXL_SUCCESS;(
io_uring_io_queue.cpp:137,linux_aio_io_queue.cpp:149,179).ios_pool_size_isthe clamped value, but the pool was built with the raw count. For a requested
ios_pool_sizeof 1..63 (settable via theios_pool_sizebackend param) the poolholds fewer than 64 entries, so
free_ios_.size()can never reachios_pool_size_—doCheckCompletednever returnsNIXL_SUCCESSand the transfer never completes (hang).Reproduction
A self-contained model of the base clamp + pool construction + the "all free" check:
How (verification)
posix_aio_io_queue.cpp,io_uring_io_queue.cpp,which instantiate
nixlPosixIOQueueImpl) in-tree with-Dsanitizer=address,undefined(exit 0; liburing builds via the meson subproject).Related Issues
None.
Summary by CodeRabbit