EXAMPLES/DEVICE/EP/CSRC: Replaced torch::Event by cudaEvent_t wrapper.#1822
EXAMPLES/DEVICE/EP/CSRC: Replaced torch::Event by cudaEvent_t wrapper.#1822rakhmets wants to merge 3 commits into
Conversation
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: ASSERTIVE Plan: Enterprise Run ID: 📒 Files selected for processing (5)
📝 WalkthroughWalkthroughAdds CUDA warning and event wrappers, refactors ChangesCUDA Event RAII Refactor
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@examples/device/ep/csrc/cuda_event.hpp`:
- Around line 28-67: The cuda::Event wrapper currently creates the event on
whichever device is current and then records it on an arbitrary caller stream,
which can cross CUDA contexts. Update Event::create() and
Event::record(cudaStream_t) to be device-aware by either taking an explicit
device index or deriving the stream’s device and switching to it before
cudaEventCreateWithFlags/cudaEventRecord, then restoring the previous device
state. Use the cuda::Event class, its create() helper, and record() method as
the main touchpoints.
In `@examples/device/ep/csrc/cuda_warn.hpp`:
- Around line 18-23: The header was already rewritten by the trailing-whitespace
pre-commit hook, so the PR still contains unstaged whitespace-only changes.
Update the `cuda_warn.hpp` header by committing the hook’s cleanup exactly as
produced, ensuring the whitespace-only edits are included in the change set so
CI can pass.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Enterprise
Run ID: 52f639c5-a8c5-43d7-bc54-d013224c630b
📒 Files selected for processing (5)
examples/device/ep/csrc/cuda_event.hppexamples/device/ep/csrc/cuda_warn.hppexamples/device/ep/csrc/event_handle.hppexamples/device/ep/csrc/nixl_ep.cppexamples/device/ep/csrc/nixl_ep.hpp
| event = nullptr; | ||
| } | ||
|
|
||
| cudaEvent_t event; |
There was a problem hiding this comment.
I tried to keep the same code style as in nixl_ep.[h|c]pp.
Do you think that we should change code style here?
|
|
||
| namespace nixl_ep::cuda { | ||
| inline void | ||
| warn(cudaError_t status, const char *caller, const char *operation) noexcept { |
There was a problem hiding this comment.
Is the noexcept intentional?
There was a problem hiding this comment.
Yes, this function is supposed to be used when objects are destroyed.
Signed-off-by: Raul Akhmetshin <rakhmetshin@nvidia.com>
Signed-off-by: Raul Akhmetshin <rakhmetshin@nvidia.com>
Signed-off-by: Raul Akhmetshin <rakhmetshin@nvidia.com>
8e0367d to
3d0c532
Compare
|
/build |
What?
Replaced
torch::Eventbynixl_ep::cuda::Event.Cleaned up includes in
nixl_ep.[c|h]pp.Minor refactoring of
nixl_ep::EventHandle.Why?
This is a part of #1793.
Original PR is devided into parts to facilitate review.
Summary by CodeRabbit