p2p: add CXI transport endpoint by fergusfinn · Pull Request #6 · doublewordai/uccl

fergusfinn · 2026-06-14T09:41:27Z

This PR adds a CXI/libfabric transport for the UCCL P2P endpoint used by NIXL. It is separate from the UCCL-EP CXI work in uccl-project#956 and uccl-project#997, but targets the same libfabric cxi provider for Slingshot/Cassini NICs.

Benchmarking/testing

export UCCL_P2P_TRANSPORT=cxi
export UCCL_P2P_DISABLE_IPC=1
export UCCL_LIBFABRIC_SO=/opt/libfabric/lib/libfabric.so.1

# Run once per num_iovs in 1,2,4,8,16,32,64,128.
torchrun --nnodes=2 --nproc_per_node=1 --node-rank=<0|1> \
  --master_addr=127.0.0.1 --master_port=<port> \
  p2p/benchmarks/benchmark_uccl.py \
  --local-gpu-idx=<0|1> --device=gpu --sizes=1179648 --iters=16 \
  --num-iovs=<num_iovs> --mode=write --float-type=none

num_iovs	total_bytes	GB/s
1	1179648	19.60
2	2359296	21.58
4	4718592	22.56
8	9437184	23.34
16	18874368	23.81
32	37748736	24.00
64	75497472	24.10
128	150994944	24.17
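Each row moves num_iovs IOVs of 1179648 B (1.125 MiB), so total_bytes is simply num_iovs × 1179648. As a sanity check, the sketch below recomputes that column and converts each reported rate into an implied time per transfer. It assumes GB/s means 10^9 bytes per second and that the rate is total_bytes divided by the time for one transfer. The benchmark's exact timing convention is not shown here, and the function names are illustrative only.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>

// Size of one IOV in the sweep above (1179648 B = 1.125 MiB).
constexpr std::uint64_t kIovBytes = 1179648;

// Total bytes moved by one transfer that carries `num_iovs` IOVs.
constexpr std::uint64_t total_bytes(std::uint64_t num_iovs) {
  return num_iovs * kIovBytes;
}

// Implied time per transfer in microseconds, ASSUMING the reported
// GB/s figure uses decimal gigabytes (1e9 bytes).
double implied_us(std::uint64_t bytes, double gb_per_s) {
  assert(gb_per_s > 0.0);
  return static_cast<double>(bytes) / (gb_per_s * 1e9) * 1e6;
}

// Prints the reconstructed table with an extra implied-time column.
void print_sweep() {
  const std::uint64_t iovs[] = {1, 2, 4, 8, 16, 32, 64, 128};
  const double gbps[] = {19.60, 21.58, 22.56, 23.34,
                         23.81, 24.00, 24.10, 24.17};
  for (int i = 0; i < 8; ++i) {
    const std::uint64_t b = total_bytes(iovs[i]);
    std::printf("%llu\t%llu\t%.2f\t%.1f us\n",
                static_cast<unsigned long long>(iovs[i]),
                static_cast<unsigned long long>(b), gbps[i],
                implied_us(b, gbps[i]));
  }
}
```

Under those assumptions a single 1.125 MiB IOV takes about 60 µs, and the rate flattens near 24 GB/s from 32 IOVs on, which is consistent with a fixed per-transfer cost being amortized as more IOVs share one call.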

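Then KVBench from NIXL. KVBench does not have UCCL wired in yet, but its plan mode prints the matching nixlbench configuration, and nixlbench does support UCCL. So the plan is generated with the UCX backend and then run through nixlbench with --backend UCCL: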

python3 main.py plan \
    --model ./examples/model_deepseek_r1.yaml \
    --model_config ./examples/block-tp1-pp8.yaml \
    --backend UCX \
    --source gpu \
    --destination gpu \
    --runtime_type ASIO \
    --num_iter 16 \
    --warmup_iter 16 \
    --num_threads 1 \
    --page_size 256 \
    --format json
    
nixlbench \
    --runtime_type ASIO \
    --backend UCCL \
    --worker_type nixl \
    --initiator_seg_type VRAM \
    --target_seg_type VRAM \
    --scheme pairwise \
    --mode SG \
    --op_type WRITE \
    --total_buffer_size 351535104 \
    --num_initiator_dev 1 \
    --num_target_dev 1 \
    --start_block_size 1179648 \
    --max_block_size 1179648 \
    --start_batch_size 298 \
    --max_batch_size 298 \
    --num_iter 16 \
    --warmup_iter 16 \
    --large_blk_iter_ftr 16 \
    --num_threads 1

access	page_size	block_size	batch_size	total_bytes	GB/s	avg_lat_us	avg_prep_us	avg_post_us	avg_tx_us
block	256	1179648	298	351535104	23.141011	51.0	12.0	9.0	15165.0
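The nixlbench arguments line up with the plan output: --total_buffer_size equals the block size times the batch size (1179648 × 298 = 351535104 B), which is also the total_bytes in the result row. The sketch below checks that at compile time and derives a bandwidth from avg_tx_us, assuming avg_tx_us is the time for one full batch and GB/s is decimal. Both are assumptions about nixlbench's reporting, and the helper names are illustrative.

```cpp
#include <cassert>
#include <cstdint>

// Values copied from the nixlbench command and result row above.
constexpr std::uint64_t kBlockSize = 1179648;  // --start/--max_block_size
constexpr std::uint64_t kBatchSize = 298;      // --start/--max_batch_size
constexpr double kAvgTxUs = 15165.0;           // avg_tx_us column

// Bytes moved by one batch of `batch` blocks of `block` bytes each.
constexpr std::uint64_t batch_bytes(std::uint64_t block, std::uint64_t batch) {
  return block * batch;
}

// Bandwidth in decimal GB/s, ASSUMING `tx_us` covers exactly one batch.
double gb_per_s(std::uint64_t bytes, double tx_us) {
  assert(tx_us > 0.0);
  return static_cast<double>(bytes) / (tx_us * 1e-6) / 1e9;
}

// --total_buffer_size must equal block size x batch size.
static_assert(batch_bytes(kBlockSize, kBatchSize) == 351535104,
              "total_buffer_size does not match block_size * batch_size");
```

That gives about 23.18 GB/s, close to the reported 23.141011 GB/s; the small gap presumably comes from nixlbench timing a slightly different window than avg_tx_us alone.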

I also wired it into vLLM: disaggregated prefill completes for Qwen3-0.6B, and direct NIXL telemetry shows the expected KV transfer volume.

Copilot

Pull request overview

This PR adds an optional HPE Slingshot/CXI (libfabric) transport backend to the UCCL P2P engine, selectable at runtime via UCCL_P2P_TRANSPORT=cxi, with build-time enablement via USE_CXI=1.
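A minimal sketch of how runtime selection through UCCL_P2P_TRANSPORT can feed a runtime endpoint variant. Everything here is hypothetical: the TransportType values, parse_transport, the RdmaEndpoint/CxiEndpoint stubs, treating an unset variable as the non-CXI default, and the assumption that USE_CXI=1 ends up as a USE_CXI preprocessor macro. It is not the code in p2p/util/transport_type.h or p2p/engine.h, only an illustration of the shape described above.

```cpp
#include <cassert>
#include <cstdlib>
#include <stdexcept>
#include <string>
#include <variant>

// Hypothetical transport enum (the PR adds a CXI value in transport_type.h).
enum class TransportType { kRdma, kCxi };

// Hypothetical stand-ins for the real endpoint classes.
struct RdmaEndpoint { const char* name() const { return "rdma"; } };
struct CxiEndpoint { const char* name() const { return "cxi"; } };

// Assumed shape of a runtime endpoint variant like GenericEndpoint.
using GenericEndpoint = std::variant<RdmaEndpoint, CxiEndpoint>;

// Map the value of UCCL_P2P_TRANSPORT to a transport; unset or empty falls
// back to the (assumed) default. The "rdma" spelling is hypothetical.
TransportType parse_transport(const char* value) {
  if (value == nullptr || *value == '\0') return TransportType::kRdma;
  const std::string v(value);
  if (v == "cxi") return TransportType::kCxi;
  if (v == "rdma") return TransportType::kRdma;
  throw std::invalid_argument("unknown UCCL_P2P_TRANSPORT: " + v);
}

bool is_cxi(TransportType t) { return t == TransportType::kCxi; }

// Build the endpoint chosen at runtime. Without the (assumed) USE_CXI macro,
// the CXI alternative is rejected, mirroring a build-time toggle.
GenericEndpoint make_endpoint(TransportType t) {
  if (is_cxi(t)) {
#ifdef USE_CXI
    return CxiEndpoint{};
#else
    throw std::runtime_error("cxi selected but built without USE_CXI=1");
#endif
  }
  return RdmaEndpoint{};
}

// Dispatch through the variant without naming the concrete type.
const char* endpoint_name(const GenericEndpoint& ep) {
  return std::visit([](const auto& e) { return e.name(); }, ep);
}
```

In use, the engine would call make_endpoint(parse_transport(std::getenv("UCCL_P2P_TRANSPORT"))) once at startup and route later calls through std::visit, so the hot path never re-reads the environment.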

Changes:

Introduces CxiEndpoint (libfabric-based) and wires it into the engine’s runtime endpoint variant and notification path.
Adds a libfabric dlopen/dlsym wrapper scaffold and Makefile build toggles for CXI.
Adds CXI-related runtime configuration (README) and adjusts one-sided in-flight limits / transient-post handling.
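On the transient-post side, libfabric RMA calls report a full queue as -FI_EAGAIN, and the caller is expected to make progress (typically by reading the completion queue) before retrying; other errors are generally treated as hard failures. The sketch below captures that split with a bounded retry loop. PostRc, post_with_retry, the progress callback and the attempt budget are hypothetical stand-ins for the PR's UCCL_POST_TRANSIENT handling, not its actual code.

```cpp
#include <cassert>
#include <functional>

// Hypothetical result codes (the PR adds UCCL_POST_TRANSIENT in common.h).
enum class PostRc { kOk, kTransient, kError };

// Try `post` up to `max_attempts` times. On a transient failure (e.g. a full
// send queue, which libfabric reports as -FI_EAGAIN), run `progress` to drain
// completions before retrying, instead of spinning on the post alone.
// Hard errors are returned immediately so the caller can fail the request.
PostRc post_with_retry(const std::function<PostRc()>& post,
                       const std::function<void()>& progress,
                       int max_attempts) {
  assert(max_attempts > 0);
  PostRc rc = PostRc::kTransient;
  for (int attempt = 0; attempt < max_attempts; ++attempt) {
    rc = post();
    if (rc != PostRc::kTransient) return rc;  // success or hard error
    progress();                               // free queue slots, then retry
  }
  return rc;  // still transient: the caller can requeue instead of blocking
}
```

Bounding the retries and returning the transient status lets the caller requeue the work instead of spinning, while hard errors surface on the first attempt.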

Reviewed changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 6 comments.


File	Description
p2p/util/transport_type.h	Adds `CXI` transport selection and helper `is_cxi_transport()`.
p2p/util/common.h	Adds `UCCL_POST_TRANSIENT` return code used for transient post failures.
p2p/README.md	Documents how to build/run with CXI and new CXI-related env vars.
p2p/Makefile	Adds `USE_CXI` toggle, libfabric include path, and CXI sources.
p2p/engine.h	Extends `GenericEndpoint` + MR handle to support CXI endpoint/MR.
p2p/engine.cc	Adds CXI endpoint construction, transient rc handling, and CXI inflight tuning.
p2p/endpoint_wrapper.h	Adds CXI dispatch for regmr/deregmr/read/write and FIFO metadata encoding.
p2p/cxi/fabric_dl.h	New libfabric dlopen/dlsym helper.
p2p/cxi/fabric_dl.cc	New exported `fi_*` wrappers (currently partial).
p2p/cxi/cxi_endpoint.h	Declares `CxiEndpoint` and CXI FIFO metadata helpers.
p2p/cxi/cxi_endpoint.cc	Implements the CXI/libfabric endpoint, OOB handshake, MR reg, and RMA ops.
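fabric_dl.h/.cc load libfabric with dlopen/dlsym rather than linking it directly. A minimal sketch of that pattern, assuming the library path comes from UCCL_LIBFABRIC_SO with the soname libfabric.so.1 as a fallback (the fallback is an assumption). FabricLib, fabric_path and load_fabric are hypothetical names, and only fi_getinfo is resolved; its signature follows libfabric's fi_getinfo(3), with fi_info left opaque so the sketch builds without libfabric headers.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <dlfcn.h>
#include <string>

struct fi_info;  // opaque here; the real definition is in <rdma/fabric.h>

// Signature of libfabric's fi_getinfo(3).
using fi_getinfo_fn = int (*)(uint32_t version, const char* node,
                              const char* service, uint64_t flags,
                              const fi_info* hints, fi_info** info);

// Hypothetical handle for a dynamically loaded libfabric.
struct FabricLib {
  void* handle = nullptr;
  fi_getinfo_fn getinfo = nullptr;
};

// Path to load: UCCL_LIBFABRIC_SO if set, otherwise the (assumed) soname.
std::string fabric_path() {
  const char* p = std::getenv("UCCL_LIBFABRIC_SO");
  return (p != nullptr && *p != '\0') ? std::string(p) : "libfabric.so.1";
}

// dlopen the library and resolve the symbols we need. Returns false and
// leaves `lib` untouched if the library or any symbol is missing, so the
// caller can report a clear error instead of crashing on first use.
bool load_fabric(const std::string& path, FabricLib& lib) {
  void* h = dlopen(path.c_str(), RTLD_NOW | RTLD_LOCAL);
  if (h == nullptr) return false;
  void* sym = dlsym(h, "fi_getinfo");
  if (sym == nullptr) {
    dlclose(h);
    return false;
  }
  lib.handle = h;
  lib.getinfo = reinterpret_cast<fi_getinfo_fn>(sym);
  return true;
}
```

Failing softly when the library or a symbol is missing lets the engine report a clear error for UCCL_P2P_TRANSPORT=cxi on hosts without libfabric, instead of failing when the binary is loaded. On glibc older than 2.34 this needs -ldl at link time.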


Copilot

Pull request overview

Copilot reviewed 11 out of 11 changed files in this pull request and generated 4 comments.

Copilot

Pull request overview

Copilot reviewed 11 out of 11 changed files in this pull request and generated 2 comments.

fergusfinn force-pushed the cxi-p2p-engine branch from 0ff913d to 2f1a9ea June 15, 2026 09:07

fergusfinn changed the base branch from cxi-ep to main June 15, 2026 09:07

fergusfinn force-pushed the cxi-p2p-engine branch 4 times, most recently from af8eff3 to 2014486 June 15, 2026 17:54

p2p: add CXI transport backend

72e0e0e

fergusfinn force-pushed the cxi-p2p-engine branch from 2014486 to 72e0e0e June 16, 2026 09:40

fergusfinn marked this pull request as ready for review June 16, 2026 10:14

Copilot AI review requested due to automatic review settings June 16, 2026 10:14

Copilot started reviewing on behalf of fergusfinn June 16, 2026 10:15

Copilot AI reviewed Jun 16, 2026

Comment thread p2p/cxi/fabric_dl.cc

Comment thread p2p/engine.cc Outdated

Comment thread p2p/engine.cc Outdated

Comment thread p2p/engine.cc Outdated

Comment thread p2p/engine.cc Outdated

Comment thread p2p/cxi/cxi_endpoint.cc

p2p: fail direct posts on hard errors

7088689

fergusfinn requested a review from Copilot June 16, 2026 10:52

Copilot started reviewing on behalf of fergusfinn June 16, 2026 10:52

Copilot AI reviewed Jun 16, 2026

Comment thread p2p/cxi/cxi_endpoint.cc

Comment thread p2p/cxi/cxi_endpoint.cc

Comment thread p2p/cxi/cxi_endpoint.cc

Comment thread p2p/cxi/cxi_endpoint.cc

p2p: harden CXI completion contexts

276f517

fergusfinn requested a review from Copilot June 16, 2026 11:46

Copilot started reviewing on behalf of fergusfinn June 16, 2026 11:46

Copilot AI reviewed Jun 16, 2026

Comment thread p2p/Makefile

Comment thread p2p/cxi/cxi_endpoint.cc

fergusfinn added 2 commits June 16, 2026 16:05

p2p: avoid spinning on transient vector posts

b0a84d9

p2p: always request CXI delivery completion

c728050

fergusfinn closed this Jun 17, 2026

fergusfinn wants to merge 5 commits into main from cxi-p2p-engine

2 participants
