Add xccl transport for device-resident put/get on Intel XPU #170

Open
songhappy wants to merge 1 commit into meta-pytorch:main from songhappy:xccl-xpu-transport
@songhappy

Summary

  • New XcclTransportBuffer keeps ts.put / ts.get tensors on
    xpu end-to-end. Selector picks it above Gloo on XPU hosts; CUDA
    behavior is unchanged (xccl_available() returns False).
  • Generalizes example/torchstore_rl.py to autodetect cuda/xpu/cpu.
  • Adds example/torchstore_state_dict.py: trainer + generator
    actors round-trip a multi-tensor LoRA-shaped state_dict — mirrors
    RL weight-sync.
  • Small XPU gates in shared_memory.py and torchcomms/cache.py.
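
The selection rule in the first bullet can be sketched as a priority-ordered probe. This is a minimal illustration, not the code in this PR: `Candidate`, `select_transport`, and `xccl_available_sketch` are hypothetical names, and the probe assumes that `torch.xpu.is_available()` and `torch.distributed.is_xccl_available()` are the relevant checks (the second is looked up with `getattr` because it may be missing on older builds).

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Candidate:
    """A transport name plus a zero-argument availability probe (hypothetical type)."""
    name: str
    is_available: Callable[[], bool]


def select_transport(candidates: Sequence[Candidate]) -> str:
    """Return the first available transport; list order encodes priority."""
    for candidate in candidates:
        if candidate.is_available():
            return candidate.name
    raise RuntimeError("no transport available")


def xccl_available_sketch() -> bool:
    """Assumed probe: True only on a host with a usable XPU and an XCCL backend."""
    try:
        import torch
        import torch.distributed as dist
    except ImportError:
        return False
    xpu = getattr(torch, "xpu", None)
    has_xpu = xpu is not None and xpu.is_available()
    probe = getattr(dist, "is_xccl_available", None)
    return bool(has_xpu and probe is not None and probe())


# Simulated hosts: xccl is listed above gloo, so it wins only when its probe passes.
xpu_host = [Candidate("xccl", lambda: True), Candidate("gloo", lambda: True)]
cuda_host = [Candidate("xccl", lambda: False), Candidate("gloo", lambda: True)]
picked_on_xpu = select_transport(xpu_host)
picked_on_cuda = select_transport(cuda_host)
```

Because the xccl probe fails on a CUDA host, the loop falls through to the next entry, which is why existing CUDA behavior would stay unchanged under this ordering.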

Why

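For RL weight sync (~600 MB state_dict per step), Gloo stages every
tensor through CPU memory twice (once on the sender, once on the
receiver) and moves it over TCP in between. xccl keeps the bytes on
XPU and transfers them over oneCCL, which cuts each refresh from
hundreds of ms to tens of ms.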
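
A back-of-envelope model of the argument above: a staged transfer pays for each hop in sequence, so its time is the sum of size divided by bandwidth per hop. The bandwidth figures below are placeholders chosen only to illustrate the arithmetic; they are assumptions, not measurements from this PR or from any particular hardware, and `staged_transfer_ms` is a hypothetical helper.

```python
from typing import Sequence


def staged_transfer_ms(size_bytes: float, hop_gb_per_s: Sequence[float]) -> float:
    """Time in ms to push size_bytes through sequential hops, one bandwidth per hop."""
    seconds = sum(size_bytes / (gb_per_s * 1e9) for gb_per_s in hop_gb_per_s)
    return seconds * 1e3


STATE_DICT_BYTES = 600e6  # ~600 MB per step, as stated in this PR

# Placeholder bandwidths in GB/s (assumed, not measured):
# device-to-host copy, TCP link, host-to-device copy.
staged_ms = staged_transfer_ms(STATE_DICT_BYTES, [20.0, 2.0, 20.0])

# Placeholder bandwidth for a single device-to-device hop (assumed).
direct_ms = staged_transfer_ms(STATE_DICT_BYTES, [30.0])
```

With these placeholder numbers the slowest hop dominates the staged path, which is the shape of the claim above; real figures depend on the interconnect and should be measured.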

Test plan

  • pre-commit run is clean.
  • All three examples (torchstore_rl, torchstore_spmd,
    torchstore_state_dict) exit with rc=0 on a 12×PVC node with torch
    2.12.0+xpu, and checksums are bit-exact.
  • CUDA-host smoke test: reviewer with CUDA access requested.
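
One way to check that a state_dict round-trip is bit-exact is to hash the raw bytes of every tensor on both sides and compare digests. The sketch below is an assumption about how such a check could look, not the check used in the examples: `state_dict_checksum` and `tensor_bytes` are hypothetical helpers, and the key names are made up.

```python
import hashlib
from typing import Mapping


def state_dict_checksum(blobs: Mapping[str, bytes]) -> str:
    """SHA-256 over names and raw payloads, visited in sorted-key order."""
    digest = hashlib.sha256()
    for name in sorted(blobs):
        payload = blobs[name]
        digest.update(name.encode("utf-8"))
        # Length prefix keeps adjacent payloads from being confused with each other.
        digest.update(len(payload).to_bytes(8, "little"))
        digest.update(payload)
    return digest.hexdigest()


def tensor_bytes(tensor) -> bytes:
    """Raw bytes of a torch tensor, copied to CPU (works for bfloat16 as well)."""
    import torch  # imported lazily so the checksum helper runs without torch

    flat = tensor.detach().cpu().contiguous().flatten()
    # Reinterpret as uint8 first, because numpy has no bfloat16 dtype.
    return flat.view(torch.uint8).numpy().tobytes()


# Illustrative payloads standing in for tensors before put and after get.
sent = {"lora_A.weight": b"\x00\x01\x02\x03", "lora_B.weight": b"\x04\x05"}
received = dict(sent)
corrupted = {**sent, "lora_B.weight": b"\x04\x06"}

round_trip_ok = state_dict_checksum(sent) == state_dict_checksum(received)
corruption_detected = state_dict_checksum(sent) != state_dict_checksum(corrupted)
```

With real tensors, each side would build the mapping as `{name: tensor_bytes(t) for name, t in state_dict.items()}` before hashing.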

XPU host: selector picks the new XcclTransportBuffer above Gloo;
tensors stay on xpu end-to-end. CUDA hosts unchanged
(xccl_available() returns False).

Also gates shared_memory + torchcomms/cache for XPU, generalizes
example/torchstore_rl.py to autodetect device, and adds
example/torchstore_state_dict.py — multi-tensor state_dict
round-trip mirroring GRPO weight-sync.
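
Device autodetection of the kind described for example/torchstore_rl.py can be sketched as an ordered fallback. This is an illustration under assumptions, not the code from the example: `detect_device` is a hypothetical name, and the cuda, then xpu, then cpu order is assumed from the order in which the devices are listed in this PR.

```python
def detect_device() -> str:
    """Return "cuda", "xpu", or "cpu", falling back to "cpu" when torch is missing."""
    try:
        import torch
    except ImportError:
        return "cpu"
    if torch.cuda.is_available():
        return "cuda"
    # torch.xpu is absent on older builds, so look it up defensively.
    xpu = getattr(torch, "xpu", None)
    if xpu is not None and xpu.is_available():
        return "xpu"
    return "cpu"


device = detect_device()
```

The returned string can be passed directly as the `device=` argument when allocating tensors, so the same example script runs on any of the three host types.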
meta-cla bot added the CLA Signed label on Jun 18, 2026.