Add xccl transport for device-resident put/get on Intel XPU by songhappy · Pull Request #170 · meta-pytorch/torchstore

songhappy · 2026-06-18T00:03:30Z

Summary

New XcclTransportBuffer keeps ts.put / ts.get tensors on xpu end-to-end. The selector picks it above Gloo on XPU hosts; CUDA behavior is unchanged (xccl_available() returns False).
Generalizes example/torchstore_rl.py to autodetect cuda/xpu/cpu.
Adds example/torchstore_state_dict.py: trainer and generator actors round-trip a multi-tensor, LoRA-shaped state_dict, mirroring RL weight sync.
Small XPU gates in shared_memory.py and torchcomms/cache.py.
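
For readers skimming without the diff, the device autodetection and the "xccl above Gloo" ordering can be sketched roughly as below. This is an illustration, not the code in this PR: `detect_device`, `pick_transport`, `xccl_usable`, and `PRIORITY` are hypothetical names, and `xccl_usable` is only a stand-in for the real `xccl_available()`, whose implementation is not shown here. The only PyTorch calls used are `torch.cuda.is_available()` and `torch.xpu.is_available()`.

```python
from typing import Callable, Sequence, Tuple


def detect_device() -> str:
    """Return "cuda", "xpu", or "cpu", whichever is usable first."""
    try:
        import torch
    except ImportError:
        return "cpu"  # no PyTorch installed: fall back to CPU
    if torch.cuda.is_available():
        return "cuda"
    xpu = getattr(torch, "xpu", None)  # torch.xpu is absent in older builds
    if xpu is not None and xpu.is_available():
        return "xpu"
    return "cpu"


def pick_transport(candidates: Sequence[Tuple[str, Callable[[], bool]]]) -> str:
    """Return the first transport whose availability check passes.

    `candidates` is ordered from most to least preferred.
    """
    for name, is_available in candidates:
        if is_available():
            return name
    raise RuntimeError("no transport is available on this host")


def xccl_usable() -> bool:
    # Hypothetical stand-in for xccl_available(): true only on XPU hosts,
    # so CUDA and CPU hosts fall through to the next candidate.
    return detect_device() == "xpu"


# Assumed ordering: xccl is listed above gloo, and gloo is always available.
PRIORITY = [("xccl", xccl_usable), ("gloo", lambda: True)]

if __name__ == "__main__":
    print(detect_device(), pick_transport(PRIORITY))
```

Because the xccl check returns False everywhere except XPU hosts, placing it first in the list does not change which transport CUDA hosts end up with.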

Why

For RL weight sync (~600 MB state_dict per step), Gloo stages
tensors through CPU twice and sends them over TCP. xccl keeps the bytes on XPU
and moves them over oneCCL, cutting each refresh from hundreds of ms to tens of ms.

Test plan

pre-commit run is clean.
All three examples (torchstore_rl, torchstore_spmd, torchstore_state_dict) exit with rc=0 on a 12×PVC node, torch 2.12.0+xpu, with bit-exact checksums.
CUDA-host smoke test: reviewer with CUDA access requested.
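
A bit-exact round-trip check of a multi-tensor state_dict could look roughly like the sketch below. The description does not say how the checksums in the examples are computed, so this is an assumption: `state_dict_checksum` is a hypothetical helper, the key names are made up, and each tensor is represented by its raw bytes (in practice, the bytes of the tensor after copying it to host).

```python
import hashlib
from typing import Mapping


def state_dict_checksum(raw: Mapping[str, bytes]) -> str:
    """SHA-256 over every key and its raw bytes, visited in sorted key order."""
    h = hashlib.sha256()
    for key in sorted(raw):
        payload = raw[key]
        h.update(key.encode("utf-8"))
        h.update(len(payload).to_bytes(8, "little"))  # length prefix avoids ambiguity
        h.update(payload)
    return h.hexdigest()


# Hypothetical LoRA-shaped payloads standing in for real tensor bytes.
sent = {"lora_A.weight": bytes(range(16)), "lora_B.weight": bytes(16)}
# Same contents, different insertion order, as might happen after a get.
received = dict(reversed(list(sent.items())))
# One flipped byte, to show that corruption is detected.
corrupted = dict(sent, **{"lora_B.weight": bytes([1]) + bytes(15)})

if __name__ == "__main__":
    print(state_dict_checksum(sent) == state_dict_checksum(received))
    print(state_dict_checksum(sent) == state_dict_checksum(corrupted))
```

Hashing in sorted key order makes the result independent of dict ordering, so only a change in a key or in the bytes themselves alters the checksum.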

XPU host: selector picks the new XcclTransportBuffer above Gloo; tensors stay on xpu end-to-end. CUDA hosts unchanged (xccl_available() returns False). Also gates shared_memory + torchcomms/cache for XPU, generalizes example/torchstore_rl.py to autodetect device, and adds example/torchstore_state_dict.py — multi-tensor state_dict round-trip mirroring GRPO weight-sync.

meta-cla Bot added the CLA Signed label on Jun 18, 2026

songhappy wants to merge 1 commit into meta-pytorch:main from songhappy:xccl-xpu-transport
