When moving tensors between devices, we should leverage collective operations (e.g., send/recv) to transfer data directly between devices whenever possible. Currently, transferring a tensor to another device may require materializing it on the CPU as TensorData, which introduces unnecessary overhead. By using collective operations for device-to-device transfers, we can avoid this intermediate CPU materialization and improve performance.
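To make the contrast concrete, here is a minimal, purely illustrative sketch (hypothetical names, not the framework's actual API) that models the two paths: the current route that materializes a host-side copy, and the proposed direct device-to-device send.

```python
# Hypothetical model of the two transfer paths. "Device" and its buffers are
# illustration-only stand-ins, not the real backend types.

class Device:
    def __init__(self, name):
        self.name = name
        self.buffers = {}  # tensor id -> payload resident on this device

    def send(self, other, tid):
        # Direct device-to-device copy: payload never leaves device memory.
        other.buffers[tid] = self.buffers[tid]


def transfer_via_cpu(src, dst, tid):
    # Current path: materialize the tensor on the CPU (as TensorData-like
    # host data) before writing it to the destination device.
    host_copy = list(src.buffers[tid])  # the intermediate copy we want to avoid
    dst.buffers[tid] = host_copy
    return host_copy


def transfer_direct(src, dst, tid):
    # Proposed path: a send/recv-style transfer with no host intermediate.
    src.send(dst, tid)


gpu0, gpu1 = Device("gpu0"), Device("gpu1")
gpu0.buffers["t"] = [1.0, 2.0, 3.0]
transfer_direct(gpu0, gpu1, "t")
```

The sketch only models data movement; in a real backend the direct path would map to peer-to-peer or NCCL-style send/recv primitives, eliminating the host allocation and the two extra copies that the CPU round-trip implies.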