Leverage Collective Operations for Device-to-Device Transfers #4749

@nathanielsimard

Description

When moving tensors between devices, we should leverage collective operations (e.g., send/recv) to transfer data directly between devices whenever possible. Currently, transferring a tensor to another device may require materializing it on the CPU as TensorData, which introduces unnecessary overhead. By using collective operations for device-to-device transfers, we can avoid this intermediate CPU materialization and improve performance.
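
To make the two paths concrete, here is a minimal, self-contained sketch of the idea. It does not use Burn's actual API: `DeviceBuffer`, `TransferStats`, `transfer_via_host`, `transfer_direct`, and `transfer` are all hypothetical names, and a standard-library channel between threads stands in for a real send/recv pair between devices. The sketch only illustrates that the direct path never stages the data in an intermediate host copy, and that a host fallback can remain for cases where a direct transfer is not possible.

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical stand-in for a buffer that lives on a device.
#[derive(Debug, Clone, PartialEq)]
struct DeviceBuffer {
    device_id: usize,
    values: Vec<f32>,
}

/// Counts how often a transfer staged its data on the host.
#[derive(Debug, Default, PartialEq)]
struct TransferStats {
    host_copies: usize,
}

/// Baseline path: read the buffer back to the host, then upload it to the target.
fn transfer_via_host(
    src: &DeviceBuffer,
    target: usize,
    stats: &mut TransferStats,
) -> DeviceBuffer {
    // Device -> host: the intermediate copy the issue wants to avoid.
    let host_copy: Vec<f32> = src.values.clone();
    stats.host_copies += 1;
    // Host -> device.
    DeviceBuffer { device_id: target, values: host_copy }
}

/// Direct path: a paired send/recv, simulated here with a channel between threads.
fn transfer_direct(src: DeviceBuffer, target: usize) -> DeviceBuffer {
    let (tx, rx) = mpsc::channel::<Vec<f32>>();
    // The source side sends its buffer without cloning it.
    let sender = thread::spawn(move || tx.send(src.values).expect("receiver dropped"));
    // The target side receives the same buffer.
    let values = rx.recv().expect("sender dropped");
    sender.join().expect("sender thread panicked");
    DeviceBuffer { device_id: target, values }
}

/// Uses the direct path when it is supported and falls back to the host path otherwise.
fn transfer(
    src: DeviceBuffer,
    target: usize,
    direct_supported: bool,
    stats: &mut TransferStats,
) -> DeviceBuffer {
    if direct_supported {
        transfer_direct(src, target)
    } else {
        transfer_via_host(&src, target, stats)
    }
}
```

Calling `transfer` with `direct_supported` set to `true` leaves `host_copies` unchanged, while the fallback increments it once per transfer. How a real backend would detect direct-transfer support is left open here.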

Metadata

Assignees

No one assigned

    Labels

    No labels

    Type

    No type

    Projects

    Status

    High-Priority

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
