
[Question] Plans for nvidia-peermem / legacy GPUDirect RDMA support? #26

Description

@huangzhilin-hzl

Hi, I am testing mKernel on a two-node NVIDIA H20 RoCE environment with mlx5 HCAs and CUDA 13. The GEMM + AllReduce path can run, but the other inter-node RDMA-heavy paths appear to depend on CUDA VMM + DMA-BUF memory registration.

VMM DMA-BUF MR registration fails with:

ibv_reg_dmabuf_mr (retain) failed: Invalid argument
ibv_reg_mr(gpu) failed: Bad address

Do you have plans to support an nvidia_peermem / ibv_reg_mr(cudaMalloc) fallback path? If so, which design would you prefer: a global buffer-backing option, a per-kernel RDMA source/target option, or staging buffers?
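To make the question concrete, here is a minimal sketch of the fallback order I have in mind. It is not mKernel code: every name in it (`reg_path_t`, `reg_backend_t`, `register_with_fallback`, and the `stub_*` functions) is hypothetical, and the stubs only simulate the two errno values from the log above (EINVAL for the DMA-BUF path, EFAULT for the plain GPU-pointer path) so the selection logic can run without an HCA.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical names throughout; none of these come from mKernel. */
typedef enum {
    REG_NONE = 0,   /* no backend could register the buffer */
    REG_DMABUF,     /* CUDA VMM allocation exported as a DMA-BUF fd */
    REG_PEERMEM,    /* cudaMalloc pointer registered through nvidia_peermem */
    REG_STAGING     /* pinned host bounce buffer */
} reg_path_t;

/* A backend returns 0 on success or a positive errno value on failure. */
typedef int (*reg_fn)(void *ctx, void *buf, size_t len);

typedef struct {
    reg_path_t path;          /* which strategy this entry represents */
    reg_fn     try_register;  /* attempt the registration */
    void      *ctx;           /* backend-specific state, may be NULL */
} reg_backend_t;

typedef struct {
    reg_path_t chosen;        /* REG_NONE if every backend failed */
    int        last_errno;    /* 0 on success, else the last failure */
} reg_result_t;

/* Try each backend in order and stop at the first one that succeeds. */
reg_result_t register_with_fallback(const reg_backend_t *backends, size_t n,
                                    void *buf, size_t len)
{
    reg_result_t result = { REG_NONE, 0 };
    assert(backends != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        int err = backends[i].try_register(backends[i].ctx, buf, len);
        if (err == 0) {
            result.chosen = backends[i].path;
            result.last_errno = 0;
            return result;
        }
        result.last_errno = err;  /* remember why this backend failed */
    }
    return result;
}

/* Stubs that stand in for real registration calls (assumptions only). */
int stub_fail_einval(void *ctx, void *buf, size_t len)
{
    (void)ctx; (void)buf; (void)len;
    return EINVAL;  /* mirrors "Invalid argument" from the DMA-BUF attempt */
}

int stub_fail_efault(void *ctx, void *buf, size_t len)
{
    (void)ctx; (void)buf; (void)len;
    return EFAULT;  /* mirrors "Bad address" from the GPU-pointer attempt */
}

int stub_ok(void *ctx, void *buf, size_t len)
{
    (void)ctx; (void)buf; (void)len;
    return 0;       /* a staging buffer in host memory registers fine */
}
```

In a real build, I would expect the first two callbacks to wrap `ibv_reg_dmabuf_mr` and `ibv_reg_mr` from libibverbs, and the third to register a pinned host buffer and copy through it. Whether the chain is chosen globally or per kernel is exactly the design question above.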

Thanks!
