[Question] Plans for nvidia-peermem / legacy GPUDirect RDMA support? #26

Hi, I am testing mKernel on a two-node NVIDIA H20 RoCE environment with mlx5 HCAs and CUDA 13. The GEMM + AllReduce path runs, but the other inter-node, RDMA-heavy paths appear to depend on CUDA VMM + DMA-BUF memory registration.

MR registration of the VMM allocation fails on both the DMA-BUF path (EINVAL) and the plain ibv_reg_mr path (EFAULT):
```
ibv_reg_dmabuf_mr (retain) failed: Invalid argument
ibv_reg_mr(gpu) failed: Bad address
```
Do you have plans to support an nvidia_peermem / ibv_reg_mr(cudaMalloc) fallback path? If so, which design would you prefer: a global buffer-backing option, a per-kernel RDMA source/target option, or staging buffers?
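
To make the fallback question concrete, here is a minimal sketch of the ordering I have in mind. It is not mKernel code: `Registration`, `Backend`, and `register_with_fallback` are hypothetical names, and the backends are plain callbacks so the selection logic can be shown without hardware. In a real implementation I would expect the first backend to wrap `cuMemGetHandleForAddressRange` + `ibv_reg_dmabuf_mr`, the second to call `ibv_reg_mr` on a `cudaMalloc` buffer (with nvidia_peermem loaded), and the last to use a host staging buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <iostream>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for an ibv_mr*; not an mKernel or rdma-core type.
struct Registration {
    std::string backend;  // name of the path that produced this registration
    void* addr;
    std::size_t length;
};

// One registration strategy. Returns a Registration on success,
// std::nullopt on failure (e.g. EINVAL from the DMA-BUF path).
struct Backend {
    std::string name;
    std::function<std::optional<Registration>(void*, std::size_t)> try_register;
};

// Try each backend in the given order and return the first success.
// Returns std::nullopt if every backend fails.
std::optional<Registration> register_with_fallback(
    const std::vector<Backend>& backends, void* addr, std::size_t length) {
    assert(addr != nullptr && length > 0);
    for (const auto& b : backends) {
        if (auto reg = b.try_register(addr, length)) {
            return reg;
        }
        std::cerr << b.name << " registration failed, trying next backend\n";
    }
    return std::nullopt;
}
```

With this shape, the "global buffer backing option" would amount to choosing the backend order once at startup, while the per-kernel option would pass a different backend list per RDMA source/target.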

Thanks!
