[EP] Add UDP source port configuration for RoCEv2 ECMP via UCCL_UDP_SPORT_BASE #946

Open
bkpathak wants to merge 1 commit into uccl-project:main from bkpathak:issue_709

@bkpathak

Description

[EP] Add UDP source port configuration for RoCEv2 ECMP via UCCL_UDP_SPORT_BASE

Fixes #709

Type of Change

  • [ ] Bug fix
  • New feature
  • Documentation update

How Has This Been Tested?

Include any tests here.

  • Unit tests
  • Integration tests
  • Manual testing
    Build validated on Lambda Labs GH200 (sm_90, aarch64, Ubuntu 22.04) with mlx5 detected and -DMLX5 passed to compiler:
MLX5 detected, building with MLX5 support
 > Libraries: ['ibverbs', 'nl-3', 'nl-route-3', 'numa', 'mlx5']
 > CXX Flags: [..., '-DMLX5', ...]

libmlx5 confirmed linked in the built wheel:

ldd ep.abi3.so | grep mlx5
libmlx5-b7a5a219.so.1.22.39.0 => uccl.libs/libmlx5-b7a5a219.so.1.22.39.0

An end-to-end test was attempted on 2×GH200 instances. It failed at RDMA buffer MR registration: Lambda Labs exposes mlx5 as a VF, which does not support GPUDirect RDMA. A full end-to-end test requires bare metal with a physical mlx5 PF.

Checklist

  • I have run format.sh to follow the style guidelines.
  • I have run build.sh to verify compilation.
  • I have removed redundant variables and comments.
  • I have updated the documentation.
  • I have added tests.

Comment thread ep/README.md Outdated
| UCCL_IB_TC | Traffic class in RDMA network | 104/0 (IB/EFA) |
| UCCL_EP_ENABLE_AGGRESSIVE_ATOMIC | Use relaxed atomics with manual `s_waitcnt vmcnt(0)` fences instead of acquire/release semantics. Required on AMD CDNA so the combine receiver actually sees the producer's tail-pointer updates over XGMI; without it the kernel deadlocks at scale. | 1 on AMD, 0 on CUDA |
| UCCL_RDMA_ADAPTIVE_SLEEP | Enable adaptive sleeping on proxy threads, by putting the proxy threads into a sleeping state if there have been no new work requests / RDMA completion events after 120s. | null |
| UCCL_UDP_SPORT_BASE | Base UDP source port for RoCEv2 QPs on mlx5 devices. Used to create entropy for ECMP routers, load balancers and 802.3ad link aggregation switches. Each QP gets a unique port starting from this value. Valid range: 1-65535. | 0 (driver decides) |

Member

Can we remove "Used to create entropy for ECMP routers, load balancers and 802.3ad link aggregation switches. Each QP gets a unique port starting from this value."? It feels too lengthy to me.

Author

Sure, I will rephrase it. I copied it from the man page https://man7.org/linux/man-pages/man3/mlx5dv_modify_qp_udp_sport.3.html
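
The README row under discussion states three rules: unset or 0 leaves the choice to the driver, valid values are 1-65535, and each QP gets its own port counted up from the base. A minimal sketch of those rules follows. The helper names `parse_sport_base` and `sport_for_qp` are hypothetical and not taken from the PR, and rejecting (rather than wrapping) a port that would exceed 65535 is an assumption, since the thread does not say how the patch handles that case.

```cpp
#include <cerrno>
#include <cstdint>
#include <cstdlib>
#include <optional>

// Hypothetical helper: parse a UCCL_UDP_SPORT_BASE-style string.
// Returns std::nullopt for unset, empty, "0", non-numeric, or out-of-range
// input; the caller would treat that as "let the driver decide".
std::optional<uint16_t> parse_sport_base(const char* text) {
  if (text == nullptr || *text == '\0') return std::nullopt;
  char* end = nullptr;
  errno = 0;
  long value = std::strtol(text, &end, 10);
  if (errno != 0 || *end != '\0') return std::nullopt;   // trailing junk or overflow
  if (value < 1 || value > 65535) return std::nullopt;   // documented valid range
  return static_cast<uint16_t>(value);
}

// Hypothetical helper: source port for the qp_index-th QP.
// Assumption: a port past 65535 is rejected instead of wrapping around.
std::optional<uint16_t> sport_for_qp(uint16_t base, uint32_t qp_index) {
  uint64_t port = static_cast<uint64_t>(base) + qp_index;
  if (port > 65535) return std::nullopt;
  return static_cast<uint16_t>(port);
}
```

A caller would pass `std::getenv("UCCL_UDP_SPORT_BASE")` to `parse_sport_base` once at startup and skip the per-QP step entirely when it returns `std::nullopt`.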

Comment thread ep/setup.py Outdated
else:
    print("EFA not detected, building without EFA")
# MLX5 Detection
mlx5_home = os.getenv("MLX5_HOME", "/usr")

Member

This does not seem to be mlx5's home folder. In addition, a server might have an mlx5 NIC as the frontend NIC and a different NIC as the backend, so auto-detecting the mlx5 folder does not always work.

How about dropping auto-detection and loading the mlx5-specific function at runtime with dlsym instead? An example can be found in https://github.com/NVIDIA/nccl/blob/master/src/misc/mlx5dvsymbols.cc.

  • The dlsym lookup should only be triggered when UCCL_UDP_SPORT_BASE is set in the environment.
  • It also needs to fail safe (rather than crash the whole program) if an error happens.
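
The two requirements above can be sketched roughly as follows. This is an illustration, not the NCCL code or the PR's eventual implementation: `load_modify_udp_sport` and `try_set_udp_sport` are hypothetical names, the library name `libmlx5.so.1` is an assumed default, and `ibv_qp` is only forward-declared so the sketch compiles without the verbs headers. The function pointer type follows the signature given in the mlx5dv_modify_qp_udp_sport(3) man page.

```cpp
#include <dlfcn.h>

#include <cstdint>
#include <cstdio>
#include <cstdlib>

struct ibv_qp;  // opaque here; the real definition is in <infiniband/verbs.h>

// int mlx5dv_modify_qp_udp_sport(struct ibv_qp *qp, uint16_t udp_sport);
using modify_udp_sport_fn = int (*)(ibv_qp* qp, uint16_t udp_sport);

// Hypothetical loader. Returns nullptr (never aborts) when the feature is not
// requested or when the library or symbol cannot be found.
// lib_name is a parameter so the failure path can be exercised; the default
// value is an assumption.
modify_udp_sport_fn load_modify_udp_sport(const char* lib_name = "libmlx5.so.1") {
  // Only touch dlopen/dlsym when the user opted in through the env variable.
  if (std::getenv("UCCL_UDP_SPORT_BASE") == nullptr) return nullptr;

  void* handle = dlopen(lib_name, RTLD_NOW | RTLD_LOCAL);
  if (handle == nullptr) {
    std::fprintf(stderr, "UDP sport disabled: dlopen failed: %s\n", dlerror());
    return nullptr;
  }
  void* sym = dlsym(handle, "mlx5dv_modify_qp_udp_sport");
  if (sym == nullptr) {
    std::fprintf(stderr, "UDP sport disabled: symbol not found: %s\n", dlerror());
    dlclose(handle);
    return nullptr;
  }
  // The handle is intentionally kept open so the returned pointer stays valid.
  return reinterpret_cast<modify_udp_sport_fn>(sym);
}

// Hypothetical wrapper: true only if the port was actually applied.
// A false result means "continue with the driver-chosen port".
bool try_set_udp_sport(modify_udp_sport_fn fn, ibv_qp* qp, uint16_t sport) {
  if (fn == nullptr || qp == nullptr) return false;
  return fn(qp, sport) == 0;  // the man page documents 0 as success
}
```

With this shape, the build needs no mlx5 detection or `-DMLX5` flag, and a host without libmlx5 simply logs a warning and keeps the default source port.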

Author

Oh, I will look into the example and send an update.

Development

Successfully merging this pull request may close these issues.

Does UCCL-EP support configuring QP UDP source port on RoCEv2 network?

2 participants