perftest: Fix seg-fault during clean-up for cm retry flow #378

Merged
sshaulnv merged 1 commit into linux-rdma:master from SherrinZhou:fix/fix_retry_flow_clean_up
Mar 23, 2026

@SherrinZhou
Contributor

Hi, @sshaulnv.
Following up on the last comment on PR #368: this is the fix for the clean-up routine that I mentioned.

Signed-off-by: Ruizhe Zhou <zhouruizhe@resnics.com>
@SherrinZhou
Contributor Author

SherrinZhou commented Mar 23, 2026

With this fix, when the server manually rejects connection requests, the server-side output looks like this:

************************************
* Waiting for client to connect... *
************************************

[DEBUG] Intentionally rejecting connection request #1

[DEBUG] Intentionally rejecting connection request #2
---------------------------------------------------------------------------------------
                    RDMA_Write BW Test
 Dual-port       : OFF          Device         : rsnc_0
 Number of qps   : 1            Transport type : IB
 Connection type : RC           Using SRQ      : OFF
 PCIe relax order: ON           Lock-free      : OFF
 ibv_wr* API     : OFF          Using Enhanced Reorder      : OFF
 CQ Moderation   : 1
 CQE Poll Batch  : Dynamic
 Mtu             : 1024[B]
 Link type       : Ethernet
 GID index       : 1
 Max inline data : 0[B]
 rdma_cm QPs     : ON
 Data ex. method : rdma_cm
---------------------------------------------------------------------------------------
 Waiting for client rdma_cm QP to connect
 Please run the same command with the IB/RoCE interface IP
---------------------------------------------------------------------------------------
 local address: LID 0000 QPN 0x002f PSN 0x2691e6
 GID: 00:00:00:00:00:00:00:00:00:00:255:255:10:10:01:01
 remote address: LID 0000 QPN 0x0049 PSN 0x207bd6
 GID: 00:00:00:00:00:00:00:00:00:00:255:255:10:10:01:02
---------------------------------------------------------------------------------------
 #bytes     #iterations    BW peak[MiB/sec]    BW average[MiB/sec]   MsgRate[Mpps]
 65536      5000             3107.19            3107.15              0.049714
---------------------------------------------------------------------------------------

The client-side output looks like this:

RDMA CM event error:
Event: RDMA_CM_EVENT_REJECTED; error: 28.

ERRNO: Operation not supported.
Failed to handle RDMA CM event.
ERRNO: Operation not supported.
Failed to connect RDMA CM events.
ERRNO: Operation not supported.
RDMA CM event error:
Event: RDMA_CM_EVENT_REJECTED; error: 28.

ERRNO: Operation not supported.
Failed to handle RDMA CM event.
ERRNO: Operation not supported.
Failed to connect RDMA CM events.
ERRNO: Operation not supported.
---------------------------------------------------------------------------------------
                    RDMA_Write BW Test
 Dual-port       : OFF          Device         : rsnc_0
 Number of qps   : 1            Transport type : IB
 Connection type : RC           Using SRQ      : OFF
 PCIe relax order: ON           Lock-free      : OFF
 ibv_wr* API     : OFF          Using Enhanced Reorder      : OFF
 TX depth        : 128
 CQ Moderation   : 1
 CQE Poll Batch  : Dynamic
 Mtu             : 1024[B]
 Link type       : Ethernet
 GID index       : 1
 Max inline data : 0[B]
 rdma_cm QPs     : ON
 Data ex. method : rdma_cm
---------------------------------------------------------------------------------------
 local address: LID 0000 QPN 0x0049 PSN 0x207bd6
 GID: 00:00:00:00:00:00:00:00:00:00:255:255:10:10:01:02
 remote address: LID 0000 QPN 0x002f PSN 0x2691e6
 GID: 00:00:00:00:00:00:00:00:00:00:255:255:10:10:01:01
---------------------------------------------------------------------------------------
 #bytes     #iterations    BW peak[MiB/sec]    BW average[MiB/sec]   MsgRate[Mpps]
Conflicting CPU frequency values detected: 3029.045000 != 4061.956000. CPU Frequency is not max.
 65536      5000             3107.19            3107.15              0.049714
---------------------------------------------------------------------------------------

Now the CM correctly retries the connection attempt without leaking resources.
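For readers who have not seen the diff, here is a minimal sketch of the general pattern behind a leak-free retry flow. It is not perftest's code: `conn_ctx`, `conn_attempt`, `conn_cleanup`, `connect_with_retry`, and the mock allocator are all hypothetical names, and plain heap pointers stand in for the real `rdma_cm` and verbs objects. The idea it illustrates is that clean-up resets every pointer it frees, so running it after a rejected attempt (and again at exit) neither double-frees nor leaks.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical per-attempt state; NOT perftest's real structures. */
struct conn_ctx {
    void *cm_id; /* stands in for a struct rdma_cm_id * */
    void *qp;    /* stands in for a struct ibv_qp * */
};

/* Counts outstanding mock resources so a leak is easy to detect. */
static int live_allocs;

static void *mock_alloc(void)
{
    void *p = malloc(1);
    if (p)
        live_allocs++;
    return p;
}

/* Frees *p once and resets it to NULL, so a second call is a no-op. */
static void mock_free(void **p)
{
    if (*p) {
        free(*p);
        *p = NULL;
        live_allocs--;
    }
}

/* Idempotent clean-up: safe on a partially built or already cleaned context. */
static void conn_cleanup(struct conn_ctx *ctx)
{
    assert(ctx != NULL);
    mock_free(&ctx->qp);
    mock_free(&ctx->cm_id);
}

/* One connection attempt; "rejected" while *rejects_left is positive. */
static int conn_attempt(struct conn_ctx *ctx, int *rejects_left)
{
    ctx->cm_id = mock_alloc();
    if (!ctx->cm_id)
        return -1;
    ctx->qp = mock_alloc();
    if (!ctx->qp)
        return -1;
    if (*rejects_left > 0) {
        (*rejects_left)--;
        return -1; /* simulated rejection by the peer */
    }
    return 0;
}

/* Returns the number of attempts used on success, or -1 if all fail. */
static int connect_with_retry(struct conn_ctx *ctx, int max_attempts, int rejects)
{
    for (int attempt = 1; attempt <= max_attempts; attempt++) {
        if (conn_attempt(ctx, &rejects) == 0)
            return attempt;
        conn_cleanup(ctx); /* release this attempt's resources before retrying */
    }
    return -1;
}
```

With two simulated rejections and a limit of three attempts, `connect_with_retry` succeeds on the third attempt, which mirrors the two rejected requests followed by a successful run in the logs above. Calling `conn_cleanup` again afterwards is harmless because every freed pointer has been reset to `NULL`.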

@sshaulnv
Contributor

@SherrinZhou, tested it and it looks great, thanks!

@sshaulnv sshaulnv merged commit 4ff43ea into linux-rdma:master Mar 23, 2026
1 check passed