Skip to content

Add SO_ZEROCOPY and MSG_ZEROCOPY support for UDP and TCP#1720

Open
davidBar-On wants to merge 6 commits intoesnet:masterfrom
davidBar-On:issue-1678-SO_ZEROCOPY-MSG_ZEROCOPY
Open

Add SO_ZEROCOPY and MSG_ZEROCOPY support for UDP and TCP#1720
davidBar-On wants to merge 6 commits intoesnet:masterfrom
davidBar-On:issue-1678-SO_ZEROCOPY-MSG_ZEROCOPY

Conversation

@davidBar-On
Copy link
Copy Markdown
Contributor

Add support for zero-copy for both UDP and TCP, using SO_ZEROCOPY/MSG_ZEROCOPY. Although it is not clear when and in what environments the added functionality improve performance, support was added to allow testing this functionality. The assumption is that different environments may have different levels of support for this functionality.
(Note that running bootstrap.sh; configure is required before make to support the new features.)

The added support is for MSG_ZEROCOPY. When used, socket option SO_ZEROCOPY is set and send(...., MSG_ZEROCOPY) is used instead of write(). MSG_ZEROCOPY is used in the following cases:

  • UDP: when -Z/--zerocopy option is set.
  • TCP: when --zerocopy=z is set. Otherwise, sendfile() continue to be used for TCP zero copy.

This PR is a rework of the original suggested support in #1690, as it was too naive and did not include support for notifications.

The main implementation is based on the msg_zerocopy.c example for supporting MSG_ZEROCOPY.

Note that per the Zero-copy networking discussion about the patch that added MSG_ZEROCOPY support, the expected performance improvement is expected to be less than 10% and only for large packets. Therefor, it would be helpful if this PR will be tested on different environments to see whether and when it improves performance.

@davidBar-On
Copy link
Copy Markdown
Contributor Author

Closing, as it is not clear that this functionality is valuable, as I wrote in the description. (Code is still available for testing or coming up with enhanced solution.)

@marcosfsch
Copy link
Copy Markdown
Contributor

marcosfsch commented Sep 18, 2024

MSG_ZEROCOPY with MTU 9K and optmem_max 1MB does improve performance a lot for large TCP flows, specially when used with pacing (up to 35%, from 37G to 50G). Brian Tiernery and I, made the tests and the results can be checker in [1, 2]. This results were run using an older patch [3], this current patch seems to have some big performance regressions.
I'll run new UDP tests to provide evidence if there are improvements and I hope that this patch can be upstreamed.

[1] https://github.com/marcosfsch/INDIS-2024/blob/main/AmLight/ubuntu22-6.8/summary.txt
[2] https://github.com/marcosfsch/INDIS-2024/blob/main/ESnet/ubuntu22-6.8/summary.txt
[3] #1690

@marcosfsch
Copy link
Copy Markdown
Contributor

marcosfsch commented Sep 18, 2024

For UDP, this patch is making performance worse, from 36G to 5.5G. But using a previous MSG_ZEROCOPY patch [1] I was able to get 18% better performance for UDP. From 36G to 42G for single UDP flow.

[1] #1690

@bltierney
Copy link
Copy Markdown
Contributor

Also, to get improvements with MSG_ZEROCOPY, you have to set this in /etc/sysctl.conf:

net.core.optmem_max = 1048576

But we are finding MSG_ZEROCOPY helps a lot on 100G hosts.

@davidBar-On
Copy link
Copy Markdown
Contributor Author

OK. Thanks for the input. @bmah888, can you please re-open this PR?

@bltierney
Copy link
Copy Markdown
Contributor

I should also mention that we were using your original patch 1690, not patch 1720. Seems like this should not matter, but thought I should mention that.

@davidBar-On
Copy link
Copy Markdown
Contributor Author

I should also mention that we were using your original patch 1690, not patch 1720. Seems like this should not matter, but thought I should mention that.

That is interesting to know. As far as I remember that should not matter. The reason I split #1960 to this PR and #1717 (SKIP-RX-COPY) is that per my internal tests I assumed there is more chance that SKIP-RX-COPY will be merged into the mainline. Per this discussion I now understand this may not be the case, but still I assume it is easier to merge less changes per each PT than merge all.

@davidBar-On davidBar-On reopened this Oct 11, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants