Buffer HTTP request bytes once per connection#215
Merged
Conversation
HTTPDecoder pulls one byte per syscall while parsing the status line and headers: `bytes.lines.takeNext()` and `readHeaders(from:)` both end up in CollectUntil.next() calling iterator.next(), which on AsyncSocketReadSequence does an unbuffered `socket.read()` per byte. For a typical request with 200–500 bytes of status line + headers that's 200–500 single-byte read(2) syscalls and a corresponding suspendSocket cycle whenever a TCP segment boundary lands mid-header. Adding an internal buffer to AsyncSocketReadSequence.next() would lose bytes between iterators, because HTTPDecoder constructs a fresh iterator for the body reader and HTTPRequestSequence creates a fresh iterator per request on a keepalive connection. Any bytes buffered-but-unconsumed when one iterator is dropped would be unreachable to the next. Add AsyncBufferingSequence<Base>: a reference-typed wrapper backed by an actor that owns one iterator into Base and a shared in-memory buffer. Iterators created from the same wrapper consume from the shared backing buffer, so bytes pulled from Base are never lost between successive iterators. Uses the same Transferring idiom that AsyncSharedReplaySequence already uses to call mutating async functions on a value-type iterator across actor isolation. Wrap socket.bytes once per HTTPConnection and thread the wrapper through both HTTPRequestSequence and the WebSocket upgrade path, so any bytes pulled past the upgrade request remain available to the framer. Measurements: release build of an MRP REST daemon under identical workload, 16 s perf captures: total cycles 45.8e9 -> 42.2e9 (-7.9%); average CPU rate 3056 Mc/s -> 2649 Mc/s (-13%). HTTPDecoder.decodeRequest self time drops from indistinguishable in the noise to 0.0-0.02%; the parser essentially disappears from the profile. All 426 existing tests pass.
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #215 +/- ##
==========================================
+ Coverage 92.89% 92.92% +0.03%
==========================================
Files 70 71 +1
Lines 3659 3719 +60
==========================================
+ Hits 3399 3456 +57
- Misses 260 263 +3 ☔ View full report in Codecov by Sentry. |
Contributor
Author
Thank our AI overlords :) |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
AI disclosure: Claude-generated as part of analysis trying to reduce sys call overhead.
Summary
Wraps
socket.bytesonce perHTTPConnectionin a newAsyncBufferingSequence, so that the multiple iterators the HTTP decoder creates during one request (status line, headers, body) all share a single ~4 KB read buffer instead of each calling through to the underlying socket on every byte.Motivation
HTTPDecoder.decodeRequestparses the status line and headers viabytes.lines.takeNext()/readHeaders(from: bytes), which delegate toCollectUntil.AsyncIterator.next():The base iterator is
AsyncSocketReadSequence.next():AsyncSocketReadSequencealready conforms toAsyncBufferedSequenceand has a perfectly goodnextBuffer(suggested:)that pulls up to 4 KB at a time, but thenext()implementation ignores it and does an unbuffered single-byteread(2)for every byte.For a typical HTTP/1.1 request with 200–500 bytes of status line + headers, this means 200–500 single-byte syscalls per request and, if a TCP segment boundary lands mid-header, a corresponding number of full
suspendSocketactor-hop /epoll_ctl MOD/ wakeup cycles. The body path is unaffected —HTTPDecoder.readDataalready usesiterator.nextBuffer(count:).A naive fix — adding an internal buffer to
AsyncSocketReadSequence.next()— would lose bytes between iterators, becauseHTTPDecoderconstructs a fresh iterator for the body reader (line 189) separate from the one used for the header parser, andHTTPRequestSequencecreates a fresh iterator per request on a keepalive connection. Any bytes buffered-but-unconsumed when one iterator is dropped would be unreachable to the next.Approach
Add
AsyncBufferingSequence<Base>to FlyingSocks: a small wrapper backed by an actor that owns one iterator intoBaseand a shared in-memory buffer.makeAsyncIterator()returns iterators whosenext()andnextBuffer(suggested:)both consume from the shared backing buffer, so bytes pulled fromBaseare never lost between successive iterators on the same wrapper.HTTPConnection.initwrapssocket.bytesinAsyncBufferingSequenceonce and passes it to bothHTTPRequestSequenceand the WebSocket upgrade path, so any bytes pulled past the upgrade request remain available to the framer.The wrapper uses the same
Transferringidiom thatAsyncSharedReplaySequence.requestNextChunkalready uses to call mutating async functions on a value-type iterator across actor isolation. It is designed for serial consumption (one connection's parser at a time), which matches all current callers.Measurements
Profiled with
perfagainst a release build of an MRP REST daemon hammering FlyingFox; 16-second captures, identical workload either side._dispatch_semaphore_wait_slow(cum)HTTPDecoder.decodeRequestselfThe proportional shape of the rest of the profile (SocketPool actor, continuation resumption, dispatch worker overhead) is unchanged — the win comes from eliminating the per-byte syscall + potential
suspendSockettraffic during header parsing.Test plan
swift test --package-path .— 426/426 passing locally (Linux, Swift 6.x)Notes / open questions
AsyncBufferingSequenceispackage-scoped for now; happy to make itpublicif there's demand. There's nothing FlyingFox-specific about it — it's a general utility for "wrap anAsyncBufferedSequenceso several iterators can share its buffer."suggestedBufferSizeis 4096. Open to bikeshedding; could be configurable onHTTPServer.Configuration.switchToWebSocket) now reads frames from the same buffered stream rather than a freshsocket.bytes. Required for correctness if the buffer holds bytes past the upgrade request, and matches HTTP/1.1's "client MUST NOT send data after Upgrade until 101" rule.