Buffer HTTP request bytes once per connection #215

Merged
swhitty merged 1 commit into swhitty:main from PADL:lhoward/buffered-http-bytes on May 21, 2026

@lhoward lhoward commented May 17, 2026

Contributor

AI disclosure: Claude-generated as part of an analysis aimed at reducing syscall overhead.

Summary

Wraps socket.bytes once per HTTPConnection in a new AsyncBufferingSequence, so that the multiple iterators the HTTP decoder creates during one request (request line, headers, body) all share a single ~4 KB read buffer instead of each calling through to the underlying socket for every byte.

Motivation

HTTPDecoder.decodeRequest parses the request line and headers via bytes.lines.takeNext() / readHeaders(from: bytes), which delegate to CollectUntil.AsyncIterator.next():

while let element = try await iterator.next() {
    buffer.append(element)
    guard !until(buffer) else { break }
}

The base iterator is AsyncSocketReadSequence.next():

public mutating func next() async throws -> UInt8? {
    return try await socket.read()        // ← one byte per syscall
}

AsyncSocketReadSequence already conforms to AsyncBufferedSequence and has a perfectly good nextBuffer(suggested:) that pulls up to 4 KB at a time, but the next() implementation ignores it and does an unbuffered single-byte read(2) for every byte.

For a typical HTTP/1.1 request with 200–500 bytes of request line + headers, this means 200–500 single-byte syscalls per request and, if a TCP segment boundary lands mid-header, a corresponding number of full suspendSocket actor-hop / epoll_ctl MOD / wakeup cycles. The body path is unaffected: HTTPDecoder.readData already uses iterator.nextBuffer(count:).
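
To make the read count concrete, here is a minimal synchronous sketch. `CountingSocket` and `readCallsToParseHead` are hypothetical names invented for this illustration, not FlyingSocks API; the fake socket simply counts how often it is asked for data when the request head is collected one byte at a time versus in 4096-byte chunks.

```swift
// Hypothetical stand-in for a socket (not FlyingSocks API): counts read calls.
struct CountingSocket {
    private var pending: [UInt8]
    private(set) var readCalls = 0

    init(_ bytes: [UInt8]) { pending = bytes }

    // Mimics read(2): returns up to `max` bytes, or an empty array at end of stream.
    mutating func read(max: Int) -> [UInt8] {
        readCalls += 1
        let chunk = Array(pending.prefix(max))
        pending.removeFirst(chunk.count)
        return chunk
    }
}

// Collects bytes until they end with the blank line that closes the header
// block, asking the socket for `chunkSize` bytes per call, and returns the
// number of read calls that took.
func readCallsToParseHead(_ request: [UInt8], chunkSize: Int) -> Int {
    var socket = CountingSocket(request)
    var collected: [UInt8] = []
    let terminator: [UInt8] = [13, 10, 13, 10] // "\r\n\r\n"
    while true {
        let chunk = socket.read(max: chunkSize)
        if chunk.isEmpty { break }
        collected.append(contentsOf: chunk)
        if collected.suffix(4).elementsEqual(terminator) { break }
    }
    return socket.readCalls
}

let request = Array("GET /index.html HTTP/1.1\r\nHost: example.com\r\nAccept: */*\r\n\r\n".utf8)
let perByteCalls = readCallsToParseHead(request, chunkSize: 1)     // one call per byte
let bufferedCalls = readCallsToParseHead(request, chunkSize: 4096) // a single call
```

The sketch ignores suspension and TCP segmentation; it only shows that the number of read calls scales with the header length in the per-byte case.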

A naive fix — adding an internal buffer to AsyncSocketReadSequence.next() — would lose bytes between iterators, because HTTPDecoder constructs a fresh iterator for the body reader (line 189) separate from the one used for the header parser, and HTTPRequestSequence creates a fresh iterator per request on a keepalive connection. Any bytes buffered-but-unconsumed when one iterator is dropped would be unreachable to the next.
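
The byte-loss problem can be reproduced with a small synchronous sketch. `SharedPositionSocket` and `NaiveBufferedIterator` are hypothetical types made up for this example (the real iterators are async and live in FlyingSocks); the point is only that a buffer private to one iterator strands whatever that iterator read ahead.

```swift
// Hypothetical socket stand-in; a class so every iterator shares one stream position.
final class SharedPositionSocket {
    private var pending: [UInt8]
    init(_ bytes: [UInt8]) { pending = bytes }

    // Returns up to `max` bytes, or an empty array at end of stream.
    func read(max: Int) -> [UInt8] {
        let chunk = Array(pending.prefix(max))
        pending.removeFirst(chunk.count)
        return chunk
    }
}

// The naive fix: each iterator keeps its own private read-ahead buffer.
struct NaiveBufferedIterator: IteratorProtocol {
    let socket: SharedPositionSocket
    var buffer: [UInt8] = []

    init(socket: SharedPositionSocket) { self.socket = socket }

    mutating func next() -> UInt8? {
        if buffer.isEmpty { buffer = socket.read(max: 4096) }
        return buffer.isEmpty ? nil : buffer.removeFirst()
    }
}

let socket = SharedPositionSocket(Array("HEAD\nBODY".utf8))

// The header parser reads up to the newline, but its first refill already
// pulled the body bytes into its private buffer.
var headerIterator = NaiveBufferedIterator(socket: socket)
var headerLine: [UInt8] = []
while let byte = headerIterator.next(), byte != 10 { headerLine.append(byte) }

// A fresh iterator for the body finds the socket drained.
var bodyIterator = NaiveBufferedIterator(socket: socket)
let firstBodyByte = bodyIterator.next()        // nil: the body is unreachable here
let strandedBytes = headerIterator.buffer      // "BODY" is stuck in the old iterator
```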

Approach

Add AsyncBufferingSequence<Base> to FlyingSocks: a small wrapper backed by an actor that owns one iterator into Base and a shared in-memory buffer. makeAsyncIterator() returns iterators whose next() and nextBuffer(suggested:) both consume from the shared backing buffer, so bytes pulled from Base are never lost between successive iterators on the same wrapper.

HTTPConnection.init wraps socket.bytes in AsyncBufferingSequence once and passes it to both HTTPRequestSequence and the WebSocket upgrade path, so any bytes pulled past the upgrade request remain available to the framer.

The wrapper uses the same Transferring idiom that AsyncSharedReplaySequence.requestNextChunk already uses to call mutating async functions on a value-type iterator across actor isolation. It is designed for serial consumption (one connection's parser at a time), which matches all current callers.
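
As a rough illustration of the shared-buffer idea, the sketch below is not the code in this PR. All names (`SharedByteBuffer`, `SharedBufferSequence`, `FakeBase`, `demo`) are hypothetical, and it swaps the Transferring idiom for a simpler `@Sendable` refill closure, so it shows only how successive iterators can drain one actor-owned buffer under serial consumption.

```swift
// Actor-owned buffer shared by every iterator made from one wrapper.
// Assumes serial consumption: one consumer awaiting at a time.
actor SharedByteBuffer {
    private var buffered: [UInt8] = []
    private let suggestedBufferSize: Int
    private let refill: @Sendable (Int) async throws -> [UInt8]?

    init(suggestedBufferSize: Int = 4096,
         refill: @escaping @Sendable (Int) async throws -> [UInt8]?) {
        self.suggestedBufferSize = suggestedBufferSize
        self.refill = refill
    }

    // Returns the next byte, pulling a new chunk from the base only when empty.
    func nextByte() async throws -> UInt8? {
        if buffered.isEmpty {
            guard let chunk = try await refill(suggestedBufferSize),
                  !chunk.isEmpty else { return nil }
            buffered.append(contentsOf: chunk)
        }
        return buffered.removeFirst()
    }
}

// Sequence whose iterators all consume from the same SharedByteBuffer.
struct SharedBufferSequence: AsyncSequence {
    typealias Element = UInt8
    let storage: SharedByteBuffer

    struct AsyncIterator: AsyncIteratorProtocol {
        let storage: SharedByteBuffer
        mutating func next() async throws -> UInt8? {
            try await storage.nextByte()
        }
    }

    func makeAsyncIterator() -> AsyncIterator { AsyncIterator(storage: storage) }
}

// Hypothetical base: hands out prepared chunks (ignoring `max`) and counts reads.
actor FakeBase {
    private var chunks: [[UInt8]]
    private(set) var reads = 0
    init(_ chunks: [[UInt8]]) { self.chunks = chunks }

    func nextChunk(max: Int) -> [UInt8]? {
        reads += 1
        return chunks.isEmpty ? nil : chunks.removeFirst()
    }
}

// Header and body use separate iterators, yet no bytes are lost between them.
func demo() async throws -> (header: [UInt8], body: [UInt8], reads: Int) {
    let base = FakeBase([Array("HEAD\nBODY".utf8)])
    let bytes = SharedBufferSequence(
        storage: SharedByteBuffer { max in await base.nextChunk(max: max) }
    )

    var headerIterator = bytes.makeAsyncIterator()
    var header: [UInt8] = []
    while let byte = try await headerIterator.next(), byte != 10 { header.append(byte) }

    var bodyIterator = bytes.makeAsyncIterator()
    var body: [UInt8] = []
    while let byte = try await bodyIterator.next() { body.append(byte) }

    return (header, body, await base.reads)
}
```

Because the buffer lives in the actor rather than in any iterator, dropping the header iterator leaves the read-ahead bytes in place for the body iterator. The second read in `demo` is the one that discovers end of stream.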

Measurements

Profiled with perf against a release build of an MRP REST daemon hammering FlyingFox; 16-second captures, identical workload either side.

| | without buffering | with buffering | Δ |
|---|---|---|---|
| Total cycles | 45.84 × 10⁹ | 42.21 × 10⁹ | −7.9% |
| Avg CPU rate | 3056 Mc/s | 2649 Mc/s | −13% |
| _dispatch_semaphore_wait_slow (cum) | 25.2% | 24.3% | −0.9 pp |
| HTTPDecoder.decodeRequest self | (hidden in noise) | 0.0–0.02% | parser essentially disappears |

The proportional shape of the rest of the profile (SocketPool actor, continuation resumption, dispatch worker overhead) is unchanged — the win comes from eliminating the per-byte syscall + potential suspendSocket traffic during header parsing.
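
For reference, the Δ column can be recomputed from the reported before/after figures. This is plain arithmetic on the numbers above; `percentChange` is a helper written for this note.

```swift
// Relative change from `before` to `after`, in percent.
func percentChange(from before: Double, to after: Double) -> Double {
    (after - before) / before * 100
}

let cyclesDelta = percentChange(from: 45.84e9, to: 42.21e9) // about -7.9 %
let rateDelta = percentChange(from: 3056, to: 2649)         // about -13.3 %
let semaphoreDelta = 24.3 - 25.2                            // about -0.9 percentage points
```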

Test plan

  • swift test --package-path . — 426/426 passing locally (Linux, Swift 6.x)
  • CI on macOS / iOS / tvOS / watchOS / Linux / Windows

Notes / open questions

  • AsyncBufferingSequence is package-scoped for now; happy to make it public if there's demand. There's nothing FlyingFox-specific about it — it's a general utility for "wrap an AsyncBufferedSequence so several iterators can share its buffer."
  • Default suggestedBufferSize is 4096. Open to bikeshedding; could be configurable on HTTPServer.Configuration.
  • The WebSocket upgrade path (switchToWebSocket) now reads frames from the same buffered stream rather than a fresh socket.bytes. This is required for correctness whenever the buffer holds bytes past the upgrade request. A well-behaved client should rarely trigger that case, since the WebSocket opening handshake (RFC 6455 §4.1) requires the client to wait for the server's response before sending further data.

HTTPDecoder pulls one byte per syscall while parsing the request line and
headers: `bytes.lines.takeNext()` and `readHeaders(from:)` both end up in
CollectUntil.next() calling iterator.next(), which on AsyncSocketReadSequence
does an unbuffered `socket.read()` per byte. For a typical request with
200–500 bytes of request line + headers that's 200–500 single-byte read(2)
syscalls and a corresponding suspendSocket cycle whenever a TCP segment
boundary lands mid-header.

Adding an internal buffer to AsyncSocketReadSequence.next() would lose bytes
between iterators, because HTTPDecoder constructs a fresh iterator for the
body reader and HTTPRequestSequence creates a fresh iterator per request on
a keepalive connection. Any bytes buffered-but-unconsumed when one iterator
is dropped would be unreachable to the next.

Add AsyncBufferingSequence<Base>: a reference-typed wrapper backed by an
actor that owns one iterator into Base and a shared in-memory buffer.
Iterators created from the same wrapper consume from the shared backing
buffer, so bytes pulled from Base are never lost between successive
iterators. Uses the same Transferring idiom that AsyncSharedReplaySequence
already uses to call mutating async functions on a value-type iterator
across actor isolation.

Wrap socket.bytes once per HTTPConnection and thread the wrapper through
both HTTPRequestSequence and the WebSocket upgrade path, so any bytes
pulled past the upgrade request remain available to the framer.

Measurements: release build of an MRP REST daemon under identical workload,
16 s perf captures: total cycles 45.8e9 -> 42.2e9 (-7.9%); average CPU rate
3056 Mc/s -> 2649 Mc/s (-13%). HTTPDecoder.decodeRequest self time drops
from indistinguishable in the noise to 0.0-0.02%; the parser essentially
disappears from the profile.

All 426 existing tests pass.

codecov Bot commented May 17, 2026

Codecov Report

❌ Patch coverage is 93.54839% with 4 lines in your changes missing coverage. Please review.
✅ Project coverage is 92.92%. Comparing base (de46fbb) to head (6238522).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| FlyingSocks/Sources/AsyncBufferingSequence.swift | 92.15% | 4 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #215      +/-   ##
==========================================
+ Coverage   92.89%   92.92%   +0.03%     
==========================================
  Files          70       71       +1     
  Lines        3659     3719      +60     
==========================================
+ Hits         3399     3456      +57     
- Misses        260      263       +3     

@lhoward lhoward marked this pull request as ready for review May 17, 2026 00:56

@swhitty swhitty left a comment

Owner

great idea!

@swhitty swhitty merged commit c3bf9c1 into swhitty:main May 21, 2026
13 checks passed

lhoward commented May 21, 2026

Contributor Author

> great idea!

Thank our AI overlords :)
