feat(ws): native WebSocket transport, drop raw TCP listeners#1093
Conversation
The lobby server now accepts WebSocket connections directly on /ws (default port 8003) instead of raw TCP behind the ws_bridge_rs sidecar. This removes a transport-level hop that broke client reconnects, and keeps everything on HTTP so the deployment can stay fully behind DDoS-protected HTTPS ingress. - New WebSocketProtocol (aiohttp WebSocketResponse / ClientWebSocketResponse) using one JSON object per text frame. - ServerContext replaces asyncio.start_server with an aiohttp app, dropping the proxyprotocol detection path. - Config: LISTEN list replaced with WS_HOST / WS_PORT / WS_PATH. - Pipfile: proxy-protocol removed. - Tests rewritten to drive the server over real WebSocket clients; proxy-mode tests removed. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughThis PR migrates the server from stream/proxy-based multi-protocol handling to aiohttp WebSocket serving. It introduces a new WebSocketProtocol implementation, refactors ServerContext to use aiohttp's Application and TCPSite lifecycle, updates configuration and startup paths, adjusts deployment manifests, removes proxy-oriented fixtures and tests, and rewrites integration tests for WebSocket connections. ChangesWebSocket-only runtime and test migration
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 3
🧹 Nitpick comments (1)
tests/integration_tests/test_server_instance.py (1)
37-40: 💤 Low valueMinor docstring style nits flagged by static analysis.
The docstring has minor formatting issues (D205: blank line after summary, D212: summary on first line, D415: period at end). These are low-priority style concerns for test code.
✏️ Optional docstring fix
- """ - A single ServerInstance can host more than one ServerContext listening on - different ports. Verify both accept connections and share state. - """ + """Verify a ServerInstance can host multiple contexts with shared state."""🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/integration_tests/test_server_instance.py` around lines 37 - 40, Update the module-level triple-quoted docstring that describes ServerInstance hosting multiple ServerContext instances so it follows docstring conventions: make the first line a one-sentence summary ending with a period, add a single blank line after that summary, and put the longer explanation on subsequent line(s); this will resolve the D205/D212/D415 style nits for that docstring.Source: Linters/SAST tools
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@server/protocol/websocket.py`:
- Line 11: Remove the unused import `web` from the import statement in
server/protocol/websocket.py (change the line importing `WSMsgType, web` to only
import `WSMsgType`) so flake8 F401 is resolved; verify there are no references
to the `web` symbol elsewhere in that module and if any exist, either import
`web` where actually needed or refactor those references accordingly.
- Around line 60-65: write_messages is incrementing metrics.sent_messages once
per batch while write_raw already increments per-message, causing
double-counting; remove the batch-level increment in write_messages (delete or
disable the metrics.sent_messages.labels(self.__class__.__name__).inc() call) so
only write_raw updates metrics.sent_messages for each message; refer to
write_messages and write_raw and the metrics.sent_messages label usage when
making the change.
In `@server/servercontext.py`:
- Around line 157-163: The code currently unconditionally trusts
X-Forwarded-For/X-Real-IP when building peer_host/peername; change it so
forwarded headers are only used when request.remote is a trusted reverse proxy:
add or use a trusted-proxy check (e.g. is_trusted_proxy(request.remote) or
compare request.remote against a TRUSTED_PROXIES config) and if and only if that
check passes, extract peer_host =
(request.headers.get("X-Forwarded-For","").split(",")[0].strip() or
request.headers.get("X-Real-IP") ) else set peer_host = request.remote or
"unknown"; then construct peername = Address(peer_host, 0). Ensure you
validate/normalize the forwarded value (take first IP only) and fall back to
request.remote when the proxy check fails.
---
Nitpick comments:
In `@tests/integration_tests/test_server_instance.py`:
- Around line 37-40: Update the module-level triple-quoted docstring that
describes ServerInstance hosting multiple ServerContext instances so it follows
docstring conventions: make the first line a one-sentence summary ending with a
period, add a single blank line after that summary, and put the longer
explanation on subsequent line(s); this will resolve the D205/D212/D415 style
nits for that docstring.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 6c3efa49-9150-41ad-96fb-b545aaf630c1
📒 Files selected for processing (15)
Pipfilemain.pyminikube-example.yamlserver/__init__.pyserver/config.pyserver/protocol/__init__.pyserver/protocol/websocket.pyserver/servercontext.pytests/integration_tests/conftest.pytests/integration_tests/test_load.pytests/integration_tests/test_server.pytests/integration_tests/test_server_instance.pytests/integration_tests/test_servercontext.pytests/unit_tests/test_servercontext.pytests/unit_tests/test_websocket_protocol.py
💤 Files with no reviewable changes (3)
- Pipfile
- tests/integration_tests/test_load.py
- tests/integration_tests/test_server.py
- Remove unused aiohttp.web import and unused get_session test import (flake8 F401). - Regenerate Pipfile.lock after dropping proxy-protocol. - WebSocketProtocol.write_messages no longer increments sent_messages itself — write_raw already does it per message. - ServerContext only honors a forwarded-IP header when the new WS_FORWARDED_IP_HEADER config is set (e.g. "CF-Connecting-IP"); otherwise uses the direct TCP peer, so clients cannot spoof their peername by sending X-Forwarded-For themselves. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
♻️ Duplicate comments (1)
server/servercontext.py (1)
161-164:⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift
X-Forwarded-Forhandling is still spoofable when this option is enabled.Taking the first value from a comma-separated forwarded chain lets clients forge peer IPs in common proxy setups. This should only accept proxy-authenticated headers (or parse trusted hops explicitly) and validate the extracted value as an IP before using it as
peername.Suggested direction
+import ipaddress ... - if header_name: - forwarded = request.headers.get(header_name, "") - peer_host = forwarded.split(",")[0].strip() or None + if header_name: + raw = request.headers.get(header_name, "") + candidate = raw.strip() + # If you support X-Forwarded-For, parse with trusted-proxy semantics + # instead of taking the first hop. + if candidate: + try: + ipaddress.ip_address(candidate) + peer_host = candidate + except ValueError: + peer_host = None🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@server/servercontext.py` around lines 161 - 164, The code that extracts the peer IP from the X-Forwarded-For header (when WS_FORWARDED_IP_HEADER is configured) does not validate that the extracted value is actually a valid IP address before using it as peer_host. This allows clients to forge arbitrary values in the header. Add IP address validation (checking that the stripped forwarded value is a valid IPv4 or IPv6 address) after extracting it from the comma-separated header string and before assigning it to peer_host. If validation fails, set peer_host to None to reject the spoofed value.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Duplicate comments:
In `@server/servercontext.py`:
- Around line 161-164: The code that extracts the peer IP from the
X-Forwarded-For header (when WS_FORWARDED_IP_HEADER is configured) does not
validate that the extracted value is actually a valid IP address before using it
as peer_host. This allows clients to forge arbitrary values in the header. Add
IP address validation (checking that the stripped forwarded value is a valid
IPv4 or IPv6 address) after extracting it from the comma-separated header string
and before assigning it to peer_host. If validation fails, set peer_host to None
to reject the spoofed value.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: fdd4aafc-937e-49dc-a88a-9746a8a4d1cb
⛔ Files ignored due to path filters (1)
Pipfile.lockis excluded by!**/*.lock
📒 Files selected for processing (4)
server/config.pyserver/protocol/websocket.pyserver/servercontext.pytests/integration_tests/test_server.py
💤 Files with no reviewable changes (1)
- tests/integration_tests/test_server.py
🚧 Files skipped from review as they are similar to previous changes (1)
- server/protocol/websocket.py
- websocket.py module docstring uses D212 style (summary on line 2). - Add one-line __init__ docstring (D107). - test_multiple_contexts docstring has a one-line summary ending with a period (D205/D212/D415). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Codacy enforces D212 on modules (summary on first line) and D213 on functions (summary on second line) — I had them reversed in the previous commit. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
server/protocol/websocket.py (1)
47-50:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winMove metrics increment below the connectivity guard.
Line 48 increments
sent_messagesbefore Line 49 rejects disconnected writes, so failed sends are still counted as sent.Suggested patch
def write_raw(self, data: bytes) -> None: - metrics.sent_messages.labels(self.__class__.__name__).inc() if not self.is_connected(): raise DisconnectedError("Protocol is not connected!") + metrics.sent_messages.labels(self.__class__.__name__).inc() text = data.decode() if isinstance(data, (bytes, bytearray)) else data task = asyncio.create_task(self.ws.send_str(text))🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@server/protocol/websocket.py` around lines 47 - 50, Move the sent_messages metric increment in websocket.py’s write_raw method below the is_connected guard so disconnected writes are not counted as sent; keep the connectivity check first, then increment metrics.sent_messages and proceed with the actual send path in write_raw.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In `@server/protocol/websocket.py`:
- Around line 47-50: Move the sent_messages metric increment in websocket.py’s
write_raw method below the is_connected guard so disconnected writes are not
counted as sent; keep the connectivity check first, then increment
metrics.sent_messages and proceed with the actual send path in write_raw.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: f1857e26-05e1-4322-a4df-8821cbf23ffc
📒 Files selected for processing (2)
server/protocol/websocket.pytests/integration_tests/test_server_instance.py
Codacy has both D212 and D213 enabled, which is mutually contradictory for multi-line docstrings — whichever style we pick, the other fires. Single-line docstrings dodge both rules. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Cloudflare (and our Traefik ingress) sets X-Real-IP to the real client address, so it's the right default. Still configurable to "" for deployments where the server is exposed to untrusted clients directly (spoofable headers). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- WebSocketProtocol now reaches 100% line coverage in isolation: added tests for binary frames, write_message/write_messages paths, empty-pending drain, drain-failure -> DisconnectedError, and abort with/without an already-closed ws. - SimpleJsonProtocol regains coverage of decode_message / read_message / empty-readline DisconnectedError; these were previously hit through the qstream-vs-json integration parametrize that this PR removed. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@tests/unit_tests/test_websocket_protocol.py`:
- Around line 147-168: The assertion ws.close.assert_called() in the
test_abort_cancels_pending_and_closes_ws function only verifies that the close
method was called, but does not confirm the coroutine was actually awaited.
Since ws.close is an AsyncMock, replace the assertion with
ws.close.assert_awaited() to properly verify that the close operation was fully
executed and completed, not just scheduled.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 224cf4df-7069-423c-8c7b-c44b8f4d77be
📒 Files selected for processing (5)
server/config.pyserver/protocol/websocket.pytests/integration_tests/test_server_instance.pytests/unit_tests/test_protocol.pytests/unit_tests/test_websocket_protocol.py
🚧 Files skipped from review as they are similar to previous changes (1)
- server/protocol/websocket.py
The Java client (faf-commons-lobby) connects to wss://lobby.{base}
with no path component, so the WS upgrade GET hits '/'. Serving on
'/ws' caused aiohttp to fail routing and return 'Invalid method
encountered' (the parser confusing itself on the unmatched URL).
Switch the default WS_PATH and the path= function defaults to '/'
so out of the box the server matches what the client sends. Existing
deployments can still pin WS_PATH to '/ws' (or anything else) via
config.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
faf-commons-lobby (the Kotlin lobby client) splits the inbound WS byte stream on '\n' rather than treating each frame as a complete message. Without a trailing newline the client buffers the response indefinitely and times out the login flow after 30 s. The previous SimpleJsonProtocol already encoded messages this way, which is why it worked through ws_bridge_rs unchanged. Mirror that behavior in WebSocketProtocol so existing clients keep working. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
CodeRabbit nit: assert_called only confirms the close coroutine was scheduled, not that it actually ran to completion. assert_awaited is the right check for an AsyncMock. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
I can't test a game (because setup on mac is not working) but tested the broadcast message, the websocket connection works. |
compatibility with vibecoded FAForever/server#1093
Summary
:8003/, one JSON object per text frame,\n-terminated for client framing compatibility) instead of raw TCP behind thews_bridge_rssidecar.LISTENconfig. New config:WS_HOST(default"")WS_PORT(default8003)WS_PATH(default"/"— matches whatfaf-commons-lobbyconnects to)WS_FORWARDED_IP_HEADER(default"X-Real-IP"; set to""to opt out, or to"CF-Connecting-IP"to use Cloudflare's header)QDataStreamProtocol/SimpleJsonProtocolmodules are kept (encoding helpers / unit tests still use them) but are no longer wired into a listener.ws_bridge_rs→ TCP hop was breaking client reconnect, and operationally we can't expose raw TCP anymore (DDoS). Native WS keeps everything on HTTPS behind Traefik / Cloudflare.Client compatibility
faf-commons-lobby(used by the Java client) connects towss://lobby.{base}/and splits the inbound WS byte stream on\nrather than treating each frame as a message. So:/(a/wsdefault caused aiohttp to fail routing on the upgrade).\nto match what the client splits on. Mirrors howSimpleJsonProtocolframed messages before — the bridge passed those bytes through unchanged, which is why it worked before.Forwarded-IP header
Only honored when
WS_FORWARDED_IP_HEADERis explicitly set; otherwise the direct TCP peer is used. Stops direct clients from spoofing their peername by sendingX-Forwarded-Forthemselves.Follow-up (gitops-stack)
apps/faf-ws-bridge/.lobby.{baseDomain}(andws.{baseDomain}for back-compat) directly atfaf-lobby-server:8003.WS_FORWARDED_IP_HEADER: CF-Connecting-IPin the lobby config so Cloudflare's client IP is preserved.Test plan
pipenv run pytest tests/unit_tests/— 692 pass (1 pre-existing DB-schema deselect, unrelated)pipenv run pytest tests/integration_tests/ -m 'not rabbitmq'— 154 passWebSocketProtocolunit tests at 100% line coverage in isolation (binary frames, write_message/write_messages paths, drain edge cases, abort)aiohttp.ClientSession().ws_connect(), send/receive round-tripwss://lobby.faforever.xyzagainst the test deploy🤖 Generated with Claude Code
Summary by CodeRabbit
Release Notes
New Features
Removed Features
Configuration
Infrastructure