Per-IP rate limiting on the public read surface (anti-flood) #266
Merged
The unauthenticated reads (active-challenges, CNF fetch, leaderboard) had no per-IP limit. Under a flood — a miner tight-polling the whole board, a scraper, or abuse — the single-instance worker pool wedges on the db_write_lock / file-backed CNF reads, the board hangs, and miners see it as "no new challenges". Railway/Cloudflare also bill the wasted capacity. Add an ASGI token-bucket middleware evaluated at the front door (outermost, before CORS/routes/DB), so an abusive IP gets a cheap 429 with Retry-After instead of consuming a DB connection. Per-IP buckets, burst capacity refilled at a steady rps, bounded/swept memory, fail-closed when the table is full (spoofed-IP flood). Client IP resolved from Railway's x-envoy-external-address (unspoofable peer), then X-Forwarded-For, then the socket peer. /health is exempt so monitoring keeps working; OPTIONS passes for CORS preflight. Disabled by default (CATHEDRAL_IP_RATE_LIMIT_ENABLED) so tests/local are unaffected; production sets it plus optional RPS/BURST/ALLOWLIST tuning. This is the in-app backstop; an edge proxy (Cloudflare) is the durable layer for spoofed-XFF floods.
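A minimal sketch of the per-IP token bucket described above, not the PR's actual code. The class and method names (`IpTokenBuckets`, `check`, `sweep`), the on-demand sweep, and the fixed 1-second retry hint when the table is full are all assumptions made for illustration.

```python
import time
from dataclasses import dataclass


@dataclass
class Bucket:
    tokens: float   # tokens currently available to this IP
    updated: float  # clock reading at the last refill


class IpTokenBuckets:
    """Per-IP token buckets with a hard cap on tracked IPs (hypothetical names)."""

    def __init__(self, rps: float, burst: int, max_tracked: int, clock=time.monotonic):
        self.rps = rps                  # steady refill rate, tokens per second
        self.burst = burst              # bucket capacity
        self.max_tracked = max_tracked  # memory bound: max distinct IPs tracked
        self.clock = clock              # injectable so tests can control time
        self.buckets: dict[str, Bucket] = {}

    def check(self, ip: str) -> tuple[bool, float]:
        """Return (allowed, retry_after_seconds) for one request from `ip`."""
        now = self.clock()
        bucket = self.buckets.get(ip)
        if bucket is None:
            if len(self.buckets) >= self.max_tracked:
                self.sweep(now)
            if len(self.buckets) >= self.max_tracked:
                # Still full after sweeping: fail closed so a spoofed-IP flood
                # cannot grow the table. The 1.0 s hint is an assumption.
                return False, 1.0
            bucket = self.buckets[ip] = Bucket(tokens=float(self.burst), updated=now)
        # Refill for the time elapsed since the last request, capped at burst.
        elapsed = now - bucket.updated
        bucket.tokens = min(float(self.burst), bucket.tokens + elapsed * self.rps)
        bucket.updated = now
        if bucket.tokens >= 1.0:
            bucket.tokens -= 1.0
            return True, 0.0
        # Not enough tokens: report how long until one full token is available.
        return False, (1.0 - bucket.tokens) / self.rps

    def sweep(self, now: float) -> None:
        """Drop buckets idle long enough to have refilled completely."""
        idle = self.burst / self.rps
        stale = [ip for ip, b in self.buckets.items() if now - b.updated >= idle]
        for ip in stale:
            del self.buckets[ip]
```

Dropping a bucket that has been idle for `burst / rps` seconds loses no state, because a fresh bucket starts full anyway. That is what lets the table be swept without loosening the limit.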
Why
Live incident: `api.cathedral.computer/api/cathedral/*` routes hang (HTTP 000 / timeout) while the Railway edge answers `/` instantly. The single-instance worker pool is wedged by load on the unauthenticated reads (`active-challenges`, CNF fetch, `leaderboard`). Those routes had no per-IP limit (the existing `rate_limit.py` only guards authenticated submits per-hotkey). Miners see the board hang as "no new challenges"; Railway/Cloudflare bill the wasted capacity.
What
New ASGI token-bucket middleware (`ip_rate_limit.py`), mounted outermost (after CORS so it wraps it) so an abusive IP gets a cheap 429 + Retry-After before any route handler, CORS, or DB access runs. It sheds load instead of consuming a DB connection / the `db_write_lock`.
- Per-IP buckets: `burst` capacity refilled at `rps` tokens/sec (allows legit bursts, caps sustained abuse).
- Client IP: `x-envoy-external-address` (Railway's unspoofable peer) → left-most `X-Forwarded-For` → socket peer.
- `/health` exempt (monitoring keeps working); `OPTIONS` passes (CORS preflight).
- Disabled by default (`CATHEDRAL_IP_RATE_LIMIT_ENABLED`) so tests/local are unaffected.
Deploy / tuning (env)
- `CATHEDRAL_IP_RATE_LIMIT_ENABLED`: `true` to enable
- `CATHEDRAL_IP_RATE_LIMIT_RPS`
- `CATHEDRAL_IP_RATE_LIMIT_BURST`
- `CATHEDRAL_IP_RATE_LIMIT_ALLOWLIST`
- `CATHEDRAL_IP_RATE_LIMIT_EXEMPT_PATHS`: `/health,/api/cathedral/health`
- `CATHEDRAL_IP_RATE_LIMIT_TRUSTED_HEADER`: `x-envoy-external-address`
- `CATHEDRAL_IP_RATE_LIMIT_MAX_TRACKED`
Scope / safety
{"detail": ...}.
Note
In-app backstop. The durable layer for spoofed-XFF floods is an edge proxy (turn on the Cloudflare orange-cloud + a rate-limit rule in front of Railway). Merging #265 (lock-free CNF serve) removes the underlying per-request lock contention that lets a flood wedge workers in the first place — this PR + #265 are complementary.
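For illustration, the front-door behaviour listed under What (client-IP resolution order, `/health` and `OPTIONS` pass-through, 429 with Retry-After) could be sketched as a plain ASGI middleware like the one below. This is a sketch under stated assumptions, not the contents of `ip_rate_limit.py`: the names `client_ip` and `IpRateLimitMiddleware`, the injected `check` callable, and the `"rate limit exceeded"` message are hypothetical, and the two constants are hard-coded from the values shown under Deploy / tuning rather than read from the environment.

```python
import json
import math
from typing import Callable, Tuple

# Hard-coded here for brevity; the PR makes these configurable via env vars.
TRUSTED_HEADER = b"x-envoy-external-address"
EXEMPT_PATHS = {"/health", "/api/cathedral/health"}


def client_ip(scope: dict) -> str:
    """Resolve the client IP: trusted header, then left-most XFF, then socket peer."""
    headers = dict(scope.get("headers") or [])  # ASGI header names are lowercase bytes
    trusted = headers.get(TRUSTED_HEADER, b"").decode("latin-1").strip()
    if trusted:
        return trusted
    forwarded = headers.get(b"x-forwarded-for", b"").decode("latin-1")
    leftmost = forwarded.split(",")[0].strip()
    if leftmost:
        return leftmost
    client = scope.get("client")  # (host, port) tuple, or None
    return client[0] if client else "unknown"


class IpRateLimitMiddleware:
    """Hypothetical ASGI wrapper; `check(ip)` returns (allowed, retry_after_seconds)."""

    def __init__(self, app, check: Callable[[str], Tuple[bool, float]]):
        self.app = app
        self.check = check

    async def __call__(self, scope, receive, send):
        # Non-HTTP traffic, CORS preflight, and exempt paths pass straight through.
        if (
            scope["type"] != "http"
            or scope["method"] == "OPTIONS"
            or scope["path"] in EXEMPT_PATHS
        ):
            await self.app(scope, receive, send)
            return
        allowed, retry_after = self.check(client_ip(scope))
        if allowed:
            await self.app(scope, receive, send)
            return
        # Rejected: answer here, before any route handler or DB access runs.
        body = json.dumps({"detail": "rate limit exceeded"}).encode()  # message is assumed
        await send({
            "type": "http.response.start",
            "status": 429,
            "headers": [
                (b"content-type", b"application/json"),
                (b"content-length", str(len(body)).encode()),
                # Retry-After takes whole seconds; round up, minimum 1.
                (b"retry-after", str(max(1, math.ceil(retry_after))).encode()),
            ],
        })
        await send({"type": "http.response.body", "body": body})
```

Taking `check` as a parameter keeps the bucket logic separate from the ASGI plumbing, so either half can be tested without the other.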