
Per-IP rate limiting on the public read surface (anti-flood) #266

Merged: wallscaler merged 2 commits into main from fix/publisher-ip-rate-limit on Jun 9, 2026


@wallscaler

Contributor

Why

Live incident: api.cathedral.computer/api/cathedral/* routes hang (HTTP 000 / timeout, i.e. no response at all) while the Railway edge answers requests to / instantly. The single-instance worker pool is wedged by load on the unauthenticated reads (active-challenges, CNF fetch, leaderboard), which had no per-IP limit: the existing rate_limit.py only throttles authenticated submits, per hotkey. Miners see the hanging board as "no new challenges", and Railway/Cloudflare bill for the wasted capacity.

What

New ASGI token-bucket middleware (ip_rate_limit.py), mounted outermost (added after CORS so that it wraps the CORS layer). An abusive IP gets a cheap 429 with Retry-After before any route handler, CORS processing, or DB access runs, so load is shed instead of tying up a DB connection or the db_write_lock.
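A minimal sketch of the front-door short-circuit, assuming a plain ASGI app. IPRateLimitMiddleware, its allow callback (ip -> (allowed, retry_after_seconds)), and run_once are hypothetical names, not the PR's ip_rate_limit.py, and the client IP here is simply the socket peer. If the app is FastAPI/Starlette (the {"detail": ...} body suggests so), middleware registered later with app.add_middleware wraps the earlier ones, so adding this after CORSMiddleware makes it outermost.

```python
import asyncio
import math
from typing import Awaitable, Callable, Iterable

# Simplified ASGI type aliases for this sketch.
Scope = dict
Receive = Callable[[], Awaitable[dict]]
Send = Callable[[dict], Awaitable[None]]
ASGIApp = Callable[[Scope, Receive, Send], Awaitable[None]]


class IPRateLimitMiddleware:
    """Hypothetical sketch: reject over-limit clients before the inner app runs."""

    def __init__(
        self,
        app: ASGIApp,
        allow: Callable[[str], tuple[bool, float]],  # ip -> (allowed, retry_after_s)
        exempt_paths: Iterable[str] = ("/health", "/api/cathedral/health"),
    ) -> None:
        self.app = app
        self.allow = allow
        self.exempt_paths = tuple(exempt_paths)

    async def __call__(self, scope: Scope, receive: Receive, send: Send) -> None:
        # Only HTTP requests are limited; lifespan and websocket pass through.
        if scope["type"] != "http":
            await self.app(scope, receive, send)
            return
        # CORS preflight and health checks are never throttled.
        if scope["method"] == "OPTIONS" or scope["path"].startswith(self.exempt_paths):
            await self.app(scope, receive, send)
            return
        client = scope.get("client")
        ip = client[0] if client else "unknown"
        allowed, retry_after = self.allow(ip)
        if allowed:
            await self.app(scope, receive, send)
            return
        # Cheap 429: the inner app (CORS, routes, DB) is never awaited.
        body = b'{"detail":"Too Many Requests"}'
        await send({
            "type": "http.response.start",
            "status": 429,
            "headers": [
                (b"content-type", b"application/json"),
                (b"content-length", str(len(body)).encode()),
                # Retry-After takes whole seconds; round up and never send 0.
                (b"retry-after", str(max(1, math.ceil(retry_after))).encode()),
            ],
        })
        await send({"type": "http.response.body", "body": body})


def run_once(app: ASGIApp, scope: Scope) -> list[dict]:
    """Drive one request through an ASGI app and collect the messages it sends."""
    sent: list[dict] = []

    async def receive() -> dict:
        return {"type": "http.request", "body": b"", "more_body": False}

    async def send(message: dict) -> None:
        sent.append(message)

    asyncio.run(app(scope, receive, send))
    return sent
```

Because the 429 is emitted before self.app is awaited, a rejected request costs one dictionary lookup and two send calls, which is the load-shedding property the PR relies on.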

  • Per-IP token bucket: capacity of burst tokens, refilled at rps tokens/sec (allows legitimate bursts, caps sustained abuse).
  • Client IP: x-envoy-external-address (Railway's unspoofable peer address) → left-most X-Forwarded-For → socket peer.
  • Bounded memory: idle buckets are swept every ~30s; once the table hits its hard cap, requests from untracked IPs are rejected (fail-closed), which keeps memory bounded under a spoofed-IP flood.
  • /health is exempt (monitoring keeps working); OPTIONS passes through (CORS preflight).
  • Disabled by default (CATHEDRAL_IP_RATE_LIMIT_ENABLED) so tests/local are unaffected.
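The client-IP precedence in the second bullet could look like this, as a sketch over a raw ASGI scope. client_ip is a hypothetical helper; it assumes, as the PR states, that Railway's edge sets x-envoy-external-address itself so clients cannot forge it, whereas X-Forwarded-For is client-controlled.

```python
def client_ip(scope: dict, trusted_header: str = "x-envoy-external-address") -> str:
    """Hypothetical: trusted edge header, then left-most XFF entry, then socket peer."""
    headers: dict[bytes, bytes] = {}
    for name, value in scope.get("headers", []):
        # ASGI delivers header names lower-cased; keep the first occurrence of each.
        headers.setdefault(name, value)

    # 1. Header set by the edge proxy (assumed not client-controllable).
    trusted = headers.get(trusted_header.lower().encode("latin-1"), b"").strip()
    if trusted:
        return trusted.decode("latin-1")

    # 2. X-Forwarded-For is "client, proxy1, proxy2"; the left-most entry is the
    #    claimed origin, which the client itself can forge.
    xff = headers.get(b"x-forwarded-for", b"")
    if xff:
        first = xff.decode("latin-1").split(",")[0].strip()
        if first:
            return first

    # 3. Socket peer (on Railway this would be the proxy, not the real client).
    client = scope.get("client")
    return client[0] if client else "unknown"
```

The left-most XFF entry is only a fallback: any client can put an arbitrary address there, which is why the fail-closed table cap and an edge proxy are still needed for spoofed-IP floods.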

Deploy / tuning (env)

| Variable | Default | Notes |
|---|---|---|
| CATHEDRAL_IP_RATE_LIMIT_ENABLED | off | set to true to enable |
| CATHEDRAL_IP_RATE_LIMIT_RPS | 10 | sustained requests/s per IP |
| CATHEDRAL_IP_RATE_LIMIT_BURST | 120 | bucket capacity (tokens) |
| CATHEDRAL_IP_RATE_LIMIT_ALLOWLIST | "" (empty) | comma-separated list of exempt IPs |
| CATHEDRAL_IP_RATE_LIMIT_EXEMPT_PATHS | /health,/api/cathedral/health | exempt path prefixes |
| CATHEDRAL_IP_RATE_LIMIT_TRUSTED_HEADER | x-envoy-external-address | trusted client-IP header |
| CATHEDRAL_IP_RATE_LIMIT_MAX_TRACKED | 100000 | maximum number of tracked IP buckets |
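One way to load these settings with the same defaults, as a sketch. IPRateLimitConfig and from_env are hypothetical names, and treating 1/true/yes/on (any case) as "enabled" is an assumption; the PR only says to set the flag to true.

```python
import os
from dataclasses import dataclass
from typing import Mapping


def _csv(value: str) -> tuple[str, ...]:
    # Split a comma-separated list, dropping blanks and surrounding whitespace.
    return tuple(item.strip() for item in value.split(",") if item.strip())


@dataclass(frozen=True)
class IPRateLimitConfig:
    enabled: bool
    rps: float
    burst: float
    allowlist: tuple[str, ...]
    exempt_paths: tuple[str, ...]
    trusted_header: str
    max_tracked: int

    @classmethod
    def from_env(cls, env: Mapping[str, str] = os.environ) -> "IPRateLimitConfig":
        p = "CATHEDRAL_IP_RATE_LIMIT_"
        return cls(
            # Assumption: only these spellings count as "on"; unset or anything else is off.
            enabled=env.get(p + "ENABLED", "").strip().lower() in {"1", "true", "yes", "on"},
            rps=float(env.get(p + "RPS", "10")),
            burst=float(env.get(p + "BURST", "120")),
            allowlist=_csv(env.get(p + "ALLOWLIST", "")),
            exempt_paths=_csv(env.get(p + "EXEMPT_PATHS", "/health,/api/cathedral/health")),
            # Header names are case-insensitive; ASGI delivers them lower-cased.
            trusted_header=env.get(p + "TRUSTED_HEADER", "x-envoy-external-address").lower(),
            max_tracked=int(env.get(p + "MAX_TRACKED", "100000")),
        )
```

Passing an explicit mapping instead of os.environ keeps tests hermetic, which matches the PR's goal of leaving tests and local runs unaffected by default.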

Scope / safety

  • Publisher-only; no validator change.
  • No wire/contract change; the 429 body uses the standard {"detail": ...} shape.
  • 17 unit tests drive the ASGI interface with an injected clock (so bucket refill is deterministic), covering: burst→429, refill over time, per-IP isolation, exempt path, OPTIONS, allowlist, trusted header winning over a spoofed XFF, full-table fail-closed, and disabled-by-default.
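A sketch of the per-IP bucket table the bullets and tests describe: refill at rps, capacity burst, a periodic idle sweep, and fail-closed on a full table, with the clock injected so a test can advance time by hand. IPTokenBuckets and its parameters are hypothetical; the defaults mirror the env table, and returning the sweep interval as the Retry-After for a full table is an assumption.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class _Bucket:
    tokens: float
    updated: float  # clock reading at the last refill


class IPTokenBuckets:
    """Hypothetical per-IP token bucket table with idle sweep and a hard size cap."""

    def __init__(
        self,
        rps: float = 10.0,
        burst: float = 120.0,
        max_tracked: int = 100_000,
        sweep_interval: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.rps = rps
        self.burst = burst
        self.max_tracked = max_tracked
        self.sweep_interval = sweep_interval
        self.clock = clock
        self._buckets: dict[str, _Bucket] = {}
        self._last_sweep = clock()

    def _sweep(self, now: float) -> None:
        # A bucket idle long enough to refill completely behaves exactly like a
        # fresh one, so dropping it does not change any future decision.
        full_after = self.burst / self.rps
        stale = [ip for ip, b in self._buckets.items() if now - b.updated >= full_after]
        for ip in stale:
            del self._buckets[ip]
        self._last_sweep = now

    def allow(self, ip: str) -> tuple[bool, float]:
        """Return (allowed, retry_after_seconds), consuming one token if allowed."""
        now = self.clock()
        if now - self._last_sweep >= self.sweep_interval:
            self._sweep(now)
        bucket = self._buckets.get(ip)
        if bucket is None:
            if len(self._buckets) >= self.max_tracked:
                # Fail closed: refuse unknown IPs rather than grow without bound.
                return False, self.sweep_interval
            bucket = self._buckets[ip] = _Bucket(tokens=self.burst, updated=now)
        # Refill at rps tokens/sec, capped at the burst capacity.
        bucket.tokens = min(self.burst, bucket.tokens + (now - bucket.updated) * self.rps)
        bucket.updated = now
        if bucket.tokens >= 1.0:
            bucket.tokens -= 1.0
            return True, 0.0
        # Seconds until one whole token is available again.
        return False, (1.0 - bucket.tokens) / self.rps
```

In a test, pass clock=lambda: t[0] and advance t[0] manually to make refill deterministic. Because allow() never awaits, each call runs atomically on a single asyncio event loop; a multi-threaded server would need a lock around it.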

Note

This is an in-app backstop. The durable defense against spoofed-XFF floods belongs at the edge: enable Cloudflare proxying (orange cloud) with a rate-limiting rule in front of Railway. Merging #265 (lock-free CNF serve) removes the per-request lock contention that lets a flood wedge workers in the first place, so this PR and #265 are complementary.

wallscaler and others added 2 commits June 9, 2026 12:01
The unauthenticated reads (active-challenges, CNF fetch, leaderboard) had
no per-IP limit. Under a flood — a miner tight-polling the whole board, a
scraper, or abuse — the single-instance worker pool wedges on the
db_write_lock / file-backed CNF reads, the board hangs, and miners see it
as "no new challenges". Railway/Cloudflare also bill the wasted capacity.

Add an ASGI token-bucket middleware evaluated at the front door (outermost,
before CORS/routes/DB), so an abusive IP gets a cheap 429 with Retry-After
instead of consuming a DB connection. Per-IP buckets, burst capacity
refilled at a steady rps, bounded/swept memory, fail-closed when the table
is full (spoofed-IP flood). Client IP resolved from Railway's
x-envoy-external-address (unspoofable peer), then X-Forwarded-For, then the
socket peer. /health is exempt so monitoring keeps working; OPTIONS passes
for CORS preflight.

Disabled by default (CATHEDRAL_IP_RATE_LIMIT_ENABLED) so tests/local are
unaffected; production sets it plus optional RPS/BURST/ALLOWLIST tuning.
This is the in-app backstop; an edge proxy (Cloudflare) is the durable
layer for spoofed-XFF floods.
wallscaler merged commit 8595c1b into main on Jun 9, 2026
2 checks passed
wallscaler deleted the fix/publisher-ip-rate-limit branch on June 9, 2026 at 16:08