perf(publisher): 30s response cache + ETag/304 on hot read endpoints #267
Merged
added 2 commits
June 9, 2026 17:16
active-challenges was rebuilt from SQLite on every request (~11 req/s). AsyncDbWriteLock contention from eval writes pushed median latency to 748 ms and average latency to 16.7 s (max 200 s). 15-20% of requests timed out as 499s and immediately retried, creating a thundering herd.

Changes:
- Add TtlResponseCache (response_cache.py): per-app 30s TTL single-flight cache. One coroutine rebuilds; concurrent callers get stale bytes immediately rather than queuing behind the lock.
- active-challenges endpoint: caches the full serialised response for 30s. The cache lives on app.state (not a module global) so each test app gets an isolated instance.
- Both active-challenges and leaderboard/recent: serve Cache-Control: public, max-age=15 plus an ETag (sha256[:16]), and honour If-None-Match with a 304. Both go through the shared etag_response() helper, so there is no copy-paste.
- Add tests/publisher/test_response_cache.py: 21 unit + integration tests covering TTL expiry, single-flight, ETag generation, 304 on match, weak ETag normalisation, and smoke tests for both endpoints.

Staleness rationale: active-challenges embeds absolute fields only (cnf_sha256, challenge_id, tier, score_multiplier); there are no relative seconds-remaining fields. 30s of staleness is safe because challenge state changes at most once per several minutes in production.

Rollback: revert this commit; the endpoints are purely stateless without the cache.
…h list/star parsing

- TtlResponseCache._refresh: when the builder raises and a stale body exists, log a warning and return the stale body instead of propagating. Propagate only on cold start (no body at all), so a transient DB error no longer causes a 500 on an already-populated cache.
- etag_response: replace the single-value lstrip hack with proper RFC 9110 If-None-Match parsing: split on commas, then per member strip whitespace, the W/ prefix (via removeprefix), and quotes; handle the star (*) form.
- Tests: 6 new cases covering expired+fail→stale, cold+fail→raise, fail-then-recover, list-form match, star match, and list-form no-match.
Problem
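`GET /api/cathedral/v1/synthetic-boolean/active-challenges` is hit ~11 req/s. The ~32 KB JSON payload (51 challenges) was rebuilt from SQLite on every request. SQLite reads queue behind `AsyncDbWriteLock` held by eval writes, producing a median latency of 748 ms and an average of 16.7 s (max 200 s). 15-20% of requests timed out as 499s and retried immediately, creating a thundering herd.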
`GET /v1/leaderboard/recent` (~39 KB avg, up to 458 KB) is polled by every validator on every pull loop tick. No caching headers meant every poll transferred the full body even when there were no new rows.
Estimated egress from active-challenges alone: ~30 GB/day at 11 req/s x 32 KB.
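The egress estimate can be reproduced with a few lines of arithmetic. This is only a sanity check of the figure above; it assumes decimal units (1 KB = 1000 bytes, 1 GB = 10^9 bytes) and a constant request rate, neither of which is stated in the PR.

```python
# Back-of-envelope check of the egress estimate.
# Assumptions (not from the PR): decimal units and a steady 11 req/s all day.
REQUESTS_PER_SECOND = 11
PAYLOAD_BYTES = 32 * 1000          # ~32 KB JSON body
SECONDS_PER_DAY = 24 * 60 * 60     # 86,400

bytes_per_day = REQUESTS_PER_SECOND * PAYLOAD_BYTES * SECONDS_PER_DAY
gb_per_day = bytes_per_day / 1e9

print(f"{gb_per_day:.1f} GB/day")  # 30.4 GB/day
```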
Changes
`src/cathedral/publisher/response_cache.py` (new)
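`TtlResponseCache`: pure-TTL single-flight cache. When the 30s TTL expires, one coroutine rebuilds the body while concurrent callers get the stale bytes immediately rather than queuing behind the lock. If the builder raises while a stale body exists, the cache logs a warning and serves the stale body; the error propagates only on cold start, when there is no body at all.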
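For reviewers who want the shape of the single-flight logic without opening the diff, here is a minimal sketch of the behaviour described in the two commits. It is not the code in `response_cache.py`: the `get(builder)` signature, the attribute names, and the choice to have the first caller after expiry wait for the rebuild are all assumptions made for illustration.

```python
import asyncio
import logging
import time
from typing import Awaitable, Callable, Optional

log = logging.getLogger(__name__)


class TtlResponseCache:
    """Illustrative sketch only; method and attribute names are hypothetical."""

    def __init__(self, ttl_seconds: float = 30.0) -> None:
        self._ttl = ttl_seconds
        self._body: Optional[bytes] = None
        self._expires_at = 0.0
        self._task: Optional[asyncio.Task] = None

    async def get(self, builder: Callable[[], Awaitable[bytes]]) -> bytes:
        # Fresh hit: serve cached bytes without touching the builder.
        if self._body is not None and time.monotonic() < self._expires_at:
            return self._body
        if self._task is None:
            # Single flight: only the first caller after expiry starts a rebuild.
            self._task = asyncio.create_task(self._refresh(builder))
        elif self._body is not None:
            # A rebuild is already running: serve stale bytes immediately.
            return self._body
        # Rebuilding caller, or cold start with nothing to serve: wait.
        return await asyncio.shield(self._task)

    async def _refresh(self, builder: Callable[[], Awaitable[bytes]]) -> bytes:
        try:
            body = await builder()
        except Exception:
            if self._body is None:
                raise  # cold start: nothing stale to fall back on
            log.warning("cache rebuild failed; serving stale body", exc_info=True)
            return self._body
        else:
            self._body = body
            self._expires_at = time.monotonic() + self._ttl
            return body
        finally:
            self._task = None  # allow the next expiry to trigger a new rebuild
```

In this sketch a failed rebuild does not extend the expiry, so the next request retries the builder; whether the real cache backs off after a failure is not stated in the PR.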
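`etag_response()`: shared ETag helper. It sets `Cache-Control: public, max-age=15` and an `ETag` (sha256[:16] of the body), and answers a matching `If-None-Match` with a 304. The header is parsed per RFC 9110: comma-separated lists, weak `W/` prefixes, quoted values, and the `*` form are all handled.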
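The `If-None-Match` handling can be sketched as follows. This is a framework-free illustration, not the helper in this PR: the function names are hypothetical, and returning a `(status, headers, body)` tuple instead of a FastAPI response object is an assumption made to keep the example self-contained. `str.removeprefix` requires Python 3.9+.

```python
import hashlib
from typing import Dict, Optional, Tuple


def make_etag(body: bytes) -> str:
    # Strong ETag: first 16 hex chars of the SHA-256 digest, in quotes.
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def if_none_match_matches(header: Optional[str], etag: str) -> bool:
    # If-None-Match uses weak comparison: drop any W/ prefix and the
    # surrounding quotes, then compare the opaque tags.
    if not header:
        return False
    target = etag.strip('"')
    for member in header.split(","):
        member = member.strip()
        if member == "*":
            return True  # "*" matches any current representation
        if member.removeprefix("W/").strip('"') == target:
            return True
    return False


def etag_response_sketch(
    body: bytes, if_none_match: Optional[str]
) -> Tuple[int, Dict[str, str], bytes]:
    etag = make_etag(body)
    headers = {"ETag": etag, "Cache-Control": "public, max-age=15"}
    if if_none_match_matches(if_none_match, etag):
        return 304, headers, b""  # validator matched: send no body
    return 200, headers, body
```

A client that echoes the last `ETag` back in `If-None-Match` receives an empty 304 for as long as the body bytes are unchanged, which is what removes the repeated full-body transfers.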
`src/cathedral/publisher/submit.py`
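`list_active_sat_challenges` now caches the full serialised response for 30s in the `TtlResponseCache` on `app.state` and returns it via `etag_response()`.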
`src/cathedral/publisher/reads.py`
`get_leaderboard_recent` now returns via `etag_response()`. No in-process TTL cache here — cursor params vary per validator, so per-response ETags are correct. Validators polling an up-to-date cursor get a 304 with no body to decode.
`src/cathedral/publisher/app.py`
Seeds `app.state.active_challenges_cache = TtlResponseCache(ttl_seconds=30.0)` in the lifespan so each test's fresh FastAPI app gets its own clean cache instance.
`tests/publisher/test_response_cache.py` (new)
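21 unit + integration tests covering TTL expiry, single-flight, ETag generation, 304 on match, weak ETag normalisation, and smoke tests for both endpoints.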
Staleness rationale
`active-challenges` embeds only absolute fields (challenge_id, tier, cnf_sha256, score_multiplier, difficulty_label, etc.). There are no relative "seconds remaining" counters. A 30s stale window is safe — challenge open/close events happen at most once per several minutes in production, and every miner polling loop already tolerates eventual consistency.
Rollback
Revert this commit. Both endpoints are purely stateless without the cache — no DB schema, no env vars, no Railway config changes required.
🤖 Generated with Claude Code