hotpatch: Fix/db startup index lock hang #1502
A single abandoned "idle in transaction" session held locks on the documents table, which blocked the non-concurrent CREATE INDEX (hnsw) run inside the FastAPI lifespan. Each API restart queued another CREATE INDEX behind the same table lock, leaving the server stuck at "Waiting for application startup." indefinitely and freezing ingestion writes.

Changes:
- setup_indexes(): build every index with CREATE INDEX CONCURRENTLY (non-blocking ShareUpdateExclusiveLock) under a per-session lock_timeout, and make each statement non-fatal so a contended or slow build is retried on the next boot instead of wedging startup. Drop leftover invalid indexes before rebuilding.
- create_db_and_tables(): apply lock_timeout to the extension and create_all DDL, and gate the whole bootstrap behind DB_BOOTSTRAP_ON_STARTUP.
- engine: set idle_in_transaction_session_timeout (asyncpg) so an abandoned transaction is reaped automatically.
- config + .env.example: DB_BOOTSTRAP_ON_STARTUP, DB_DDL_LOCK_TIMEOUT_MS, DB_IDLE_IN_TX_TIMEOUT_MS.

Co-authored-by: Cursor <cursoragent@cursor.com>
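For reviewers, the non-fatal index build described in the setup_indexes() bullet could look roughly like the sketch below. It is not the code in this PR: the `Execute` callable, `build_indexes`, and the index and column names are hypothetical, and the executor is assumed to run on a connection in AUTOCOMMIT mode, since CREATE INDEX CONCURRENTLY cannot run inside a transaction block.

```python
import asyncio
import logging
from typing import Awaitable, Callable, Iterable, List

logger = logging.getLogger(__name__)

# Hypothetical executor type: a coroutine that runs one SQL statement on a
# connection in AUTOCOMMIT mode.
Execute = Callable[[str], Awaitable[object]]


async def build_indexes(
    execute: Execute, statements: Iterable[str], lock_timeout_ms: int
) -> List[str]:
    """Run each DDL statement in order; return the statements that failed."""
    # Session-level setting, so it applies to every statement that follows.
    await execute(f"SET lock_timeout = '{int(lock_timeout_ms)}ms'")
    failed: List[str] = []
    for stmt in statements:
        try:
            await execute(stmt)
        except Exception as exc:
            # Non-fatal: log and continue so startup is never wedged.
            logger.warning("index DDL skipped, retry next boot: %s (%s)", stmt, exc)
            failed.append(stmt)
    return failed


# Illustrative statements only; index and column names are made up.
# The DROP comes first because IF NOT EXISTS would otherwise skip the
# rebuild when an invalid index with the same name is still present.
STATEMENTS = [
    "DROP INDEX CONCURRENTLY IF EXISTS document_vector_index",
    "CREATE INDEX CONCURRENTLY IF NOT EXISTS document_vector_index "
    "ON documents USING hnsw (embedding vector_l2_ops)",
]
```

In a real bootstrap the DROP would only be issued for indexes that pg_index reports with indisvalid = false; it is unconditional here to keep the sketch short.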
The long-running ingestion/podcast/video tasks run on a separate Celery engine (NullPool), so the web engine's idle_in_transaction_session_timeout did not cover them. That is exactly where the original 11h zombie (INSERT INTO chunks) came from.

Apply the same protection to the Celery engine with a generous 60-minute default, so a worker that hangs or crashes mid-transaction can't hold locks on documents/chunks indefinitely, while never reaping a legitimate per-document embed window.

- config + .env.example: DB_CELERY_IDLE_IN_TX_TIMEOUT_MS (default 3600000).

Co-authored-by: Cursor <cursoragent@cursor.com>
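As a rough illustration of how the timeout might be applied per engine, the sketch below builds asyncpg connection arguments from a millisecond value. The helper name `idle_in_tx_connect_args` and the 300000 ms web value are assumptions; only the 3600000 ms Celery default comes from the commit message.

```python
from typing import Any, Dict


def idle_in_tx_connect_args(timeout_ms: int) -> Dict[str, Any]:
    """Build asyncpg connect_args that set idle_in_transaction_session_timeout.

    PostgreSQL reads a unitless value as milliseconds, and 0 disables the
    timeout, so a non-positive value is treated here as "leave the server
    default untouched".
    """
    if timeout_ms <= 0:
        return {}
    # asyncpg expects server_settings values to be strings.
    return {
        "server_settings": {"idle_in_transaction_session_timeout": str(timeout_ms)}
    }


# The web value is a placeholder; the Celery value is the documented default.
WEB_CONNECT_ARGS = idle_in_tx_connect_args(300_000)
CELERY_CONNECT_ARGS = idle_in_tx_connect_args(3_600_000)
```

With SQLAlchemy, each dict would be passed as `connect_args` to `create_async_engine`, with `poolclass=NullPool` on the Celery side.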
📝 Walkthrough
Adds four optional config knobs (
Changes: DB Safety Knobs and Startup Bootstrap
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
High-level PR Summary
This PR fixes a critical database startup hang in which an abandoned "idle in transaction" session could hold locks indefinitely and block FastAPI application startup. The fix sets idle_in_transaction_session_timeout on both the web and Celery worker database connections, applies lock_timeout to the startup DDL, refactors index creation to use CREATE INDEX CONCURRENTLY (non-blocking ShareUpdateExclusiveLock), drops leftover invalid indexes before rebuilding, and adds configuration knobs (DB_BOOTSTRAP_ON_STARTUP, DB_DDL_LOCK_TIMEOUT_MS) to control bootstrap behavior so that startup fails fast instead of hanging indefinitely.

⏱️ Estimated Review Time: 30-90 minutes
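A minimal sketch of how the four knobs might be read from the environment follows. The variable names come from the commit messages; the `DbSafetyConfig` class, the `load_db_safety_config` helper, and every default except the 3600000 ms Celery value are assumptions made for illustration.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class DbSafetyConfig:
    bootstrap_on_startup: bool
    ddl_lock_timeout_ms: int
    idle_in_tx_timeout_ms: int
    celery_idle_in_tx_timeout_ms: int


def _as_bool(raw: str) -> bool:
    """Accept the usual truthy spellings found in .env files."""
    return raw.strip().lower() in {"1", "true", "yes", "on"}


def load_db_safety_config(env: Mapping[str, str] = os.environ) -> DbSafetyConfig:
    return DbSafetyConfig(
        # Assumed default: bootstrap stays enabled unless switched off.
        bootstrap_on_startup=_as_bool(env.get("DB_BOOTSTRAP_ON_STARTUP", "true")),
        # Assumed default: 5 seconds.
        ddl_lock_timeout_ms=int(env.get("DB_DDL_LOCK_TIMEOUT_MS", "5000")),
        # Assumed default: 5 minutes.
        idle_in_tx_timeout_ms=int(env.get("DB_IDLE_IN_TX_TIMEOUT_MS", "300000")),
        # Default stated in the commit message: 60 minutes.
        celery_idle_in_tx_timeout_ms=int(
            env.get("DB_CELERY_IDLE_IN_TX_TIMEOUT_MS", "3600000")
        ),
    )
```

Passing a plain dict instead of `os.environ` keeps the loader easy to exercise in tests, for example `load_db_safety_config({"DB_BOOTSTRAP_ON_STARTUP": "false"})`.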
💡 Review Order Suggestion
- surfsense_backend/.env.example
- surfsense_backend/app/config/__init__.py
- surfsense_backend/app/db.py
- surfsense_backend/app/tasks/celery_tasks/__init__.py

Summary by CodeRabbit
New Features
Bug Fixes & Improvements