fix(indexing): batch chunk inserts and truncate notification titles #1508
@CREDO23 is attempting to deploy a commit to the Rohan Verma's projects Team on Vercel. A member of the Team first needs to authorize it.
Summary
- Batch chunk inserts during scratch reindexing (`persist_scratch_index`) instead of one monolithic commit, with `[PERF] chunk batch` / `chunk persist` logging and `INDEXING_CHUNK_INSERT_BATCH_SIZE` (default 200).
- Truncate notification titles to fit `varchar(200)` and continue indexing if notification creation fails (long filenames no longer leave docs stuck in `pending`).

Test plan
- `pytest tests/unit/indexing_pipeline/test_persist_scratch_index.py tests/unit/notifications/`
- `pytest tests/integration/notifications/test_document_processing_handler.py tests/integration/indexing_pipeline/test_index_document.py`
- `[PERF]` shows `chunk persist` in seconds to tens of seconds, and `index TOTAL − chunk+embed` is no longer minutes.

High-level PR Summary
This PR improves the document indexing pipeline by batching chunk inserts during scratch reindexing (controlled by the `INDEXING_CHUNK_INSERT_BATCH_SIZE` config, default 200) and adds performance logging for chunk persistence. It also fixes a bug where documents with very long filenames failed to create notifications because their titles exceeded the `varchar(200)` limit; a new `format_title` helper truncates document names in titles while preserving the full names in metadata. Additionally, notification creation failures are now caught and logged as warnings instead of blocking the entire indexing process.

⏱️ Estimated Review Time: 15-30 minutes
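The title truncation and the non-blocking notification path can be sketched as below. This is a minimal illustration, not the PR's code: the `format_title(template, document_name, max_length)` signature, the `{name}` placeholder, the `…` suffix, the `document_name` metadata key, and the `notify_safely` wrapper are all hypothetical. Only the 200-character title limit, the idea of keeping the full name in metadata, and logging failures as warnings come from the summary.

```python
import logging
from typing import Any, Callable

logger = logging.getLogger("notifications")

# From the PR summary: notification titles live in a varchar(200) column.
TITLE_MAX_LENGTH = 200
# Hypothetical truncation marker (a single character).
ELLIPSIS = "…"


def format_title(
    template: str, document_name: str, max_length: int = TITLE_MAX_LENGTH
) -> tuple[str, dict[str, Any]]:
    """Hypothetical helper: render `template` with `document_name`, truncating
    the name so the title fits in `max_length` characters.

    Returns (title, metadata); the metadata keeps the full, untruncated name.
    """
    metadata: dict[str, Any] = {"document_name": document_name}
    title = template.format(name=document_name)
    if len(title) <= max_length:
        return title, metadata
    # Characters the template contributes on its own, without the name.
    overhead = len(template.format(name=""))
    budget = max(max_length - overhead - len(ELLIPSIS), 0)
    title = template.format(name=document_name[:budget] + ELLIPSIS)
    # Final hard cap in case the template alone is longer than max_length.
    return title[:max_length], metadata


def notify_safely(create_notification: Callable[..., Any], **kwargs: Any) -> bool:
    """Hypothetical wrapper: try to create a notification, but never let a
    failure propagate into the indexing task. Returns True on success."""
    try:
        create_notification(**kwargs)
    except Exception:
        # Log as a warning (with traceback) and let indexing continue.
        logger.warning("Notification creation failed; continuing indexing", exc_info=True)
        return False
    return True


if __name__ == "__main__":
    title, meta = format_title("Processing {name}", "x" * 300 + ".pdf")
    print(len(title), title.endswith(ELLIPSIS), len(meta["document_name"]))  # prints: 200 True 304
```

If the title column is a PostgreSQL `varchar(200)`, its limit counts characters rather than bytes, so `len()` on a Python `str` is the matching measure. If the notification row is written through the same SQLAlchemy session as the document, a failed flush leaves that session needing a rollback, so a real handler would likely isolate the insert (for example with `Session.begin_nested()`) rather than rely on a bare `try`/`except`.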
💡 Review Order Suggestion
1. `surfsense_backend/app/config/__init__.py`
2. `surfsense_backend/app/notifications/constants.py`
3. `surfsense_backend/app/notifications/service/messages/text.py`
4. `surfsense_backend/tests/unit/notifications/service/messages/test_text.py`
5. `surfsense_backend/app/notifications/service/messages/document_processing.py`
6. `surfsense_backend/tests/unit/notifications/service/messages/test_document_processing.py`
7. `surfsense_backend/app/notifications/service/handlers/document_processing.py`
8. `surfsense_backend/tests/integration/notifications/test_document_processing_handler.py`
9. `surfsense_backend/app/indexing_pipeline/document_persistence.py`
10. `surfsense_backend/tests/unit/indexing_pipeline/test_persist_scratch_index.py`
11. `surfsense_backend/app/indexing_pipeline/indexing_pipeline_service.py`
12. `surfsense_backend/app/tasks/celery_tasks/document_tasks.py`
13. `surfsense_backend/tests/unit/indexing_pipeline/test_index_batch_parallel.py`
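The batched chunk persistence this PR describes can be sketched in plain Python. This is an illustration under assumptions, not the code in `persist_scratch_index`: the `write_batch` callback is a hypothetical stand-in for adding a batch to the session and committing it, the log message formats are invented around the `[PERF] chunk batch` / `chunk persist` labels the PR mentions, and falling back to 200 for unset, non-integer, or non-positive values of `INDEXING_CHUNK_INSERT_BATCH_SIZE` is an assumption beyond the documented default.

```python
import logging
import os
import time
from typing import Callable, Iterator, Sequence, TypeVar

logger = logging.getLogger("indexing")

T = TypeVar("T")

# Default documented in the PR summary.
DEFAULT_CHUNK_INSERT_BATCH_SIZE = 200


def chunk_insert_batch_size() -> int:
    """Read INDEXING_CHUNK_INSERT_BATCH_SIZE from the environment.

    Assumption: unset, non-integer, or non-positive values fall back to 200.
    """
    raw = os.getenv("INDEXING_CHUNK_INSERT_BATCH_SIZE")
    try:
        size = int(raw) if raw is not None else DEFAULT_CHUNK_INSERT_BATCH_SIZE
    except ValueError:
        return DEFAULT_CHUNK_INSERT_BATCH_SIZE
    return size if size > 0 else DEFAULT_CHUNK_INSERT_BATCH_SIZE


def batched(items: Sequence[T], size: int) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start : start + size]


def persist_in_batches(
    chunks: Sequence[T],
    write_batch: Callable[[Sequence[T]], None],
    batch_size: int,
) -> int:
    """Hypothetical sketch: persist `chunks` one batch at a time.

    `write_batch` stands in for "add these rows to the session and commit";
    it is not the project's API. Returns the number of batches written.
    """
    started = time.perf_counter()
    count = 0
    for batch in batched(chunks, batch_size):
        batch_started = time.perf_counter()
        write_batch(batch)
        count += 1
        logger.info(
            "[PERF] chunk batch %d: %d rows in %.3fs",
            count, len(batch), time.perf_counter() - batch_started,
        )
    logger.info(
        "[PERF] chunk persist: %d chunks in %d batches, %.3fs",
        len(chunks), count, time.perf_counter() - started,
    )
    return count


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    written: list = []
    n = persist_in_batches(list(range(450)), written.append, batch_size=200)
    print(n, [len(b) for b in written])  # prints: 3 [200, 200, 50]
```

With 450 chunks and the default size, the sketch performs three writes of 200, 200, and 50 rows. If each batch is committed separately, every transaction stays bounded in size instead of one commit covering the whole document; the trade-off is that a failure midway can leave earlier batches persisted unless the caller cleans them up.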