fix(indexing): batch chunk inserts and truncate notification titles #1508

Merged · MODSetter merged 13 commits into MODSetter:dev from CREDO23:fix/indexing-batch-chunk-insert · Jun 17, 2026
Conversation

CREDO23 (Collaborator) commented Jun 17, 2026

Summary

  • Batch chunk row inserts during scratch reindex (persist_scratch_index) instead of one monolithic commit, with [PERF] chunk batch / chunk persist logging and INDEXING_CHUNK_INSERT_BATCH_SIZE (default 200).
  • Truncate document-processing notification titles to fit varchar(200) and continue indexing if notification creation fails (long filenames no longer leave docs stuck pending).
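
The batching idea in the first bullet can be sketched as follows. This is a minimal illustration, not the code in persist_scratch_index, which is not shown on this page. The names `iter_batches`, `persist_in_batches`, and `insert_batch` are hypothetical. Only the default of 200 comes from the description of INDEXING_CHUNK_INSERT_BATCH_SIZE above.

```python
from typing import Callable, Iterator, Sequence, TypeVar

T = TypeVar("T")

# Default taken from the PR description (INDEXING_CHUNK_INSERT_BATCH_SIZE).
DEFAULT_BATCH_SIZE = 200


def iter_batches(rows: Sequence[T], batch_size: int) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of at most batch_size rows."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(rows), batch_size):
        yield rows[start:start + batch_size]


def persist_in_batches(
    rows: Sequence[T],
    insert_batch: Callable[[Sequence[T]], None],
    batch_size: int = DEFAULT_BATCH_SIZE,
) -> int:
    """Hypothetical helper: pass rows to insert_batch one slice at a time.

    insert_batch stands in for whatever performs the actual insert
    (for example, a bulk insert followed by a flush). Returns the number
    of batches written, which is convenient for per-batch logging.
    """
    batches = 0
    for batch in iter_batches(rows, batch_size):
        insert_batch(batch)
        batches += 1
    return batches
```

With 450 rows and a batch size of 200, `insert_batch` is called three times, with 200, 200, and 50 rows.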

Test plan

  • Unit tests: pytest tests/unit/indexing_pipeline/test_persist_scratch_index.py tests/unit/notifications/
  • Integration: pytest tests/integration/notifications/test_document_processing_handler.py tests/integration/indexing_pipeline/test_index_document.py
  • After deploy to prod/staging: re-upload a ~500 KB .txt log file and confirm that the Axiom [PERF] logs show chunk persist taking seconds to tens of seconds, and that index TOTAL − (chunk + embed) is no longer measured in minutes

High-level PR Summary

This PR improves the document indexing pipeline by batching chunk inserts during scratch reindexing (controlled by INDEXING_CHUNK_INSERT_BATCH_SIZE config, default 200) and adds performance logging for chunk persistence. It also fixes a bug where documents with very long filenames would fail to create notifications due to exceeding the varchar(200) title limit, by introducing a format_title helper that truncates document names while preserving them in metadata. Additionally, notification creation failures are now caught and logged as warnings instead of blocking the entire indexing process.
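
The title truncation and the non-blocking notification behaviour described above might look roughly like this. It is a sketch under stated assumptions: the PR names a format_title helper, but its signature is not shown here, so `truncate_title` and `notify_safely` are hypothetical stand-ins. The 200-character limit is the varchar(200) constraint mentioned in the PR. The "..." suffix is an assumption.

```python
import logging
from typing import Callable, Optional

logger = logging.getLogger(__name__)

MAX_TITLE_LENGTH = 200  # varchar(200) limit named in the PR
ELLIPSIS = "..."        # assumed truncation marker


def truncate_title(prefix: str, document_name: str,
                   max_length: int = MAX_TITLE_LENGTH) -> str:
    """Hypothetical helper: build a title no longer than max_length.

    Only the document name is shortened; the full name is expected to be
    kept elsewhere (the PR says it is preserved in metadata).
    """
    title = f"{prefix}{document_name}"
    if len(title) <= max_length:
        return title
    room = max_length - len(prefix) - len(ELLIPSIS)
    if room <= 0:
        # Prefix alone is too long; fall back to a hard cut.
        return title[:max_length]
    return f"{prefix}{document_name[:room]}{ELLIPSIS}"


def notify_safely(create_notification: Callable[[str], object],
                  title: str) -> Optional[object]:
    """Hypothetical wrapper: log a warning instead of raising on failure."""
    try:
        return create_notification(title)
    except Exception:
        logger.warning("notification creation failed; continuing indexing",
                       exc_info=True)
        return None
```

The caller can then proceed with indexing whether or not `notify_safely` returns a notification, which is the behaviour the PR describes for long filenames.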

⏱️ Estimated Review Time: 15-30 minutes

💡 Review Order Suggestion
Order File Path
1 surfsense_backend/app/config/__init__.py
2 surfsense_backend/app/notifications/constants.py
3 surfsense_backend/app/notifications/service/messages/text.py
4 surfsense_backend/tests/unit/notifications/service/messages/test_text.py
5 surfsense_backend/app/notifications/service/messages/document_processing.py
6 surfsense_backend/tests/unit/notifications/service/messages/test_document_processing.py
7 surfsense_backend/app/notifications/service/handlers/document_processing.py
8 surfsense_backend/tests/integration/notifications/test_document_processing_handler.py
9 surfsense_backend/app/indexing_pipeline/document_persistence.py
10 surfsense_backend/tests/unit/indexing_pipeline/test_persist_scratch_index.py
11 surfsense_backend/app/indexing_pipeline/indexing_pipeline_service.py
12 surfsense_backend/app/tasks/celery_tasks/document_tasks.py
13 surfsense_backend/tests/unit/indexing_pipeline/test_index_batch_parallel.py


vercel Bot commented Jun 17, 2026

@CREDO23 is attempting to deploy a commit to the Rohan Verma's projects Team on Vercel.

A member of the Team first needs to authorize it.

coderabbitai Bot commented Jun 17, 2026

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: d0ed65c6-e7f6-41f2-a90e-ee5b2c5cb390


@CREDO23 CREDO23 requested a review from MODSetter June 17, 2026 13:40
@MODSetter MODSetter merged commit 6a45f24 into MODSetter:dev Jun 17, 2026
5 of 12 checks passed