Fix edge case with emtpy bucket results#672
Merged
Merged
Conversation
Sleepful
approved these changes
Jun 12, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
There's an edge case with V3 multi-op in the V3 multi-op document read path, when all operations are filtered out in memory:
getBucketDataBatch()stops reading and emitted neither data nor ahas_moresignal, so buckets behindthe batch boundary were silently treated as fully synced. Clients would be missing operations and fail checksum validation.
The fix makes storage report such buckets as complete via empty chunks, and makes the sync stream advance bucket positions for empty chunks. The existing caller loop (
BucketChecksumState/sync.ts) then re-requests the remaining buckets and reaches the data.The bug (written by Claude)
With chunked multi-op documents, the read query can only filter on document-level metadata (
min_opand_id.o); it cannot see individual operations. A document can therefore match the query while none of its operations are in the requested window.Concrete example. Op ids come from one global sequence, so a bucket's ops are sparse. Suppose bucket A has ops 10, 20 and 100 stored in a single document:
A client synced through op 20 requests
(start = 20, end = 50]:_id.o > 20✓ (100),min_op <= 50✓ (10) → document fetched.<= start, 100 is> end→ zero rows.That result is correct for bucket A — it has no data in the window. The problem was the handling: if every document in the first server batch (~101 documents) was such a "straddler" (one per bucket; plausible when a request covers 100+ buckets mid-reconnect with writes racing past the checkpoint), the code hit
if (data.length == 0) continue;and abandoned the cursor. Documents behind the batch boundary — containing real data for other buckets — were never read, and nohas_morewas emitted for them.The fix
Two small changes that reuse the existing caller retry loop instead of adding a second read loop inside storage:
1.
MongoSyncBucketStorageV3.getBucketDataBatchImplWhen a matched document contributes zero rows, its bucket is provably complete through the checkpoint: any op
<= endin that document would also be> start(the query guarantees_id.o > start), and since per-bucket document ranges are disjoint, no later document for that bucket can match either.Storage now yields an empty chunk for each such bucket (
data: [], next_after = checkpoint, has_more: false), so the caller can retire it. If the batch produced no data at all, the last empty chunk carrieshas_more: true(mirroring the existing last-chunk convention), so the caller re-requests the remaining buckets instead of treating the stream as complete. Progress is guaranteed: each round retires every straddler bucket in the batch, so the next request no longer matches their documents.2.
sync.ts—bucketDataBatchupdateBucketPosition()is now called for empty chunks before they are skipped. Previously empty chunks were dropped before the position update, which would have re-requested the same buckets with the same positions indefinitely. Empty chunks are still never sent to clients — the protocol output is unchanged.This also fixes a pre-existing inefficiency in the mixed case (some buckets straddle, others have data): straddler buckets previously never advanced their positions, so every subsequent round re-fetched their straddling documents.
Semantics note
This slightly extends the implicit
SyncBucketDataChunkcontract: an empty chunk is a progress marker ("this bucket has no operations in the requested range; advance tonext_after"). Other storage implementations never emit empty chunks and are unaffected.AI Usage
Used Claude Fable 5 to find the bug. Manually suggested the fix, and had Claude implement it.