Fix edge case with empty bucket results #672

Merged
rkistner merged 1 commit into compressed-bucket-storage from fix-empty-bucket-results
Jun 12, 2026

Conversation

@rkistner

Contributor

Summary

There's an edge case in the V3 multi-op document read path: when all operations in a batch were filtered out in memory, getBucketDataBatch() stopped reading and emitted neither data nor a has_more signal, so buckets behind the batch boundary were silently treated as fully synced. Clients would then be missing operations and fail checksum validation.

The fix makes storage report such buckets as complete via empty chunks, and makes the sync stream advance bucket positions for empty chunks. The existing caller loop (BucketChecksumState / sync.ts) then re-requests the remaining buckets and reaches the data.

The bug (written by Claude)

With chunked multi-op documents, the read query can only filter on document-level metadata (min_op and _id.o); it cannot see individual operations. A document can therefore match the query while none of its operations are in the requested window.

Concrete example. Op ids come from one global sequence, so a bucket's ops are sparse. Suppose bucket A has ops 10, 20 and 100 stored in a single document:

document: min_op = 10, _id.o = 100, ops = [10, 20, 100]

A client synced through op 20 requests (start = 20, end = 50]:

  • Query: _id.o > 20 ✓ (100), min_op <= 50 ✓ (10) → document fetched.
  • Post-filter: 10 and 20 are <= start, 100 is > end → zero rows.

That result is correct for bucket A — it has no data in the window. The problem was the handling: if every document in the first server batch (~101 documents) was such a "straddler" (one per bucket; plausible when a request covers 100+ buckets mid-reconnect with writes racing past the checkpoint), the code hit if (data.length == 0) continue; and abandoned the cursor. Documents behind the batch boundary — containing real data for other buckets — were never read, and no has_more was emitted for them.
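The bucket A example can be reproduced with a small sketch. The `BucketDoc` shape and the helper names `docMatches` and `opsInWindow` are hypothetical and only mirror the fields named above (min_op, _id.o); they are not the actual storage schema or query code.

```typescript
// Hypothetical document shape for illustration only.
interface BucketDoc {
  bucket: string;
  minOp: number; // corresponds to min_op: smallest op id in the document
  maxOp: number; // corresponds to _id.o: largest op id in the document
  ops: number[];
}

// Document-level predicate: all the read query can see.
function docMatches(doc: BucketDoc, start: number, end: number): boolean {
  return doc.maxOp > start && doc.minOp <= end;
}

// In-memory post-filter: keep only ops inside the window (start, end].
function opsInWindow(doc: BucketDoc, start: number, end: number): number[] {
  return doc.ops.filter((op) => op > start && op <= end);
}

const docA: BucketDoc = { bucket: 'A', minOp: 10, maxOp: 100, ops: [10, 20, 100] };

// The document matches the query, yet contributes no rows for (20, 50].
const matched = docMatches(docA, 20, 50);
const rows = opsInWindow(docA, 20, 50);
```

Here `matched` is true while `rows` is empty, which is the "straddler" situation: the document is fetched, but every operation falls outside the requested window.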

The fix

Two small changes that reuse the existing caller retry loop instead of adding a second read loop inside storage:

1. MongoSyncBucketStorageV3.getBucketDataBatchImpl

When a matched document contributes zero rows, its bucket is provably complete through the checkpoint: any op <= end in that document would also be > start (the query guarantees _id.o > start), and since per-bucket document ranges are disjoint, no later document for that bucket can match either.

Storage now yields an empty chunk for each such bucket (data: [], next_after = checkpoint, has_more: false), so the caller can retire it. If the batch produced no data at all, the last empty chunk carries has_more: true (mirroring the existing last-chunk convention), so the caller re-requests the remaining buckets instead of treating the stream as complete. Progress is guaranteed: each round retires every straddler bucket in the batch, so the next request no longer matches their documents.
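A minimal sketch of this chunking rule, under simplifying assumptions: the `Chunk` and `MatchedDoc` types and the function `chunksForBatch` are hypothetical, one document is assumed per bucket, and has_more handling for batches that do contain data is left out because the description above only covers the all-empty case.

```typescript
// Hypothetical, simplified chunk shape; field names follow the description above.
interface Chunk {
  bucket: string;
  data: number[];
  next_after: number;
  has_more: boolean;
}

interface MatchedDoc {
  bucket: string;
  ops: number[];
}

function chunksForBatch(
  docs: MatchedDoc[],
  starts: Map<string, number>,
  checkpoint: number
): Chunk[] {
  const chunks: Chunk[] = [];
  let sawData = false;
  for (const doc of docs) {
    const start = starts.get(doc.bucket) ?? 0;
    const data = doc.ops.filter((op) => op > start && op <= checkpoint);
    if (data.length === 0) {
      // Straddler: report the bucket as complete through the checkpoint.
      chunks.push({ bucket: doc.bucket, data: [], next_after: checkpoint, has_more: false });
    } else {
      sawData = true;
      chunks.push({ bucket: doc.bucket, data, next_after: data[data.length - 1], has_more: false });
    }
  }
  // If nothing in the batch produced data, flag the last empty chunk so the
  // caller re-requests the remaining buckets instead of stopping.
  if (!sawData && chunks.length > 0) {
    chunks[chunks.length - 1].has_more = true;
  }
  return chunks;
}

const allEmpty = chunksForBatch(
  [
    { bucket: 'A', ops: [10, 20, 100] },
    { bucket: 'B', ops: [5, 200] }
  ],
  new Map([['A', 20], ['B', 5]]),
  50
);
```

In this example both buckets straddle the window, so `allEmpty` holds two empty chunks with next_after set to the checkpoint, and only the last one carries has_more: true.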

2. sync.ts bucketDataBatch

updateBucketPosition() is now called for empty chunks before they are skipped. Previously empty chunks were dropped before the position update, which would have re-requested the same buckets with the same positions indefinitely. Empty chunks are still never sent to clients — the protocol output is unchanged.

This also fixes a pre-existing inefficiency in the mixed case (some buckets straddle, others have data): straddler buckets previously never advanced their positions, so every subsequent round re-fetched their straddling documents.
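The effect of moving the position update ahead of the empty-chunk skip can be illustrated with a toy caller loop. Everything here is an assumption made for the sketch: `fetchBatch`, `drain`, the batch limit of 2 and the rule that a bucket whose position has reached the checkpoint is no longer requested. It is not the code in sync.ts or BucketChecksumState.

```typescript
interface Chunk {
  bucket: string;
  data: number[];
  next_after: number;
  has_more: boolean;
}

const CHECKPOINT = 50;
const BATCH_LIMIT = 2; // hypothetical number of documents per storage batch
const stored: Record<string, number[]> = { A: [10, 20, 100], B: [5, 200], C: [30, 40] };

// Toy storage: reads at most BATCH_LIMIT buckets that are still behind the checkpoint.
function fetchBatch(positions: Map<string, number>): Chunk[] {
  const pending = Array.from(positions.entries())
    .filter(([, pos]) => pos < CHECKPOINT)
    .slice(0, BATCH_LIMIT);
  const chunks = pending.map(([bucket, start]): Chunk => {
    const data = stored[bucket].filter((op) => op > start && op <= CHECKPOINT);
    const next_after = data.length > 0 ? data[data.length - 1] : CHECKPOINT;
    return { bucket, data, next_after, has_more: false };
  });
  if (chunks.length > 0 && chunks.every((c) => c.data.length === 0)) {
    chunks[chunks.length - 1].has_more = true;
  }
  return chunks;
}

// Toy caller loop. advanceOnEmpty = true models the fixed behaviour,
// false models dropping empty chunks before the position update.
function drain(positions: Map<string, number>, advanceOnEmpty: boolean, maxRounds = 5) {
  const sent: Chunk[] = [];
  let rounds = 0;
  let more = true;
  while (more && rounds < maxRounds) {
    rounds++;
    more = false;
    for (const chunk of fetchBatch(positions)) {
      if (chunk.has_more) more = true;
      if (chunk.data.length === 0) {
        if (advanceOnEmpty) positions.set(chunk.bucket, chunk.next_after);
        continue; // empty chunks are never sent to clients
      }
      positions.set(chunk.bucket, chunk.next_after);
      sent.push(chunk);
    }
  }
  return { sent, rounds };
}

const fixed = drain(new Map([['A', 20], ['B', 5], ['C', 0]]), true);
const stuck = drain(new Map([['A', 20], ['B', 5], ['C', 0]]), false);
```

With the position update applied to empty chunks, the loop retires A and B in the first round and reaches bucket C's data in the second. Without it, every round re-requests A and B with unchanged positions and the loop only stops at the `maxRounds` guard, never reaching C.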

Semantics note

This slightly extends the implicit SyncBucketDataChunk contract: an empty chunk is a progress marker ("this bucket has no operations in the requested range; advance to next_after"). Other storage implementations never emit empty chunks and are unaffected.

AI Usage

Used Claude Fable 5 to find the bug. Manually suggested the fix, and had Claude implement it.

@Sleepful Sleepful left a comment

Collaborator

Fix LGTM

@rkistner rkistner merged commit 3493dc4 into compressed-bucket-storage Jun 12, 2026
43 checks passed
@rkistner rkistner deleted the fix-empty-bucket-results branch June 12, 2026 07:36