Fix edge case with empty bucket results by rkistner · Pull Request #672 · powersync-ja/powersync-service

rkistner · 2026-06-11T09:34:47Z

Summary

There's an edge case in the V3 multi-op document read path when all operations are filtered out in memory: getBucketDataBatch() stopped reading and emitted neither data nor a has_more signal, so buckets behind the batch boundary were silently treated as fully synced. Clients would then miss operations and fail checksum validation.

The fix makes storage report such buckets as complete via empty chunks, and makes the sync stream advance bucket positions for empty chunks. The existing caller loop (BucketChecksumState / sync.ts) then re-requests the remaining buckets and reaches the data.

The bug (written by Claude)

With chunked multi-op documents, the read query can only filter on document-level metadata (min_op and _id.o); it cannot see individual operations. A document can therefore match the query while none of its operations are in the requested window.

Concrete example. Op ids come from one global sequence, so a bucket's ops are sparse. Suppose bucket A has ops 10, 20 and 100 stored in a single document:

document: min_op = 10, _id.o = 100, ops = [10, 20, 100]

A client synced through op 20 requests (start = 20, end = 50]:

Query: _id.o > 20 ✓ (100), min_op <= 50 ✓ (10) → document fetched.
Post-filter: 10 and 20 are <= start, 100 is > end → zero rows.
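
The example above can be reproduced in a few lines. This is a minimal sketch rather than the service's code: `BucketDoc`, `matchesQuery` and `opsInWindow` are hypothetical names, and `maxOp` stands in for `_id.o`.

```typescript
// Hypothetical, simplified shape of a multi-op bucket document.
interface BucketDoc {
  bucket: string;
  minOp: number; // stands in for min_op
  maxOp: number; // stands in for _id.o
  ops: number[];
}

// Document-level filter: this is all the read query can see.
function matchesQuery(doc: BucketDoc, start: number, end: number): boolean {
  return doc.maxOp > start && doc.minOp <= end;
}

// In-memory post-filter: keep only ops inside the window (start, end].
function opsInWindow(doc: BucketDoc, start: number, end: number): number[] {
  return doc.ops.filter((op) => op > start && op <= end);
}

const docA: BucketDoc = { bucket: "A", minOp: 10, maxOp: 100, ops: [10, 20, 100] };

const fetched = matchesQuery(docA, 20, 50); // document matches the query
const rows = opsInWindow(docA, 20, 50); // but no op falls inside (20, 50]
console.log(fetched, rows);
```

The document is fetched even though the post-filter leaves nothing to return, which is exactly the "straddler" situation.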

That result is correct for bucket A — it has no data in the window. The problem was the handling: if every document in the first server batch (~101 documents) was such a "straddler" (one per bucket; plausible when a request covers 100+ buckets mid-reconnect with writes racing past the checkpoint), the code hit if (data.length == 0) continue; and abandoned the cursor. Documents behind the batch boundary — containing real data for other buckets — were never read, and no has_more was emitted for them.
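
A toy model of this failure mode, under loud assumptions: `readFirstBatchOld` is a hypothetical function, the batch size is shrunk to 2 for readability, and the real cursor handling is reduced to an array slice. It only illustrates why nothing was emitted, not how the storage code is actually structured.

```typescript
interface Doc {
  bucket: string;
  ops: number[];
}

interface Chunk {
  bucket: string;
  data: number[];
  has_more: boolean;
}

// Simplified pre-fix behaviour: only the first server batch is read,
// and documents with no rows in (start, end] produce no output at all.
function readFirstBatchOld(allDocs: Doc[], batchSize: number, start: number, end: number): Chunk[] {
  const chunks: Chunk[] = [];
  for (const doc of allDocs.slice(0, batchSize)) {
    const data = doc.ops.filter((op) => op > start && op <= end);
    if (data.length === 0) continue; // nothing is emitted for a straddler
    chunks.push({ bucket: doc.bucket, data, has_more: false });
  }
  return chunks;
}

// Two straddlers fill the batch; bucket C's real data sits behind the boundary.
const docs: Doc[] = [
  { bucket: "A", ops: [10, 20, 100] },
  { bucket: "B", ops: [15, 120] },
  { bucket: "C", ops: [30, 40] },
];

const emitted = readFirstBatchOld(docs, 2, 20, 50);
console.log(emitted.length); // no data and no has_more: the caller sees a finished stream
```

Bucket C's ops 30 and 40 are inside the window, yet the caller receives no chunk that would prompt it to ask again.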

The fix

Two small changes that reuse the existing caller retry loop instead of adding a second read loop inside storage:

1. `MongoSyncBucketStorageV3.getBucketDataBatchImpl`

When a matched document contributes zero rows, its bucket is provably complete through the checkpoint. The query guarantees _id.o > start, so a document with no op in (start, end] must have its last op beyond end (in the example, _id.o = 100 > 50). Since per-bucket document ranges are disjoint, every later document for that bucket also starts after end, so none of them can match either.

Storage now yields an empty chunk for each such bucket (data: [], next_after = checkpoint, has_more: false), so the caller can retire it. If the batch produced no data at all, the last empty chunk carries has_more: true (mirroring the existing last-chunk convention), so the caller re-requests the remaining buckets instead of treating the stream as complete. Progress is guaranteed: each round retires every straddler bucket in the batch, so the next request no longer matches their documents.
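
A minimal sketch of the empty-chunk rule described above, not the actual `getBucketDataBatchImpl`. The `Chunk` and `Doc` shapes and the `chunksForBatch` function are hypothetical, the sketch assumes one document per bucket per batch, and the existing last-chunk `has_more` handling for batches that do produce data is left out.

```typescript
// Hypothetical, simplified chunk shape; the real SyncBucketDataChunk has more fields.
interface Chunk {
  bucket: string;
  data: number[];
  next_after: number;
  has_more: boolean;
}

interface Doc {
  bucket: string;
  ops: number[];
}

// Convert one batch of matched documents into chunks.
// `starts` holds the per-bucket start position; `checkpoint` is the window end.
function chunksForBatch(docs: Doc[], starts: Map<string, number>, checkpoint: number): Chunk[] {
  const chunks: Chunk[] = [];
  for (const doc of docs) {
    const start = starts.get(doc.bucket) ?? 0;
    const data = doc.ops.filter((op) => op > start && op <= checkpoint);
    if (data.length === 0) {
      // Straddler: the bucket is complete through the checkpoint, so emit a progress marker.
      chunks.push({ bucket: doc.bucket, data: [], next_after: checkpoint, has_more: false });
    } else {
      chunks.push({ bucket: doc.bucket, data, next_after: Math.max(...data), has_more: false });
    }
  }
  // If the batch produced no data at all, the last empty chunk asks the caller to come back.
  const producedData = chunks.some((chunk) => chunk.data.length > 0);
  const last = chunks[chunks.length - 1];
  if (!producedData && last !== undefined) {
    last.has_more = true;
  }
  return chunks;
}

const straddlers: Doc[] = [
  { bucket: "A", ops: [10, 20, 100] },
  { bucket: "B", ops: [5, 200] },
];
const result = chunksForBatch(straddlers, new Map([["A", 20], ["B", 5]]), 50);
console.log(result);
```

Both buckets are retired at the checkpoint, and only the final chunk carries `has_more: true`.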

2. `sync.ts` — `bucketDataBatch`

updateBucketPosition() is now called for empty chunks before they are skipped. Previously empty chunks were dropped before the position update, which would have re-requested the same buckets with the same positions indefinitely. Empty chunks are still never sent to clients — the protocol output is unchanged.

This also fixes a pre-existing inefficiency in the mixed case (some buckets straddle, others have data): straddler buckets previously never advanced their positions, so every subsequent round re-fetched their straddling documents.
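
The caller-side ordering can be sketched as follows. This is an illustration under assumptions, not the code in `sync.ts`: `processChunks` is a hypothetical function, and a plain `Map` stands in for whatever state `updateBucketPosition()` maintains.

```typescript
interface Chunk {
  bucket: string;
  data: number[];
  next_after: number;
  has_more: boolean;
}

// Handle one round of chunks. Returns true if storage asked for another round.
function processChunks(
  chunks: Chunk[],
  positions: Map<string, number>,
  send: (chunk: Chunk) => void
): boolean {
  let hasMore = false;
  for (const chunk of chunks) {
    // Advance the bucket position first, even when the chunk is empty.
    positions.set(chunk.bucket, chunk.next_after);
    hasMore = hasMore || chunk.has_more;
    // Empty chunks are progress markers only and are never sent to clients.
    if (chunk.data.length === 0) continue;
    send(chunk);
  }
  return hasMore;
}

const checkpoint = 50;
const positions = new Map<string, number>([["A", 20], ["B", 20], ["C", 20]]);
const sent: Chunk[] = [];

const again = processChunks(
  [
    { bucket: "A", data: [], next_after: checkpoint, has_more: false },
    { bucket: "B", data: [], next_after: checkpoint, has_more: true },
  ],
  positions,
  (chunk) => sent.push(chunk)
);

// Buckets still behind the checkpoint are the ones to re-request.
const pending = [...positions].filter(([, pos]) => pos < checkpoint).map(([bucket]) => bucket);
console.log(again, pending, sent.length);
```

After this round only bucket C is still pending, so the next request no longer matches the straddling documents of A and B.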

Semantics note

This slightly extends the implicit SyncBucketDataChunk contract: an empty chunk is a progress marker ("this bucket has no operations in the requested range; advance to next_after"). Other storage implementations never emit empty chunks and are unaffected.

AI Usage

Used Claude Fable 5 to find the bug. Manually suggested the fix, and had Claude implement it.

Sleepful

Fix LGTM

Fix edge case with empty bucket results.

5c4548f

rkistner mentioned this pull request Jun 11, 2026

feat(mongodb-storage)!: chunked multi-op bucket documents with range-merging compaction and invariant tests #617

Open

Sleepful reviewed Jun 12, 2026

Comment thread modules/module-mongodb-storage/src/storage/implementation/v3/MongoSyncBucketStorageV3.ts

Sleepful approved these changes Jun 12, 2026

rkistner merged commit 3493dc4 into compressed-bucket-storage Jun 12, 2026
43 checks passed

rkistner deleted the fix-empty-bucket-results branch June 12, 2026 07:36
