[meta] MetaLocalBackend::ListKeys support cursor based multi-turn scan#62
Open
[meta] MetaLocalBackend::ListKeys support cursor based multi-turn scan#62
Conversation
The original ListKeys implementation used an index-based cursor approach which ad several problems: 1. Non-deterministic iteration order: unordered_map doesn't guarantee consistent iteration order. 2. Unreliable cursor: If keys are added/deleted between scan calls, or if rehashing occurs, the index-based approach would miss or duplicate keys. 3. The test had a bug: The loop condition current_cursor != SCAN_BASE_CURSOR was false from the start, so the multi-turn scan was never actually tested. This patch replaced the index-based cursor with a key-based cursor approach: 1. Collect and sort keys: All keys are collected into a sorted vector for deterministic ordering. 2. Use key as cursor: The cursor is the string representation of the last returned key. 3. Binary search for position: Use std::upper_bound to efficiently find the starting position. 4. Return next cursor: The next cursor is the last returned key, or "0" (SCAN_BASE_CURSOR) when done. Also fixed the test to properly verify multi-turn cursor-based scanning: - Changed from a for loop with incorrect condition to a do-while loop. - Added proper collection of all keys across multiple turns. - Added iteration count limit to prevent infinite loops. - Verified that all keys are collected after multiple scan turns.
There was a problem hiding this comment.
👋 Review Summary
The change to a key-based cursor with deterministic ordering and the corresponding test fix address real correctness issues in the old ListKeys implementation and make the behavior more robust and predictable for basic multi-turn scans.
🛡️ Key Risks & Issues
- ListKeys now builds and sorts the full key set on every call, which is a clear correctness win but a potential scalability regression compared to the previous early-exit iterator. For large tables or frequent scans, this could noticeably increase CPU and lock-hold time; if MetaLocalBackend is used in hot paths, it might be worth considering an ordered data structure or cached index to keep determinism without full-table sort per page.
- Cursor semantics in MetaLocalBackend are now explicitly key-based (using the last key and
upper_bound), which differs from the opaque cursor behavior in MetaRedisBackend and Redis SCAN. This is fine if intentional, but it would be good to document the divergence in the MetaStorageBackend contract so callers understand that local vs Redis backends may not have identical cursor behavior in edge cases. - The updated TestListKeys loop is much safer than the original buggy one, but it no longer verifies that each page respects the per-call limit or that cursors progress sensibly; it only checks that all keys are eventually collected and that the loop terminates.
🧪 Verification Advice
- For larger key counts, consider manually exercising ListKeys with different limits (1, 10, 100, large) and checking that page sizes respect the limit and that the cursor sequence behaves as expected (no infinite loops, no regressions where everything comes back in a single page).
- If MetaLocalBackend is used in performance-sensitive scenarios, benchmarking ListKeys before and after this change with large keyspaces (e.g., tens or hundreds of thousands of keys) would help quantify the impact of the full-table sort per call.
💡 Thoughts & Suggestions
- If deterministic ordering is important but ListKeys is on a hot path, we might explore using an ordered container for keys or maintaining a separate sorted index, so each scan page does not require rebuilding and sorting the entire key set.
- Strengthening the stepwise ListKeys test slightly (by asserting per-call
keys.size() <= limitand, optionally, some simple invariants on cursor progression) would help ensure the new cursor-based behavior stays correct over time without adding much complexity to the tests.
🤖 Generated by Qoder • View workflow run
Perf optimization with lazy table keys sort.
There was a problem hiding this comment.
👋 Review Summary
The move to a key-based cursor with a sorted in-memory cache plus the strengthened tests addresses the original nondeterministic iteration and the buggy one-shot scan test. The new tests do a nice job of checking cursor monotonicity, per-page limits, and cache invalidation on writes.
🛡️ Key Risks & Issues
- Cursor validation is currently inconsistent between
limit > 0andlimit <= 0: for positive limits, non-base cursors are validated and"invalid_cursor"returnsEC_BADARGS, while forlimit <= 0the function returnsEC_OKand silently resets the cursor toSCAN_BASE_CURSORwithout ever parsing it. This is a behavior change from the previous implementation (which validated the cursor regardless of limit) and could hide client-side bugs where a bad cursor is passed along with a zero or negative limit. - The locking strategy around
ListKeysis now stricter: we takemutex_and hold it throughRebuildSortedCacheUnderLock(which walks and sorts the full key set) and the subsequent page extraction. This makes the behavior more deterministic, but at the cost of serializing all writers on the same mutex for the duration of large scans or cache rebuilds. It’s not a correctness bug, but in write-heavy environments with large keyspaces the longer critical section could become a noticeable bottleneck.
🧪 Verification Advice
- Extend
TestListKeyswith a case that exercisesListKeys("invalid_cursor", 0, ...)(and optionally negative limits) so the cursor handling behavior is intentional and protected by tests, either expectingEC_BADARGSuniformly or documenting/locking in the new permissive behavior. - If MetaLocalBackend is used in performance-sensitive code paths, consider a simple benchmark that compares
ListKeysbefore and after this change for large tables (e.g., tens of thousands of keys) under mixed read/write load to confirm that the extra time holdingmutex_is acceptable.
💡 Thoughts & Suggestions
- If we want consistent API semantics, it might be worth validating non-base cursors before the
limit <= 0early return, so that any invalid cursor always yieldsEC_BADARGSregardless of the limit. Alternatively, a short comment in the interface explaining thatlimit <= 0effectively “resets” scanning and ignores the input cursor would clarify the intended contract. - The new cache behavior test is a good addition; if we decide to evolve the cache further (e.g., to handle updates without invalidation or to relax ordering slightly), it may be useful to distinguish tests that assert the public contract (no missing/duplicated keys, eventual termination) from those that assert specific internal pagination patterns, so we have room to optimize the internals later without rewriting tests.
🤖 Generated by Qoder • View workflow run
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
The original ListKeys implementation used an index-based cursor approach which ad several problems:
This patch replaced the index-based cursor with a key-based cursor approach:
Also fixed the test to properly verify multi-turn cursor-based scanning: