[meta] MetaLocalBackend::ListKeys support cursor based multi-turn scan by oldsharp · Pull Request #62 · alibaba/tair-kvcache

oldsharp · 2026-03-18T03:46:14Z

The original ListKeys implementation used an index-based cursor approach which ad several problems:

Non-deterministic iteration order: unordered_map doesn't guarantee consistent iteration order.
Unreliable cursor: If keys are added/deleted between scan calls, or if rehashing occurs, the index-based approach would miss or duplicate keys.
The test had a bug: The loop condition current_cursor != SCAN_BASE_CURSOR was false from the start, so the multi-turn scan was never actually tested.

This patch replaced the index-based cursor with a key-based cursor approach:

Collect and sort keys: All keys are collected into a sorted vector for deterministic ordering.
Use key as cursor: The cursor is the string representation of the last returned key.
Binary search for position: Use std::upper_bound to efficiently find the starting position.
Return next cursor: The next cursor is the last returned key, or "0" (SCAN_BASE_CURSOR) when done.

Also fixed the test to properly verify multi-turn cursor-based scanning:

Changed from a for loop with incorrect condition to a do-while loop.
Added proper collection of all keys across multiple turns.
Added iteration count limit to prevent infinite loops.
Verified that all keys are collected after multiple scan turns.

The original ListKeys implementation used an index-based cursor approach which ad several problems: 1. Non-deterministic iteration order: unordered_map doesn't guarantee consistent iteration order. 2. Unreliable cursor: If keys are added/deleted between scan calls, or if rehashing occurs, the index-based approach would miss or duplicate keys. 3. The test had a bug: The loop condition current_cursor != SCAN_BASE_CURSOR was false from the start, so the multi-turn scan was never actually tested. This patch replaced the index-based cursor with a key-based cursor approach: 1. Collect and sort keys: All keys are collected into a sorted vector for deterministic ordering. 2. Use key as cursor: The cursor is the string representation of the last returned key. 3. Binary search for position: Use std::upper_bound to efficiently find the starting position. 4. Return next cursor: The next cursor is the last returned key, or "0" (SCAN_BASE_CURSOR) when done. Also fixed the test to properly verify multi-turn cursor-based scanning: - Changed from a for loop with incorrect condition to a do-while loop. - Added proper collection of all keys across multiple turns. - Added iteration count limit to prevent infinite loops. - Verified that all keys are collected after multiple scan turns.

qoderai

👋 Review Summary

The change to a key-based cursor with deterministic ordering and the corresponding test fix address real correctness issues in the old ListKeys implementation and make the behavior more robust and predictable for basic multi-turn scans.

🛡️ Key Risks & Issues

ListKeys now builds and sorts the full key set on every call, which is a clear correctness win but a potential scalability regression compared to the previous early-exit iterator. For large tables or frequent scans, this could noticeably increase CPU and lock-hold time; if MetaLocalBackend is used in hot paths, it might be worth considering an ordered data structure or cached index to keep determinism without full-table sort per page.
Cursor semantics in MetaLocalBackend are now explicitly key-based (using the last key and upper_bound), which differs from the opaque cursor behavior in MetaRedisBackend and Redis SCAN. This is fine if intentional, but it would be good to document the divergence in the MetaStorageBackend contract so callers understand that local vs Redis backends may not have identical cursor behavior in edge cases.
The updated TestListKeys loop is much safer than the original buggy one, but it no longer verifies that each page respects the per-call limit or that cursors progress sensibly; it only checks that all keys are eventually collected and that the loop terminates.

🧪 Verification Advice

For larger key counts, consider manually exercising ListKeys with different limits (1, 10, 100, large) and checking that page sizes respect the limit and that the cursor sequence behaves as expected (no infinite loops, no regressions where everything comes back in a single page).
If MetaLocalBackend is used in performance-sensitive scenarios, benchmarking ListKeys before and after this change with large keyspaces (e.g., tens or hundreds of thousands of keys) would help quantify the impact of the full-table sort per call.

💡 Thoughts & Suggestions

If deterministic ordering is important but ListKeys is on a hot path, we might explore using an ordered container for keys or maintaining a separate sorted index, so each scan page does not require rebuilding and sorting the entire key set.
Strengthening the stepwise ListKeys test slightly (by asserting per-call keys.size() <= limit and, optionally, some simple invariants on cursor progression) would help ensure the new cursor-based behavior stays correct over time without adding much complexity to the tests.

🤖 Generated by Qoder • View workflow run

kv_cache_manager/meta/meta_local_backend.cc

kv_cache_manager/meta/test/meta_local_backend_test.cc

Perf optimization with lazy table keys sort.

qoderai

👋 Review Summary

The move to a key-based cursor with a sorted in-memory cache plus the strengthened tests addresses the original nondeterministic iteration and the buggy one-shot scan test. The new tests do a nice job of checking cursor monotonicity, per-page limits, and cache invalidation on writes.

🛡️ Key Risks & Issues

Cursor validation is currently inconsistent between limit > 0 and limit <= 0: for positive limits, non-base cursors are validated and "invalid_cursor" returns EC_BADARGS, while for limit <= 0 the function returns EC_OK and silently resets the cursor to SCAN_BASE_CURSOR without ever parsing it. This is a behavior change from the previous implementation (which validated the cursor regardless of limit) and could hide client-side bugs where a bad cursor is passed along with a zero or negative limit.
The locking strategy around ListKeys is now stricter: we take mutex_ and hold it through RebuildSortedCacheUnderLock (which walks and sorts the full key set) and the subsequent page extraction. This makes the behavior more deterministic, but at the cost of serializing all writers on the same mutex for the duration of large scans or cache rebuilds. It’s not a correctness bug, but in write-heavy environments with large keyspaces the longer critical section could become a noticeable bottleneck.

🧪 Verification Advice

Extend TestListKeys with a case that exercises ListKeys("invalid_cursor", 0, ...) (and optionally negative limits) so the cursor handling behavior is intentional and protected by tests, either expecting EC_BADARGS uniformly or documenting/locking in the new permissive behavior.
If MetaLocalBackend is used in performance-sensitive code paths, consider a simple benchmark that compares ListKeys before and after this change for large tables (e.g., tens of thousands of keys) under mixed read/write load to confirm that the extra time holding mutex_ is acceptable.

💡 Thoughts & Suggestions

If we want consistent API semantics, it might be worth validating non-base cursors before the limit <= 0 early return, so that any invalid cursor always yields EC_BADARGS regardless of the limit. Alternatively, a short comment in the interface explaining that limit <= 0 effectively “resets” scanning and ignores the input cursor would clarify the intended contract.
The new cache behavior test is a good addition; if we decide to evolve the cache further (e.g., to handle updates without invalidation or to relax ordering slightly), it may be useful to distinguish tests that assert the public contract (no missing/duplicated keys, eventual termination) from those that assert specific internal pagination patterns, so we have room to optimize the internals later without rewriting tests.

🤖 Generated by Qoder • View workflow run

kv_cache_manager/meta/meta_local_backend.cc

oldsharp self-assigned this Mar 18, 2026

qoderai bot reviewed Mar 18, 2026

View reviewed changes

kv_cache_manager/meta/meta_local_backend.cc Outdated Show resolved Hide resolved

kv_cache_manager/meta/test/meta_local_backend_test.cc Outdated Show resolved Hide resolved

github-actions bot added the ai reviewed AI has reviewed this PR label Mar 18, 2026

oldsharp added 2 commits March 18, 2026 15:42

[meta] MetaLocalBackend::ListKeys perf optimization

0c44e7b

Perf optimization with lazy table keys sort.

[test] refine test case

4aafe93

oldsharp removed the ai reviewed AI has reviewed this PR label Mar 18, 2026

qoderai bot reviewed Mar 18, 2026

View reviewed changes

kv_cache_manager/meta/meta_local_backend.cc Show resolved Hide resolved

github-actions bot added the ai reviewed AI has reviewed this PR label Mar 18, 2026

[meta] tighten argument inspection of MetaLocalBackend::ListKeys

6228a6c

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[meta] MetaLocalBackend::ListKeys support cursor based multi-turn scan#62

[meta] MetaLocalBackend::ListKeys support cursor based multi-turn scan#62
oldsharp wants to merge 4 commits intomainfrom
rc/test/fix-flaky-test-trim_cache

oldsharp commented Mar 18, 2026

Uh oh!

qoderai bot left a comment

Uh oh!

Uh oh!

Uh oh!

qoderai bot left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

oldsharp commented Mar 18, 2026

Uh oh!

qoderai bot left a comment

Choose a reason for hiding this comment

👋 Review Summary

🛡️ Key Risks & Issues

🧪 Verification Advice

💡 Thoughts & Suggestions

Uh oh!

Uh oh!

Uh oh!

qoderai bot left a comment

Choose a reason for hiding this comment

👋 Review Summary

🛡️ Key Risks & Issues

🧪 Verification Advice

💡 Thoughts & Suggestions

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant