Skip to content

[meta] handle instance storage usage data backward compatibility#103

Open
oldsharp wants to merge 4 commits intomainfrom
rc/feat/accurate-byte-usage-calc-handle-compatibility-for-main
Open

[meta] handle instance storage usage data backward compatibility#103
oldsharp wants to merge 4 commits intomainfrom
rc/feat/accurate-byte-usage-calc-handle-compatibility-for-main

Conversation

@oldsharp
Copy link
Copy Markdown
Collaborator

@oldsharp oldsharp commented Apr 7, 2026

1. Introducing instance version to handle storage usage data backward compatibility
Implements a backward compatibility mechanism for instance storage usage accounting by introducing the InstanceVersion enum to distinguish between legacy instances (VERSION_0) and new instances with accurate storage tracking (VERSION_1). This is a follow-up (or, fix-up) work for, and must be deployed together with, #77 .

  • version 0: legacy instances, storage usage calculated by block_size * key_count; metadata persisted:
    • METADATA_PROPERTY_KEY_COUNT
  • version 1: new instances with accurate storage usage tracking; metadata persisted:
    • METADATA_PROPERTY_KEY_COUNT
    • METADATA_PROPERTY_STORAGE_USAGE_DATA

2. CacheManagerMetricsRecorder add support for accurate usage data calculation

@oldsharp oldsharp requested a review from wangxiyu191 April 7, 2026 13:45
@oldsharp oldsharp self-assigned this Apr 7, 2026
Copy link
Copy Markdown

@qoderai qoderai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review Summary

This PR implements a clean backward compatibility mechanism for instance storage usage data by introducing the InstanceVersion enum to distinguish between legacy instances (VERSION_0) and new instances with accurate storage tracking (VERSION_1). The approach is sound and the code is generally well-structured.

Key Risks & Issues

1. Inconsistent Unknown Version Handling
In data_storage_selector.cc, unknown versions log a warning but don't skip the instance, while the other two files (cache_manager_metrics_recorder.cc, cache_reclaimer.cc) use continue to skip unknown versions. This behavioral inconsistency could cause issues when future versions are introduced.

2. Per-Type Storage Usage for VERSION_0 Instances
In cache_reclaimer.cc, the calc_sz lambda unconditionally calls GetStorageUsageByType() for all storage types regardless of version. For VERSION_0 instances, this returns uninitialized/stale data since these instances never persist storage usage data. This may lead to incorrect per-type usage statistics for legacy instances.

3. VERSION_0 Instance State Consistency
VERSION_0 instances still receive AddStorageUsageByType/SubStorageUsageByType calls during put/delete operations (from meta_searcher.cc), modifying in-memory statistics that are never persisted. This creates a confusing state where VERSION_0 instances accumulate accurate in-memory statistics until restart, then revert to the legacy estimation method. Consider whether VERSION_0 instances should either not track storage usage at all, or have a migration path to VERSION_1.

Verification Advice

  • Test the version detection logic by simulating metadata recovery with and without METADATA_PROPERTY_STORAGE_USAGE_DATA
  • Verify that VERSION_0 instances correctly fall back to the byte_size_per_key * key_cnt estimation
  • Test the "unknown version" path to ensure graceful handling
  • Consider adding an integration test for rolling upgrade scenarios where instances of different versions coexist

Thoughts & Suggestions

  • Consider extracting the version-based size calculation into a helper function to reduce code duplication across the three files
  • The stricter error handling in RecoverMetaData() (returning EC_ERROR on deserialization failure instead of warning+reset) is a positive change that prevents silent data corruption
  • Good documentation in the header file explaining the version semantics

🤖 Generated by QoderView workflow run

@github-actions github-actions bot added the ai reviewed AI has reviewed this PR label Apr 7, 2026
@oldsharp
Copy link
Copy Markdown
Collaborator Author

oldsharp commented Apr 8, 2026

2. Per-Type Storage Usage for VERSION_0 Instances
In cache_reclaimer.cc, the calc_sz lambda unconditionally calls GetStorageUsageByType() for all storage types regardless of version. For VERSION_0 instances, this returns uninitialized/stale data since these instances never persist storage usage data. This may lead to incorrect per-type usage statistics for legacy instances.

This is explained at 5681bff#diff-72bf77fac911f2f22c1941b9af5aba07eb3a552d23dc044296827cde8d2a80dfR99 :

    // notes about backward compatibility:
    //
    // GetStorageUsage() is accurate for hybrid model arch, and should
    // only be used under InstanceVersion::VERSION_1
    //
    // similar logic for Get/Add/Sub-StorageUsageByType() are already
    // being used under InstanceVersion::VERSION_0; however they are
    // only accurate under InstanceVersion::VERSION_1

The GetStorageUsageByType() and other similar interfaces are kept as is for the consistency with version 0;
however they are only accurate when under version 1.

3. VERSION_0 Instance State Consistency
VERSION_0 instances still receive AddStorageUsageByType/SubStorageUsageByType calls during put/delete operations (from meta_searcher.cc), modifying in-memory statistics that are never persisted. This creates a confusing state where VERSION_0 instances accumulate accurate in-memory statistics until restart, then revert to the legacy estimation method. Consider whether VERSION_0 instances should either not track storage usage at all, or have a migration path to VERSION_1.

Same as above.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ai reviewed AI has reviewed this PR

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant