[KVCache] Support only flush FD GPU/CPU Cache index by AttentionStore#7609

Open
jackyYang6 wants to merge 1 commit into PaddlePaddle:develop from jackyYang6:kvcache/as_only_flush

Conversation

@jackyYang6
Contributor

Motivation

This PR improves the FD_AS_ONLY_FLUSH flow for AttentionStore so FastDeploy can flush KV cache index state for both request-finish and CPU-eviction scenarios.

It adds the required flush metadata to support more accurate AttentionStore index updates.

💡 If this PR is a Cherry Pick, the PR title needs to follow the format by adding the [Cherry-Pick] label at the very beginning and appending the original PR ID at the end. For example, [Cherry-Pick][CI] Add check trigger and logic(#5191)


Modifications

  • Extend WriteStorageTask with:
    • flush_cache_exists to indicate whether cache still exists on the current node in FD_AS_ONLY_FLUSH mode.
    • start_write_block_idx to support partial flush/write from a specified block index.
  • Update cache_transfer_manager AS-only flush path to call AttentionStore.flush_token_index(...) with both start_write_block_idx and reside_in_gpu.
  • Update prefix_cache_manager.free_cpu_block_ids(...) to emit flush-only tasks when CPU cache blocks are evicted in FD_AS_ONLY_FLUSH + attention_store mode.
  • Add FD_AS_ONLY_FLUSH environment variable entry in fastdeploy/envs.py.
  • Add unit test coverage for CPU eviction flush behavior, including:
    • flush_cache_exists=False
    • empty gpu_block_ids in flush-only mode
    • correct start_write_block_idx=depth-1
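The extended task structure described above can be sketched as a dataclass. This is a hypothetical illustration based on the PR description, not the actual FastDeploy definition; field defaults and any fields beyond those listed are assumptions.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class WriteStorageTask:
    task_id: str
    keys: List[int]
    token_ids: List[int]
    gpu_block_ids: List[int] = field(default_factory=list)
    # New in this PR: whether the cache still exists on the current node in
    # FD_AS_ONLY_FLUSH mode. None means "not applicable" (e.g. the
    # request-finish path), per the review discussion below.
    flush_cache_exists: Optional[bool] = None
    # New in this PR: the first block index to flush/write, enabling partial
    # flushes instead of always starting from block 0.
    start_write_block_idx: int = 0
```

With these defaults, existing call sites that do not set the new fields keep their previous behavior, which matches the backward-compatibility note in the review below.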

Usage or Command

For FD_AS_ONLY_FLUSH mode with AttentionStore:

export FD_AS_ONLY_FLUSH=1

Reference test command:

python3 -m pytest tests/cache_manager/test_prefix_cache_manager.py -q
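The gating implied by this mode combines the environment switch with the attention_store backend check. A minimal sketch of that condition follows; only the env var name and the backend string come from the PR, the helper function itself is hypothetical.

```python
import os

def as_only_flush_enabled(kvcache_storage_backend: str) -> bool:
    # Both conditions appear in the PR's eviction path: the FD_AS_ONLY_FLUSH
    # env switch must be on AND the storage backend must be attention_store.
    return (
        os.getenv("FD_AS_ONLY_FLUSH", "0") == "1"
        and kvcache_storage_backend == "attention_store"
    )
```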

Accuracy Tests

N/A. This PR does not change model forward results or kernel numerical behavior. It only updates KV cache index flush metadata and adds unit tests for cache manager behavior.

Checklist

  • Add at least one tag to the PR title.
    • Suggested title: [KVCache] Support flush FD GPU/CPU Cache index by AttentionStore
    • Tag list: [[FDConfig],[APIServer],[Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]]
    • You can add new tags based on the PR content, but the semantics must be clear.
  • Format your code and run pre-commit before committing.
  • Add unit tests. Please write the reason in this PR if no unit tests.
  • Provide accuracy results.
  • If the current PR is submitting to the release branch, make sure the PR has been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag. (N/A for current develop PR)

@paddle-bot

paddle-bot Bot commented Apr 24, 2026

Thanks for your contribution!


@PaddlePaddle-bot PaddlePaddle-bot left a comment


🤖 AI Code Review | 2026-04-24 14:06:57

📋 Review Summary

PR overview: Adds flush-only KV cache index updates for AttentionStore's FD_AS_ONLY_FLUSH mode, covering both the request-finish and CPU-eviction scenarios.
Scope of changes: cache_manager/ (cache_tasks, cache_transfer_manager, prefix_cache_manager), envs.py, unit tests
Impact tag: [KVCache]


Issues

Severity | File | Summary
❓ Question | prefix_cache_manager.py:1556 | The semantics of node.input_ids are unclear; if it holds partial token ids, the flush index may be wrong
🟡 Suggestion | cache_transfer_manager.py:975 | When flush_cache_exists=None the code silently falls back to reside_in_gpu=True; add a warning log to guard against misuse
🟡 Suggestion | prefix_cache_manager.py:1590 | Add a comment explaining why is_sync=False, so maintainers do not change it by mistake

Overall Assessment

The overall design is clear, the field defaults are sensible and backward compatible, and the tests cover the main CPU-eviction scenarios. The key open question is the semantics of node.input_ids, which the author needs to confirm: if it holds partial token ids, AttentionStore index updates will be wrong. Recommend clarifying this before merging.

except Exception as e:
    logger.warning(f"[AS_ONLY_FLUSH] Failed to flush token index for task {task.task_id}, error: {e}")
result = (CacheStatus.GPU2STORAGE, task.task_id, task.keys, [])
self.cache_task_queue.swap_to_storage_barrier.wait()

🟡 Suggestion: the reside_in_gpu semantics in _flush_only_storage_task are questionable

When task.flush_cache_exists is None, the code falls back to reside_in_gpu=True. The comment says this scenario is "request finish", where the cache is indeed still on the GPU, so that looks reasonable. But None means "not applicable": if a caller outside the request-finish path forgets to set the field, True is passed silently here and produces an incorrect index state.

Consider adding a warning log, making it explicit that None is only allowed on the request-finish (non-flush-only) path:

if task.flush_cache_exists is None:
    logger.warning("[AS_ONLY_FLUSH] flush_cache_exists is None, treating as cache present (reside_in_gpu=True)")
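A self-contained version of the suggested guard, for illustration; the function name and the plain `logging` logger are assumptions, only the warning text and fallback come from the suggestion.

```python
import logging

logger = logging.getLogger("cache_transfer_manager")

def resolve_reside_in_gpu(flush_cache_exists):
    """Map the task's flush_cache_exists field to the reside_in_gpu argument."""
    if flush_cache_exists is None:
        # None should only occur on the request-finish path, where the cache
        # is still resident on GPU; warn so any misuse elsewhere is visible.
        logger.warning(
            "[AS_ONLY_FLUSH] flush_cache_exists is None, "
            "treating as cache present (reside_in_gpu=True)"
        )
        return True
    return flush_cache_exists
```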


self.recycle_cpu_blocks(node.block_id)
hash_value_block_ids_map[node.input_hash_value].extend(reversed(tmp_block_ids))
if envs.FD_AS_ONLY_FLUSH and self.kvcache_storage_backend == "attention_store":

❓ Question: does node.input_ids hold only the current node's token ids, or the full token ids for the whole path?

flush_token_index needs the complete token_ids to locate the index correctly. If input_ids is only the token slice added by the current node, the token_ids passed in will be incomplete and the AttentionStore index update will be wrong.

Please confirm the semantics of BlockNode.input_ids; if it holds partial token ids, the full path from root to the current node must be concatenated before use.
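To illustrate the concern: if each BlockNode only stored the tokens it adds, the full sequence would have to be rebuilt by walking parent links back to the root. The node structure below is invented for the sketch and does not reflect the real FastDeploy BlockNode.

```python
class BlockNode:
    def __init__(self, input_ids, parent=None):
        self.input_ids = input_ids  # tokens added by *this* node only (assumed)
        self.parent = parent

def full_token_ids(node):
    """Concatenate token ids along the root-to-node path."""
    parts = []
    while node is not None:
        parts.append(node.input_ids)
        node = node.parent
    # parts is leaf-to-root; reverse to get root-to-leaf order.
    return [tok for chunk in reversed(parts) for tok in chunk]
```

If input_ids already holds the full path, this reconstruction is unnecessary and the current code is correct; that is exactly what the review asks the author to confirm.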

task_id=str(uuid.uuid4()),
keys=[input_hash_value],
token_ids=token_ids,
gpu_block_ids=[],

🟡 Suggestion: add a comment explaining why is_sync=False, so maintainers do not change it by mistake

is_sync=False is used here because the flush does not need to wait for a result, unlike the flush on request completion (which is usually synchronous). Add a comment at the call site explaining why the task is asynchronous, so a later maintainer does not switch it to synchronous and introduce a deadlock or performance regression.

Also, tasks with is_sync=False do not register an Event in task_write_back_event; please confirm that downstream consumers (e.g. whatever consumes the transfer_done signal) correctly handle the case where only rank==0 emits the signal.
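The sync/async distinction described above can be sketched as follows; everything except the is_sync flag and the task_write_back_event name is hypothetical and simplified.

```python
import threading

class TaskQueue:
    def __init__(self):
        # Maps task_id -> Event for synchronous tasks awaiting write-back.
        self.task_write_back_event = {}

    def submit(self, task_id, is_sync):
        if is_sync:
            # Request-finish path: register an Event so the caller can block
            # until the flush/write completes.
            ev = threading.Event()
            self.task_write_back_event[task_id] = ev
            return ev
        # is_sync=False: flush-only eviction path. No Event is registered,
        # so nothing downstream must ever wait on this task_id.
        return None
```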

@codecov-commenter

Codecov Report

❌ Patch coverage is 48.38710% with 16 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (develop@4c8f7df). Learn more about missing BASE report.

Files with missing lines | Patch % | Lines
fastdeploy/cache_manager/cache_transfer_manager.py | 6.66% | 13 Missing and 1 partial ⚠️
fastdeploy/cache_manager/prefix_cache_manager.py | 84.61% | 0 Missing and 2 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             develop    #7609   +/-   ##
==========================================
  Coverage           ?   71.71%           
==========================================
  Files              ?      419           
  Lines              ?    57819           
  Branches           ?     9073           
==========================================
  Hits               ?    41465           
  Misses             ?    13527           
  Partials           ?     2827           
Flag | Coverage Δ
GPU | 71.71% <48.38%> (?)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
