[KVCache] Support only flush FD GPU/CPU Cache index by AttentionStore #7609
jackyYang6 wants to merge 1 commit into PaddlePaddle:develop
Thanks for your contribution!
PaddlePaddle-bot
left a comment
🤖 AI Code Review
2026-04-24 14:06:57
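📋 Review Summary
PR overview: Adds the ability to flush only the KV Cache index in AttentionStore's `FD_AS_ONLY_FLUSH` mode, covering two scenarios: request completion and CPU eviction.
Scope of changes: `cache_manager/` (`cache_tasks`, `cache_transfer_manager`, `prefix_cache_manager`), `envs.py`, unit tests
Impact tag: [KVCache]
Issues
| Level | File | Summary |
|---|---|---|
| ❓ Question | `prefix_cache_manager.py:1556` | The semantics of `node.input_ids` are unclear; if it holds partial token ids, the flushed index may be wrong |
| 🟡 Suggestion | `cache_transfer_manager.py:975` | When `flush_cache_exists=None`, the code silently falls back to `reside_in_gpu=True`; consider adding a warning log to prevent misuse |
| 🟡 Suggestion | `prefix_cache_manager.py:1590` | Consider adding a comment explaining why `is_sync=False`, so maintainers do not change it by mistake |
Overall assessment
The overall design is clear, the field defaults are reasonable and backward compatible, and the tests cover the main CPU-eviction scenarios. The key open question is the semantics of `node.input_ids`, which the author needs to confirm: if it holds partial token ids, the AttentionStore index update will be wrong. This should be clarified before merging.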
    except Exception as e:
        logger.warning(f"[AS_ONLY_FLUSH] Failed to flush token index for task {task.task_id}, error: {e}")
    result = (CacheStatus.GPU2STORAGE, task.task_id, task.keys, [])
    self.cache_task_queue.swap_to_storage_barrier.wait()
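🟡 Suggestion: the semantics of `reside_in_gpu` in `_flush_only_storage_task` are questionable
When `task.flush_cache_exists is None`, the code falls back to `reside_in_gpu=True`. The comment says this is the "request finish" scenario, where the cache is semantically still on the GPU, which looks reasonable. However, `None` means "not applicable": if a caller outside the request-finish scenario forgets to set the field, `True` is passed silently and the index ends up in a wrong state.
Consider adding a warning log here to make it explicit that `None` is only allowed on the request-finish (non flush-only) path:

    if task.flush_cache_exists is None:
        logger.warning("[AS_ONLY_FLUSH] flush_cache_exists is None, treating as cache present (reside_in_gpu=True)")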
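The fallback discussed in this comment can be sketched as a small standalone helper. This is only an illustration under assumptions: `resolve_reside_in_gpu` and the logger name are hypothetical and not part of the PR, and the direct mapping from `flush_cache_exists` to `reside_in_gpu` is inferred from the review text rather than confirmed by the diff.

```python
import logging
from typing import Optional

# Hypothetical logger name, used only for this sketch.
logger = logging.getLogger("as_only_flush_sketch")


def resolve_reside_in_gpu(flush_cache_exists: Optional[bool]) -> bool:
    """Map the task field to the reside_in_gpu argument (assumed mapping).

    None is treated as "cache still present" (the request-finish case),
    but a warning is emitted so an accidental omission is visible in logs.
    """
    if flush_cache_exists is None:
        logger.warning(
            "[AS_ONLY_FLUSH] flush_cache_exists is None, "
            "treating as cache present (reside_in_gpu=True)"
        )
        return True
    return bool(flush_cache_exists)
```

Keeping the fallback in one place makes the silent default visible in logs while leaving an explicit `False` (the eviction case) untouched.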
    self.recycle_cpu_blocks(node.block_id)
    hash_value_block_ids_map[node.input_hash_value].extend(reversed(tmp_block_ids))
    if envs.FD_AS_ONLY_FLUSH and self.kvcache_storage_backend == "attention_store":
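❓ Question: is `node.input_ids` the token ids of the current node only, or the complete token ids of the whole path?
`flush_token_index` needs the complete `token_ids` to locate the index correctly. If `input_ids` is only the token slice added by the current node, the `token_ids` passed in will be incomplete and the AttentionStore index update will be wrong.
Please confirm the semantics of `BlockNode.input_ids`. If it holds partial token ids, the full path from the root to the current node has to be concatenated before it can be used.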
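If `input_ids` does turn out to be a per-node slice, the concatenation mentioned above could look roughly like the following. This is a minimal sketch under assumptions: `SketchNode` is a hypothetical stand-in for `BlockNode`, and the `parent` link is assumed rather than taken from the actual class.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SketchNode:
    """Hypothetical stand-in for BlockNode: holds only this node's token slice."""
    input_ids: List[int]
    parent: Optional["SketchNode"] = None


def full_path_token_ids(node: SketchNode) -> List[int]:
    """Concatenate token slices from the root down to `node`."""
    segments = []
    cur: Optional[SketchNode] = node
    while cur is not None:
        segments.append(cur.input_ids)
        cur = cur.parent
    # Segments were collected leaf-to-root, so reverse before joining.
    full: List[int] = []
    for seg in reversed(segments):
        full.extend(seg)
    return full
```

If `input_ids` already stores the full prefix, no such walk is needed and the value can be passed through unchanged.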
    task_id=str(uuid.uuid4()),
    keys=[input_hash_value],
    token_ids=token_ids,
    gpu_block_ids=[],
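🟡 Suggestion: add a comment explaining why `is_sync=False`, so maintainers do not change it by mistake
`is_sync=False` is used here because the flush does not need to wait for a result, which differs from the flush at request completion (usually synchronous). Consider adding a comment at the call site explaining why it is asynchronous, so that a later maintainer does not switch it to synchronous and cause a deadlock or a performance problem.
In addition, tasks with `is_sync=False` do not register an Event in `task_write_back_event`. Please confirm that downstream code (for example, the consumer of the `transfer_done` signal) correctly handles the case where only `rank==0` emits the signal.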
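The concern about unregistered events can be illustrated with a small sketch. Everything here is hypothetical (`SketchTaskTracker`, `submit`, `on_transfer_done`); only the name `task_write_back_event` comes from the review, and the real manager may be structured differently.

```python
import threading
from typing import Dict


class SketchTaskTracker:
    """Hypothetical tracker: only synchronous tasks register an Event."""

    def __init__(self) -> None:
        self.task_write_back_event: Dict[str, threading.Event] = {}

    def submit(self, task_id: str, is_sync: bool) -> None:
        # Fire-and-forget (is_sync=False) tasks have no waiter,
        # so no Event is registered for them.
        if is_sync:
            self.task_write_back_event[task_id] = threading.Event()

    def on_transfer_done(self, task_id: str) -> bool:
        """Handle a completion signal; tolerate tasks without an Event."""
        event = self.task_write_back_event.pop(task_id, None)
        if event is None:
            return False  # async task: nothing to wake up
        event.set()
        return True
```

The point of the sketch is that the completion handler looks the event up defensively, so a signal for an asynchronous flush-only task is ignored instead of raising a `KeyError`.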
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## develop #7609 +/- ##
==========================================
Coverage ? 71.71%
==========================================
Files ? 419
Lines ? 57819
Branches ? 9073
==========================================
Hits ? 41465
Misses ? 13527
Partials ? 2827
Motivation
This PR improves the `FD_AS_ONLY_FLUSH` flow for AttentionStore so FastDeploy can flush KV cache index state for both request-finish and CPU-eviction scenarios. It adds the required flush metadata to support more accurate AttentionStore index updates.
Modifications
- `WriteStorageTask` with:
  - `flush_cache_exists` to indicate whether cache still exists on the current node in `FD_AS_ONLY_FLUSH` mode.
  - `start_write_block_idx` to support partial flush/write from a specified block index.
- `cache_transfer_manager` AS-only flush path to call `AttentionStore.flush_token_index(...)` with both `start_write_block_idx` and `reside_in_gpu`.
- `prefix_cache_manager.free_cpu_block_ids(...)` to emit flush-only tasks when CPU cache blocks are evicted in `FD_AS_ONLY_FLUSH` + `attention_store` mode.
- `FD_AS_ONLY_FLUSH` environment variable entry in `fastdeploy/envs.py`.
- `flush_cache_exists=False`
- `gpu_block_ids` in flush-only mode
- `start_write_block_idx=depth-1`

Usage or Command
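As an illustration of how the task fields listed under Modifications might be populated for a CPU-eviction flush, here is a minimal sketch. `SketchWriteStorageTask` and `make_cpu_eviction_flush_task` are hypothetical names, the field defaults are assumptions, and the real `WriteStorageTask` likely carries additional fields; only the field names and the values `flush_cache_exists=False`, empty `gpu_block_ids`, and `start_write_block_idx=depth-1` come from this PR page.

```python
import uuid
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SketchWriteStorageTask:
    """Hypothetical, reduced stand-in for WriteStorageTask."""
    task_id: str
    keys: List[str]
    token_ids: List[int]
    gpu_block_ids: List[int] = field(default_factory=list)
    # None is assumed to mean "not applicable" (request-finish path).
    flush_cache_exists: Optional[bool] = None
    # Assumed default: flush/write from the first block.
    start_write_block_idx: int = 0


def make_cpu_eviction_flush_task(
    input_hash_value: str, token_ids: List[int], depth: int
) -> SketchWriteStorageTask:
    """Build a flush-only task for an evicted CPU block at tree depth `depth`."""
    return SketchWriteStorageTask(
        task_id=str(uuid.uuid4()),
        keys=[input_hash_value],
        token_ids=token_ids,
        gpu_block_ids=[],             # flush-only: no GPU blocks to copy
        flush_cache_exists=False,     # the cache no longer exists on this node
        start_write_block_idx=depth - 1,
    )
```

For example, `make_cpu_eviction_flush_task("h", [1, 2, 3], depth=3)` yields a task whose `start_write_block_idx` is 2 and whose `gpu_block_ids` is empty.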
For `FD_AS_ONLY_FLUSH` mode with AttentionStore:

    export FD_AS_ONLY_FLUSH=1

Reference test command:
Accuracy Tests
N/A. This PR does not change model forward results or kernel numerical behavior. It only updates KV cache index flush metadata and adds unit tests for cache manager behavior.
Checklist
- [KVCache] Support flush FD GPU/CPU Cache index by AttentionStore
- [`[FDConfig]`, `[APIServer]`, `[Engine]`, `[Scheduler]`, `[PD Disaggregation]`, `[Executor]`, `[Graph Optimization]`, `[Speculative Decoding]`, `[RL]`, `[Models]`, `[Quantization]`, `[Loader]`, `[OP]`, `[KVCache]`, `[DataProcessor]`, `[BugFix]`, `[Docs]`, `[CI]`, `[Optimization]`, `[Feature]`, `[Benchmark]`, `[Others]`, `[XPU]`, `[HPU]`, `[GCU]`, `[DCU]`, `[Iluvatar]`, `[Metax]`]
- `pre-commit` before commit.
- `release` branch, make sure the PR has been submitted to the `develop` branch, then cherry-pick it to the `release` branch with the `[Cherry-Pick]` PR tag. (N/A for current `develop` PR)