[KVCache] Support AttentionStore slice write blocks #7614
jackyYang6 wants to merge 1 commit into PaddlePaddle:develop
Thanks for your contribution!
def7979 to 7ba2272
7ba2272 to 4f955fe
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## develop #7614 +/- ##
==========================================
Coverage ? 71.66%
==========================================
Files ? 419
Lines ? 57882
Branches ? 9080
==========================================
Hits ? 41479
Misses ? 13577
Partials ? 2826
PaddlePaddle-bot
left a comment
🤖 AI Code Review | 2026-04-24 19:53:19
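📋 Review Summary
PR overview: Adds slice write support to the AttentionStore KVCache write path, splitting a large batch of block writes into multiple slices to make the write more controllable.
Change scope: cache_manager/transfer_factory/mooncake_store/attention_store.py: write() method rewritten
Impact tag: KVCache
Issues
| Severity | File | Summary |
|---|---|---|
| 🔴 Bug | attention_store.py:167 | With AS_WRITE_SLICE_BLOCK_NUM=0, range() raises ValueError and the write operation crashes |
| 🟡 Suggestion | attention_store.py:170 | The two concatenated f-strings in the [WRITE BEGIN] log lack a separating space, which hurts log readability |
| 🟡 Suggestion | attention_store.py:233 | The [WRITE END] log level was raised from debug to info, which may cause log volume to spike under high concurrency |
| ❓ Question | attention_store.py:166~168 | os.getenv is read on every write() call; on a high-frequency write path, consider caching the values in __init__ |
Overall assessment
The slice write logic is well designed, and the approach to preserving prefix cache continuity (breaking early when a slice is not fully written) is correct. A defensive check for slice_block_num=0 needs to be added (P0 crash risk), and the [WRITE END] log level should go back to debug to avoid excessive log volume in production.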
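The two configuration points in the review (guarding against a non-positive slice size, and reading the environment once instead of on every write() call) could be combined roughly as follows. This is a minimal sketch, not the PR's code: `_positive_int_env` and `SliceWriteConfig` are hypothetical names introduced here, and only the variable names and the defaults 500 and 10 come from the quoted diff.

```python
import logging
import os
from dataclasses import dataclass

logger = logging.getLogger(__name__)


def _positive_int_env(name: str, default: int) -> int:
    """Read an int env var; fall back to `default` if unset, malformed, or <= 0."""
    raw = os.getenv(name, str(default))
    try:
        value = int(raw)
    except ValueError:
        logger.warning("%s=%r is not an integer, reset to %d", name, raw, default)
        return default
    if value <= 0:
        # Zero would make range(0, n, 0) raise; a negative step would skip all writes.
        logger.warning("%s=%d is invalid, reset to %d", name, value, default)
        return default
    return value


@dataclass(frozen=True)
class SliceWriteConfig:
    """Hypothetical holder for slice write settings (not a class from the PR)."""

    slice_block_num: int
    slice_timeout: float

    @classmethod
    def from_env(cls) -> "SliceWriteConfig":
        # Defaults mirror the os.getenv calls quoted in the review.
        return cls(
            slice_block_num=_positive_int_env("AS_WRITE_SLICE_BLOCK_NUM", 500),
            # A malformed timeout still raises ValueError here; only the
            # block count is guarded in this sketch.
            slice_timeout=float(os.getenv("AS_WRITE_SLICE_TIMEOUT", "10")),
        )
```

Building the config once (for example in `__init__`) and storing it on the instance keeps `os.getenv` off the write path, at the cost of no longer picking up environment changes made after construction.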
slice_block_num = int(os.getenv("AS_WRITE_SLICE_BLOCK_NUM", "500"))
slice_timeout = float(os.getenv("AS_WRITE_SLICE_TIMEOUT", "10"))
logger.debug(
    f"[WRITE BEGIN] task_id: {task_id} token_ids: {token_ids} gpu_block_ids: {gpu_block_ids}"
🔴 Bug: AS_WRITE_SLICE_BLOCK_NUM=0 makes range() raise ValueError and crash
range(0, total_blocks, 0) raises ValueError: range() arg 3 must not be zero, which crashes the entire write operation and affects the stability of the prefix cache service. Likewise, a negative value silently skips every write (nothing is written when total_blocks > 0).
Suggested guard, applied immediately after reading the value:
slice_block_num = int(os.getenv("AS_WRITE_SLICE_BLOCK_NUM", "500"))
if slice_block_num <= 0:
    logger.warning(f"AS_WRITE_SLICE_BLOCK_NUM={slice_block_num} is invalid, reset to 500")
    slice_block_num = 500

    f"[WRITE BEGIN] task_id: {task_id} token_ids: {token_ids} gpu_block_ids: {gpu_block_ids}"
    f"start_write_block_idx: {start_write_block_idx} timeout: {total_timeout}"
)
total_blocks = len(gpu_block_ids)
🟡 Suggestion: the two concatenated f-strings in the [WRITE BEGIN] log lack a separating space
The two adjacent f-strings are concatenated directly, with no space between {gpu_block_ids} at the end of the first and start_write_block_idx: at the start of the second. The log output reads ...{gpu_block_ids}start_write_block_idx: ..., which is hard to read and inconsistent with the formatting of the other fields.
Suggested fix:
f"[WRITE BEGIN] task_id: {task_id} token_ids: {token_ids} gpu_block_ids: {gpu_block_ids} "
f"start_write_block_idx: {start_write_block_idx} timeout: {total_timeout}"

)
total_cost = time.time() - overall_start
logger.info(
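🟡 Suggestion: the [WRITE END] log level was raised from debug to info, which may produce a large volume of logs under high-concurrency inference
The original code logged [WRITE END] with logger.debug; this PR changes it to logger.info, so every write operation prints a line. KVCache writes are frequent in large-scale inference, so info-level logging will noticeably increase log volume and affect disk I/O and logging system performance.
Consider reverting to logger.debug, or using logger.warning only when the write is incomplete (total_written < total_blocks).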
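One way to read that last suggestion is to choose the level per call: debug for a complete write, warning for an incomplete one. The sketch below is illustrative only; `log_write_end` and the logger name are hypothetical, and the message layout is not taken from the PR.

```python
import logging

logger = logging.getLogger("attention_store_sketch")


def log_write_end(task_id: str, total_written: int, total_blocks: int, total_cost: float) -> int:
    """Log the end of a write and return the logging level that was used."""
    # Complete writes stay quiet at DEBUG; only incomplete writes surface as WARNING.
    level = logging.WARNING if total_written < total_blocks else logging.DEBUG
    logger.log(
        level,
        "[WRITE END] task_id: %s written: %d/%d cost: %.3fs",
        task_id,
        total_written,
        total_blocks,
        total_cost,
    )
    return level
```

Passing the arguments to `logger.log` instead of pre-formatting an f-string also means the message is only built when the level is actually enabled.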
Motivation
This PR adds slice write support for AttentionStore to avoid writing a large number of KV cache blocks in a single SDK call.
It makes large cache write-back more controllable by splitting writes into multiple slices with configurable per-slice and total timeout limits.
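The splitting described above can be sketched as a loop that bounds each slice by both a per-slice timeout and the remaining total budget, and stops at the first incomplete slice so the written prefix stays contiguous. This is a simplified illustration under assumptions, not the PR's implementation: `write_in_slices` and the `write_slice` callback are hypothetical stand-ins for the SDK call, and the 60-second total timeout default is invented for the example.

```python
import time
from typing import Callable, Sequence


def write_in_slices(
    block_ids: Sequence[int],
    write_slice: Callable[[Sequence[int], float], int],
    slice_block_num: int = 500,
    slice_timeout: float = 10.0,
    total_timeout: float = 60.0,  # assumed value, not from the PR
) -> int:
    """Write block_ids slice by slice and return how many blocks were written.

    write_slice(slice_ids, timeout) is a hypothetical callback that returns
    the number of blocks it managed to write within `timeout` seconds.
    """
    if slice_block_num <= 0:
        raise ValueError("slice_block_num must be positive")
    deadline = time.monotonic() + total_timeout
    total_written = 0
    for start in range(0, len(block_ids), slice_block_num):
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break  # total budget exhausted
        chunk = block_ids[start:start + slice_block_num]
        # Never give a slice more time than is left in the total budget.
        written = write_slice(chunk, min(slice_timeout, remaining))
        total_written += written
        if written < len(chunk):
            break  # stop at the first gap so the written prefix stays contiguous
    return total_written
```

For example, with nine blocks, a slice size of three, and a callback that writes only two blocks of the second slice, the function returns 5 and never attempts the third slice.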
Modifications
- fastdeploy/cache_manager/transfer_factory/mooncake_store/attention_store.py: write path changed to split a single large write request into multiple slices.
- Environment variables: AS_WRITE_TOTAL_TIMEOUT, AS_WRITE_SLICE_BLOCK_NUM, AS_WRITE_SLICE_TIMEOUT.
- For each slice, use slice_write_block_idx and truncated token_ids, then call AttentionStoreSDK.write(...) incrementally.
Usage or Command
Optional environment variables for AttentionStore slice write:
Accuracy Tests
N/A. This PR does not affect model forward outputs or kernel numerical behavior. It only changes the AttentionStore KV cache write-back strategy.
Checklist
- PR title with tag: [KVCache] Support AttentionStore slice write blocks
- Tag list: [[FDConfig],[APIServer],[Engine],[Scheduler],[PD Disaggregation],[Executor],[Graph Optimization],[Speculative Decoding],[RL],[Models],[Quantization],[Loader],[OP],[KVCache],[DataProcessor],[BugFix],[Docs],[CI],[Optimization],[Feature],[Benchmark],[Others],[XPU],[HPU],[GCU],[DCU],[Iluvatar],[Metax]]
- Run pre-commit before commit.
- If submitting to the release branch, make sure the PR has been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag. (N/A for current develop PR)