[KVCache] Support AttentionStore slice write blocks by jackyYang6 · Pull Request #7614 · PaddlePaddle/FastDeploy

jackyYang6 · 2026-04-24T09:59:13Z

Motivation

This PR adds slice write support for AttentionStore to avoid writing a large number of KV cache blocks in a single SDK call.

It makes large cache write-back more controllable by splitting writes into multiple slices with configurable per-slice and total timeout limits.

💡 If this PR is a Cherry Pick, the PR title needs to follow the format by adding the [Cherry-Pick] label at the very beginning and appending the original PR ID at the end. For example, [Cherry-Pick][CI] Add check trigger and logic(#5191)


Modifications

Update fastdeploy/cache_manager/transfer_factory/mooncake_store/attention_store.py write path to split a single large write request into multiple slices.
Add configurable write controls:
- AS_WRITE_TOTAL_TIMEOUT
- AS_WRITE_SLICE_BLOCK_NUM
- AS_WRITE_SLICE_TIMEOUT
For each slice, compute the corresponding slice_write_block_idx and truncated token_ids, then call AttentionStoreSDK.write(...) incrementally.
Add logging for slice begin/end, incomplete writes, and total write summary.
Stop subsequent slices when a slice write is incomplete, so prefix cache continuity is preserved.
No unit tests are added in this PR because this change is in the AttentionStore SDK integration path and requires an AttentionStore runtime environment for end-to-end validation.
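
For reviewers who want a picture of the slicing loop described above, here is a minimal sketch, not the code in this PR. `write_fn` is a hypothetical stand-in for `AttentionStoreSDK.write(...)`, whose real signature is not shown here. The sketch assumes it returns the number of blocks actually written, that `gpu_block_ids[0]` corresponds to block index `start_write_block_idx`, and that `token_ids` is truncated to the prefix covering the blocks up to the end of the current slice. `block_size` is likewise an assumed parameter, and the default of 30 for the total timeout is taken from the usage example rather than from the code.

```python
import time
from typing import Callable, Sequence


def write_in_slices(
    write_fn: Callable[[Sequence[int], Sequence[int], int, float], int],
    token_ids: Sequence[int],
    gpu_block_ids: Sequence[int],
    start_write_block_idx: int,
    block_size: int,
    slice_block_num: int = 500,
    slice_timeout: float = 10.0,
    total_timeout: float = 30.0,
) -> int:
    """Write blocks slice by slice and return how many blocks were written."""
    if slice_block_num <= 0:
        raise ValueError("slice_block_num must be positive")

    deadline = time.monotonic() + total_timeout
    total_written = 0
    for offset in range(0, len(gpu_block_ids), slice_block_num):
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break  # total time budget exhausted

        slice_block_ids = gpu_block_ids[offset : offset + slice_block_num]
        slice_write_block_idx = start_write_block_idx + offset
        # Assumed truncation rule: keep the tokens of every block up to the
        # end of this slice, so each call describes a contiguous prefix.
        end_block = slice_write_block_idx + len(slice_block_ids)
        slice_token_ids = token_ids[: end_block * block_size]

        written = write_fn(
            slice_token_ids,
            slice_block_ids,
            slice_write_block_idx,
            min(slice_timeout, remaining),
        )
        total_written += written
        if written < len(slice_block_ids):
            # Incomplete slice: stop so later blocks never land after a gap.
            break
    return total_written
```

Stopping at the first incomplete slice means the stored blocks always form a contiguous prefix, which is the continuity property the description refers to.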

Usage or Command

Optional environment variables for AttentionStore slice write:

export AS_WRITE_TOTAL_TIMEOUT=30
export AS_WRITE_SLICE_BLOCK_NUM=500
export AS_WRITE_SLICE_TIMEOUT=10

Accuracy Tests

N/A. This PR does not affect model forward outputs or kernel numerical behavior. It only changes the AttentionStore KV cache write-back strategy.

Checklist

Add at least one tag to the PR title.
- Suggested title: [KVCache] Support AttentionStore slice write blocks
- Tag list: [FDConfig], [APIServer], [Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]
- You can add new tags based on the PR content, but the semantics must be clear.
Format your code and run pre-commit before committing.
Add unit tests. If no unit tests are added, state the reason in this PR.
Provide accuracy results.
If the current PR is submitting to the release branch, make sure the PR has been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag. (N/A for current develop PR)

paddle-bot · 2026-04-24T09:59:20Z

Thanks for your contribution!

codecov-commenter · 2026-04-24T11:46:20Z

Codecov Report

❌ Patch coverage is 0% with 37 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (develop@ee81b57).

Files with missing lines	Patch %	Lines
...transfer_factory/mooncake_store/attention_store.py	0.00%	37 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             develop    #7614   +/-   ##
==========================================
  Coverage           ?   71.66%           
==========================================
  Files              ?      419           
  Lines              ?    57882           
  Branches           ?     9080           
==========================================
  Hits               ?    41479           
  Misses             ?    13577           
  Partials           ?     2826

Flag	Coverage Δ
GPU	`71.66% <0.00%> (?)`


PaddlePaddle-bot

🤖 AI Code Review | 2026-04-24 19:53:19

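📋 Review Summary

PR overview: Adds slice write support to the AttentionStore KVCache write path, splitting a large batch of block writes into multiple slices to make write-back more controllable.
Scope of changes: cache_manager/transfer_factory/mooncake_store/attention_store.py, rewrite of the write() method
Impact tag: KVCache

Issues

Severity	File	Summary
🔴 Bug	`attention_store.py:167`	With `AS_WRITE_SLICE_BLOCK_NUM=0`, `range()` raises `ValueError` and the write operation crashes
🟡 Suggestion	`attention_store.py:170`	The two concatenated f-strings in the `[WRITE BEGIN]` log are missing a space, which hurts log readability
🟡 Suggestion	`attention_store.py:233`	The `[WRITE END]` log level was raised from `debug` to `info`, which may sharply increase log volume under high concurrency
❓ Question	`attention_store.py:166~168`	`os.getenv` is read on every `write()` call; on a high-frequency write path, consider caching the values in `__init__`

Overall Assessment

The slice write design is sound, and breaking early on an incomplete write is the right way to preserve prefix cache continuity. The missing validation for slice_block_num=0 needs to be fixed (P0 crash risk), and the [WRITE END] log level should go back to debug to avoid excessive log volume in production.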
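
On the question about reading `os.getenv` on every `write()` call, one possible shape for caching is sketched below. `SliceWriteConfig` and `StoreClientSketch` are hypothetical names, not classes from FastDeploy. The defaults of 500 and 10 come from the diff, while the default of 30 for `AS_WRITE_TOTAL_TIMEOUT` is only an assumption based on the usage example.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class SliceWriteConfig:
    """Slice write settings, parsed once instead of on every write."""

    total_timeout: float
    slice_block_num: int
    slice_timeout: float

    @classmethod
    def from_env(cls, environ: Optional[Mapping[str, str]] = None) -> "SliceWriteConfig":
        env = os.environ if environ is None else environ
        return cls(
            # Default of 30 is assumed from the PR's usage example.
            total_timeout=float(env.get("AS_WRITE_TOTAL_TIMEOUT", "30")),
            slice_block_num=int(env.get("AS_WRITE_SLICE_BLOCK_NUM", "500")),
            slice_timeout=float(env.get("AS_WRITE_SLICE_TIMEOUT", "10")),
        )


class StoreClientSketch:
    """Hypothetical stand-in for the class that owns write()."""

    def __init__(self, environ: Optional[Mapping[str, str]] = None) -> None:
        # Parsed once here; write() would read self.write_config afterwards.
        self.write_config = SliceWriteConfig.from_env(environ)
```

The trade-off is that values cached at construction time no longer react to environment changes made while the process is running, which may or may not matter for this integration.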

PaddlePaddle-bot · 2026-04-24T11:56:29Z

+        slice_block_num = int(os.getenv("AS_WRITE_SLICE_BLOCK_NUM", "500"))
+        slice_timeout = float(os.getenv("AS_WRITE_SLICE_TIMEOUT", "10"))
+        logger.debug(
+            f"[WRITE BEGIN] task_id: {task_id} token_ids: {token_ids} gpu_block_ids: {gpu_block_ids}"


🔴 Bug: AS_WRITE_SLICE_BLOCK_NUM=0 makes range() raise ValueError and crash

range(0, total_blocks, 0) raises ValueError: range() arg 3 must not be zero, which crashes the whole write operation and affects the stability of the prefix cache service. Similarly, a negative value silently skips all writes (nothing is written when total_blocks > 0).

Suggested guard, applied right after reading the value:

slice_block_num = int(os.getenv("AS_WRITE_SLICE_BLOCK_NUM", "500"))
if slice_block_num <= 0:
    logger.warning(f"AS_WRITE_SLICE_BLOCK_NUM={slice_block_num} is invalid, reset to 500")
    slice_block_num = 500
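
The guard above can also be packaged as a small helper, sketched here under some assumptions. `read_slice_block_num` is a hypothetical name, and falling back to the default on a non-integer value goes beyond what the review comment asks for.

```python
import logging
import os
from typing import Mapping, Optional

logger = logging.getLogger(__name__)

DEFAULT_SLICE_BLOCK_NUM = 500  # default taken from the diff


def read_slice_block_num(environ: Optional[Mapping[str, str]] = None) -> int:
    """Return a positive slice size, falling back to the default if invalid."""
    env = os.environ if environ is None else environ
    raw = env.get("AS_WRITE_SLICE_BLOCK_NUM", str(DEFAULT_SLICE_BLOCK_NUM))
    try:
        value = int(raw)
    except ValueError:
        value = 0  # assumption: treat unparsable input like an invalid size
    if value <= 0:
        logger.warning(
            "AS_WRITE_SLICE_BLOCK_NUM=%r is invalid, reset to %d",
            raw,
            DEFAULT_SLICE_BLOCK_NUM,
        )
        return DEFAULT_SLICE_BLOCK_NUM
    return value
```

With a positive step guaranteed, `range(0, total_blocks, slice_block_num)` can neither raise on a zero step nor yield an empty sequence because of a negative one.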

PaddlePaddle-bot · 2026-04-24T11:56:29Z

+            f"[WRITE BEGIN] task_id: {task_id} token_ids: {token_ids} gpu_block_ids: {gpu_block_ids}"
+            f"start_write_block_idx: {start_write_block_idx} timeout: {total_timeout}"
+        )
+        total_blocks = len(gpu_block_ids)


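🟡 Suggestion: the two f-strings in the [WRITE BEGIN] log are concatenated without a separating space

Two adjacent f-strings are concatenated directly, with no space between the trailing {gpu_block_ids} of the first and the leading start_write_block_idx: of the second. The log output reads ...{gpu_block_ids}start_write_block_idx: ..., which is hard to read and inconsistent with the other fields.

Suggested fix:

f"[WRITE BEGIN] task_id: {task_id} token_ids: {token_ids} gpu_block_ids: {gpu_block_ids} "
f"start_write_block_idx: {start_write_block_idx} timeout: {total_timeout}"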
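
A short standalone illustration of the concatenation behaviour, using made-up values rather than anything from the PR: Python joins adjacent string literals with nothing in between, so the separator has to be written explicitly.

```python
# Illustrative values only; the field names mirror the log line in the diff.
task_id = "t1"
gpu_block_ids = [3, 4]
start_write_block_idx = 0

# Adjacent literals are joined as-is, so the fields run together.
missing_space = (
    f"task_id: {task_id} gpu_block_ids: {gpu_block_ids}"
    f"start_write_block_idx: {start_write_block_idx}"
)

# A trailing space on the first literal restores the separator.
fixed = (
    f"task_id: {task_id} gpu_block_ids: {gpu_block_ids} "
    f"start_write_block_idx: {start_write_block_idx}"
)

print(missing_space)  # task_id: t1 gpu_block_ids: [3, 4]start_write_block_idx: 0
print(fixed)          # task_id: t1 gpu_block_ids: [3, 4] start_write_block_idx: 0
```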

PaddlePaddle-bot · 2026-04-24T11:56:29Z

+            )
+
+        total_cost = time.time() - overall_start
+        logger.info(


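🟡 Suggestion: the [WRITE END] log level was raised from debug to info, which may produce a large volume of logs in high-concurrency inference

The original code logged [WRITE END] with logger.debug; this change uses logger.info, so every write operation prints a line. KVCache writes are frequent in large-scale inference, so the info-level log volume grows significantly and affects disk I/O and logging system performance.

Consider switching back to logger.debug, or using logger.warning only when the total write is incomplete (total_written < total_blocks).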
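
One way to combine the two options in this suggestion is to pick the level from the write outcome. The sketch below assumes the names `total_written`, `total_blocks`, and `total_cost` from the review text. `log_write_end` is a hypothetical helper and the message layout is illustrative, not the PR's actual log format.

```python
import logging

logger = logging.getLogger("attention_store_sketch")  # illustrative logger name


def log_write_end(task_id: str, total_written: int, total_blocks: int, total_cost: float) -> int:
    """Log the write summary and return the level that was used."""
    # Complete writes stay at DEBUG; only incomplete ones are raised to WARNING.
    level = logging.WARNING if total_written < total_blocks else logging.DEBUG
    logger.log(
        level,
        "[WRITE END] task_id: %s written: %d/%d cost: %.3fs",
        task_id,
        total_written,
        total_blocks,
        total_cost,
    )
    return level
```

Using `%`-style arguments instead of an f-string also defers formatting until the record is actually emitted, which keeps the common DEBUG path cheap when that level is disabled.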

jackyYang6 had a problem deploying to Metax_ci April 24, 2026 09:59 — with GitHub Actions Failure


jackyYang6 changed the title ~~[KVCache] Support AttentionStore sclice write blocks~~ [KVCache] Support AttentionStore slice write blocks Apr 24, 2026

jackyYang6 force-pushed the as/as_slice_write branch from def7979 to 7ba2272 Compare April 24, 2026 10:11

jackyYang6 had a problem deploying to Metax_ci April 24, 2026 10:11 — with GitHub Actions Error

[KVCache] Support AttentionStore sclice write blocks

4f955fe

jackyYang6 force-pushed the as/as_slice_write branch from 7ba2272 to 4f955fe Compare April 24, 2026 10:13

jackyYang6 had a problem deploying to Metax_ci April 24, 2026 10:13 — with GitHub Actions Failure

PaddlePaddle-bot suggested changes Apr 24, 2026

jackyYang6 wants to merge 1 commit into PaddlePaddle:develop from jackyYang6:as/as_slice_write

3 participants
