[Bugfix] align input_token_ids with gpu_block_ids by zccjjj · Pull Request #7574 · PaddlePaddle/FastDeploy

zccjjj · 2026-04-23T05:37:41Z

Motivation

用于0423本地镜像构造fd
附带以下改动：
ccbd245
https://github.com/PaddlePaddle/FastDeploy/pull/7455/changes#diff-87df1cd74b84894a10d9008fa9d93f7f0a3161355b7edcef771e2b32a9c57909

💡 If this PR is a Cherry Pick, the PR title needs to follow the format by adding the [Cherry-Pick] label at the very beginning and appending the original PR ID at the end. For example, [Cherry-Pick][CI] Add check trigger and logic(#5191)

💡 如若此PR是Cherry Pick，PR标题需遵循格式，在最开始加上[Cherry-Pick]标签，以及最后面加上原PR ID，例如[Cherry-Pick][CI] Add check trigger and logic(#5191)

Modifications

Usage or Command

Accuracy Tests

Checklist

Add at least a tag in the PR title.
- Tag list: [[FDConfig],[APIServer],[Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]]
- You can add new tags based on the PR content, but the semantics must be clear.
Format your code, run pre-commit before commit.
Add unit tests. Please write the reason in this PR if no unit tests.
Provide accuracy results.
If the current PR is submitting to the release branch, make sure the PR has been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.

paddle-bot · 2026-04-23T05:37:48Z

Thanks for your contribution!

codecov-commenter · 2026-04-23T07:06:18Z

Codecov Report

❌ Patch coverage is 33.33333% with 6 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (develop@5e87930). Learn more about missing BASE report.

Files with missing lines	Patch %	Lines
...transfer_factory/mooncake_store/attention_store.py	28.57%	5 Missing ⚠️
fastdeploy/cache_manager/prefix_cache_manager.py	50.00%	1 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             develop    #7574   +/-   ##
==========================================
  Coverage           ?   72.23%           
==========================================
  Files              ?      419           
  Lines              ?    57853           
  Branches           ?     9076           
==========================================
  Hits               ?    41788           
  Misses             ?    13212           
  Partials           ?     2853

Flag	Coverage Δ
GPU	`72.23% <33.33%> (?)`

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

PaddlePaddle-bot

🤖 AI Code Review | 2026-04-24 21:29:26

📋 Review 摘要

PR 概述：修复 write_cache_to_storage 中 input_token_ids 与 gpu_block_ids 长度不对齐的问题，并为 AttentionStore 增加 splitwise_role 支持及环境变量覆盖配置。
变更范围：cache_manager/prefix_cache_manager.py、cache_manager/transfer_factory/mooncake_store/attention_store.py、cache_manager/cache_transfer_manager.py
影响面 Tag：[KVCache]

📝 PR 规范检查

标题 Tag 大小写有误：当前使用 [Bugfix]，官方 Tag 列表中为 [BugFix]（大写 F），建议修正：
- 标题建议（可直接复制）：[BugFix] align input_token_ids with gpu_block_ids
描述未填写：Motivation 与 Modifications 两个必填章节均为空（仅保留了模板注释），无法了解此 Bug 的触发条件和具体修改内容。建议参考如下内容填写：

## Motivation
在调用 `write_cache_to_storage` 时，`input_token_ids` 的长度未与截断后的 `gpu_block_ids`（`request.block_tables[:len(keys)]`）对齐，导致写入存储的 token id 数组与实际 block 数量不匹配。

## Modifications
1. `prefix_cache_manager.py`：移除 `enable_output_caching` 条件分支，统一拼接 output_token_ids，并在写存储前将 `input_token_ids` 截断至 `len(keys) * block_size`，确保与 `gpu_block_ids` 对齐。
2. `attention_store.py`：新增 `splitwise_role` 配置字段，支持通过环境变量 `AS_NAMESPACE`/`AS_POD_NAME`/`AS_MODEL_VERSION`/`ENABLE_EP_DP_IN_FD` 覆盖初始化配置，以适配 splitwise 场景下的 pod 命名。
3. `cache_transfer_manager.py`：`_init_storage` 透传 `splitwise_role` 参数。

问题

级别	文件	概述
❓ 疑问	`prefix_cache_manager.py:1123`	移除 `enable_output_caching` 条件后，`output_token_ids` 会始终被追加，需确认在 `enable_output_caching=False` 场景下不会污染 prefix cache
🟡 建议	`attention_store.py:69`	`int(os.getenv("ENABLE_EP_DP_IN_FD", "1"))` 缺少异常处理，非整数环境变量值会导致 `ValueError` 崩溃

总体评价

核心的 token_ids 对齐修复（截断至 len(keys) * block_size）逻辑清晰，方向正确；但移除 enable_output_caching 条件分支的意图需进一步确认，建议作者补充说明，避免引入潜在的 prefix cache 污染问题。

PaddlePaddle-bot · 2026-04-24T13:31:28Z

-            input_token_ids = token_ids + request.output_token_ids
-        else:
-            input_token_ids = token_ids
+        input_token_ids = token_ids + request.output_token_ids


❓ 疑问 移除 enable_output_caching 条件是否符合预期？

原代码在 enable_output_caching=False 时，input_token_ids 只包含 input tokens；现在无论该配置如何，总会追加 request.output_token_ids。

虽然后续的截断 input_token_ids[: len(keys) * block_size] 可能会将 output tokens 剪掉，但这取决于 keys 覆盖的范围。如果 enable_output_caching=False 时 keys 的数量意外覆盖了 output token 区间，缓存内容将包含本不应写入的 output tokens，可能污染下次 prefix cache 命中结果。

请确认：当 enable_output_caching=False 时，keys 的长度是否严格不超过 input 对应的 block 数？如果可以保证，建议在此处添加注释说明该假设。

PaddlePaddle-bot · 2026-04-24T13:31:28Z

        try:
+            self.config.namespace = os.getenv("AS_NAMESPACE", self.config.namespace)
+            self.config.pod_name = os.getenv("AS_POD_NAME", self.config.pod_name)
+            if int(os.getenv("ENABLE_EP_DP_IN_FD", "1")):


🟡 建议 int(os.getenv(...)) 缺少异常处理，健壮性不足。

当环境变量 ENABLE_EP_DP_IN_FD 被设置为非整数字符串（如 "true" / "yes" / "on"）时，int() 会直接抛出 ValueError，导致初始化过程崩溃。

建议修改为更健壮的写法：

if os.getenv("ENABLE_EP_DP_IN_FD", "1") != "0":

bugfix

35f3603

zccjjj temporarily deployed to Metax_ci April 23, 2026 05:37 — with GitHub Actions Inactive

zccjjj changed the title ~~bugfix~~ [Bugfix] align input_token_ids with gpu_block_ids Apr 23, 2026

This comment was marked as outdated.

Sign in to view

AS support

dc3d933

zccjjj temporarily deployed to Metax_ci April 24, 2026 13:23 — with GitHub Actions Inactive

PaddlePaddle-bot reviewed Apr 24, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Bugfix] align input_token_ids with gpu_block_ids#7574

[Bugfix] align input_token_ids with gpu_block_ids#7574
zccjjj wants to merge 2 commits intoPaddlePaddle:developfrom
zccjjj:bugfix

zccjjj commented Apr 23, 2026 •

edited

Loading

Uh oh!

paddle-bot Bot commented Apr 23, 2026

Uh oh!

This comment was marked as outdated.

Uh oh!

codecov-commenter commented Apr 23, 2026 •

edited

Loading

Uh oh!

PaddlePaddle-bot left a comment

Uh oh!

PaddlePaddle-bot Apr 24, 2026

Uh oh!

PaddlePaddle-bot Apr 24, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

zccjjj commented Apr 23, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Motivation

Modifications

Usage or Command

Accuracy Tests

Checklist

Uh oh!

paddle-bot Bot commented Apr 23, 2026

Uh oh!

This comment was marked as outdated.

Uh oh!

codecov-commenter commented Apr 23, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

PaddlePaddle-bot left a comment

Choose a reason for hiding this comment

📋 Review 摘要

📝 PR 规范检查

问题

总体评价

Uh oh!

PaddlePaddle-bot Apr 24, 2026

Choose a reason for hiding this comment

Uh oh!

PaddlePaddle-bot Apr 24, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

zccjjj commented Apr 23, 2026 •

edited

Loading

codecov-commenter commented Apr 23, 2026 •

edited

Loading