[XPU] Remove v0 cache schedule and use InputBatch #7603

Open
cmcamdy wants to merge 4 commits into PaddlePaddle:develop from cmcamdy:rm_v0_cache_schedule

Conversation

@cmcamdy
Collaborator

@cmcamdy cmcamdy commented Apr 23, 2026

Motivation

💡 If this PR is a cherry-pick, the PR title must follow the convention: add the [Cherry-Pick] label at the very beginning and append the original PR ID at the end, e.g. [Cherry-Pick][CI] Add check trigger and logic(#5191)

Modifications

Usage or Command

Accuracy Tests

Checklist

  • Add at least one tag in the PR title.
    • Tag list: [[FDConfig], [APIServer], [Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]]
    • You can add new tags based on the PR content, but their meaning must be clear.
  • Format your code and run pre-commit before committing.
  • Add unit tests. If no unit tests are added, explain why in this PR.
  • Provide accuracy results.
  • If this PR targets a release branch, make sure it has first been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.

@paddle-bot

paddle-bot Bot commented Apr 23, 2026

Thanks for your contribution!

PaddlePaddle-bot

This comment was marked as outdated.

@cmcamdy cmcamdy changed the title [XPU] rm v0 cache schedule [XPU] rm v0 cache schedule and use InputBatch Apr 23, 2026
@cmcamdy cmcamdy changed the title [XPU] rm v0 cache schedule and use InputBatch [XPU] Remove v0 cache schedule and use InputBatch Apr 23, 2026

@PaddlePaddle-bot PaddlePaddle-bot left a comment

🤖 AI Code Review | 2026-04-24 02:03:18

📋 Review Summary

PR overview: removes the V0 KVCache scheduler path on the XPU platform, unifies on the V1 KVCache scheduler, and correspondingly replaces the pre_ids buffer with token_ids_all.
Scope of changes: xpu_model_runner, xpu_pre_and_post_process, apply_penalty_multi_scores, sampler, input_batch, spec_decode/mtp
Impact tags: XPU, KVCache Scheduler, Speculative Decoding


📝 PR Convention Check

The PR title carries a valid [XPU] tag and follows the convention. However, the Motivation and Modifications sections are both empty; please fill them in so the change can be traced later.

Suggested description (ready to copy directly):

## Motivation
Remove support for the V0 KVCache Scheduler on the XPU platform. From FastDeploy 2.6 onward, XPU uses the V1 KVCache Scheduler exclusively, which simplifies the code path and reduces the maintenance burden.

## Modifications
- `xpu_model_runner.py`: delete the `_init_share_inputs` method and use `InputBatch` for unified initialization; replace `insert_prefill_inputs` with a `NotImplementedError` to guard against V0 misuse; remove the `ENABLE_V1_KVCACHE_SCHEDULER` condition in `_prepare_inputs`
- `xpu_pre_and_post_process.py`: delete the `step_xpu` function (the V0 scheduling entry point); unify the `update_inputs` branch into `update_inputs_v1`
- `apply_penalty_multi_scores.py` / `sampler.py`: on the XPU path, replace `pre_token_ids` with `token_ids_all`
- `input_batch.py`: put the XPU platform on the same branch as CUDA, deriving `pre_ids` uniformly from `token_ids_all`
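The buffer change described above can be pictured with a minimal, hypothetical sketch (this is not FastDeploy source; the class shape, method signatures, and bodies are invented for illustration, and only the names `insert_prefill_inputs`, `pre_ids`, and `token_ids_all` come from the review):

```python
# Hypothetical sketch, not FastDeploy source: illustrates the two patterns
# the review describes -- guarding the removed V0 entry point with
# NotImplementedError, and deriving pre_ids from a unified token_ids_all
# buffer instead of maintaining a separate pre_ids tensor.

class XPUModelRunnerSketch:
    def __init__(self, max_num_seqs: int, max_model_len: int):
        # Unified per-sequence token buffer (stand-in for token_ids_all);
        # -1 marks unused slots.
        self.token_ids_all = [[-1] * max_model_len for _ in range(max_num_seqs)]
        self.seq_lens = [0] * max_num_seqs

    def insert_prefill_inputs(self, *args, **kwargs):
        # The V0 cache-schedule entry point was removed; fail loudly if
        # any leftover caller (e.g. a stale branch) still reaches it.
        raise NotImplementedError(
            "V0 cache schedule was removed; use the V1 KVCache scheduler path."
        )

    def append_token(self, seq_idx: int, token_id: int) -> None:
        self.token_ids_all[seq_idx][self.seq_lens[seq_idx]] = token_id
        self.seq_lens[seq_idx] += 1

    def pre_ids(self, seq_idx: int) -> list:
        # pre_ids is derived on demand from token_ids_all rather than
        # stored in its own buffer.
        return self.token_ids_all[seq_idx][: self.seq_lens[seq_idx]]


runner = XPUModelRunnerSketch(max_num_seqs=2, max_model_len=8)
for tok in (101, 7, 42):
    runner.append_token(0, tok)
print(runner.pre_ids(0))  # -> [101, 7, 42]
```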

Issues

Level | File | Summary
🟡 Suggestion | xpu_model_runner.py:751 | xpu_worker.py was not cleaned up in sync, leaving a dead-code branch
🟡 Suggestion | (none) | No unit tests or regression tests were added

Overall assessment

The core logic of the PR is clear and correct: initialization is unified through InputBatch, the step_xpu function is deleted, and the pre_ids path converges onto token_ids_all; the overall direction is sound. The main omissions are that xpu_worker.py was not updated in sync (the else branch in preprocess_new_task that still calls insert_prefill_inputs is now dead code) and that the PR description is empty. Recommend completing the description and cleaning up the leftover branch in xpu_worker.py before merging.
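The branch collapse the review mentions for `update_inputs` can be sketched as follows (function names follow the review; bodies and signatures are hypothetical placeholders, not FastDeploy source):

```python
# Hypothetical sketch: before this PR, post-processing dispatched on the
# ENABLE_V1_KVCACHE_SCHEDULER flag between a V0 update_inputs and
# update_inputs_v1; with the V0 path removed, the dispatch collapses to a
# direct call. Bodies and signatures here are placeholders.

def update_inputs_v1(state: dict, new_tokens: list) -> dict:
    # Placeholder standing in for the real V1 post-processing update.
    out = dict(state)
    out["tokens"] = out.get("tokens", []) + new_tokens
    return out

def post_process(state: dict, new_tokens: list) -> dict:
    # Previously: if ENABLE_V1_KVCACHE_SCHEDULER: update_inputs_v1(...)
    #             else:                           update_inputs(...)  # V0 path
    # Now the V1 path is the only path, so the conditional disappears.
    return update_inputs_v1(state, new_tokens)

print(post_process({}, [3, 4])["tokens"])  # -> [3, 4]
```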

Comment thread fastdeploy/worker/xpu_model_runner.py
@codecov-commenter

Codecov Report

❌ Patch coverage is 0% with 1 line in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (develop@029c0d6). Learn more about missing BASE report.

Files with missing lines | Patch % | Lines
fastdeploy/worker/input_batch.py | 0.00% | 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             develop    #7603   +/-   ##
==========================================
  Coverage           ?   71.73%           
==========================================
  Files              ?      419           
  Lines              ?    57783           
  Branches           ?     9060           
==========================================
  Hits               ?    41448           
  Misses             ?    13512           
  Partials           ?     2823           
Flag | Coverage Δ
GPU | 71.73% <0.00%> (?)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
