bugfix: fix DeepSeek V4 schedule_overlap + mtp. by JC-ut0 · Pull Request #1782 · jd-opensource/xllm

JC-ut0 · 2026-06-18T08:37:48Z

Description

This PR fixes a DeepSeek V4 precision regression when enable_schedule_overlap=true and MTP are enabled. The root cause was that Sequence::update_last_step_token() used num_kv_blocks()==0 to decide whether to skip last-step token replacement, which incorrectly treated composite KV layouts as KV-less sequences. The fix switches the guard to current_max_tokens_capacity()==0, so preempted sequences are still protected while composite KV sequences keep their accepted MTP tokens. The patch also adds a regression test covering the composite KV case and preserves full replacement of accepted MTP tokens in overlap mode.
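The guard change can be illustrated with a minimal sketch. `SequenceSketch`, its fields, and the placeholder-overwrite logic are hypothetical simplifications written for this explanation. Only the names `num_kv_blocks()`, `current_max_tokens_capacity()`, and `update_last_step_token()` come from this PR, and their real signatures and behavior in xLLM may differ.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, simplified stand-in for xLLM's Sequence. Only the two
// accessors named in this PR are modeled; everything else is illustrative.
struct SequenceSketch {
  std::vector<std::int32_t> tokens;    // token ids, ending with last-step placeholders
  std::size_t local_kv_blocks = 0;     // blocks exposed by the local block manager
  std::size_t kv_token_capacity = 0;   // tokens the KV cache can hold, any layout

  std::size_t num_kv_blocks() const { return local_kv_blocks; }
  std::size_t current_max_tokens_capacity() const { return kv_token_capacity; }

  // Old guard: a composite KV layout can expose zero local blocks, so it is
  // misclassified as a KV-less (e.g. preempted) sequence.
  bool old_guard_skips() const { return num_kv_blocks() == 0; }

  // New guard: skip only when the sequence has no KV capacity at all.
  bool new_guard_skips() const { return current_max_tokens_capacity() == 0; }

  // Overwrite the trailing placeholder tokens from the previous overlapped
  // step with the accepted MTP tokens. Returns false if the update is skipped.
  bool update_last_step_token(const std::vector<std::int32_t>& accepted) {
    if (new_guard_skips()) {
      return false;
    }
    // Assumption of this sketch: every accepted token has a placeholder slot.
    assert(accepted.size() <= tokens.size());
    std::copy(accepted.begin(), accepted.end(),
              tokens.end() - static_cast<std::ptrdiff_t>(accepted.size()));
    return true;
  }
};
```

In this sketch, a composite-KV sequence with zero local blocks but non-zero capacity is skipped by the old guard yet updated by the new one, while a preempted sequence (zero blocks and zero capacity) is still skipped by both.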

Related Issues

Change Type

Pull Request Checklist

Thank you for contributing to xLLM. Before requesting review, please make sure the following items are complete.

PR Title and Commit Messages

The PR title and each commit message follow the xLLM commit format: <type>: <subject>.

Allowed types: feat, bugfix, docs, test, refactor, chore, style, revert, perf, model, build, release.
The subject should use clear English, start with a verb, include at least 4 words, and end with a period (.).
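The title rule above can be checked mechanically. The following is a hypothetical C++ helper, not part of xLLM or its pre-commit hooks; `is_valid_xllm_title` is an illustrative name. It checks only the allowed type list, the four-word minimum, and the trailing period, and leaves "starts with a verb" to the author.

```cpp
#include <algorithm>
#include <array>
#include <sstream>
#include <string>
#include <string_view>

// Hypothetical checker for the "<type>: <subject>." format described above.
bool is_valid_xllm_title(const std::string& title) {
  static constexpr std::array<std::string_view, 12> kAllowedTypes = {
      "feat", "bugfix", "docs", "test", "refactor", "chore",
      "style", "revert", "perf", "model", "build", "release"};

  // The type and subject are separated by ": ".
  const auto sep = title.find(": ");
  if (sep == std::string::npos) return false;

  const std::string_view type = std::string_view(title).substr(0, sep);
  if (std::find(kAllowedTypes.begin(), kAllowedTypes.end(), type) ==
      kAllowedTypes.end()) {
    return false;
  }

  // The subject must end with a period.
  const std::string subject = title.substr(sep + 2);
  if (subject.empty() || subject.back() != '.') return false;

  // Count whitespace-separated words in the subject.
  std::istringstream stream(subject);
  std::string word;
  int word_count = 0;
  while (stream >> word) ++word_count;
  return word_count >= 4;
}
```

For example, this PR's title, "bugfix: fix DeepSeek V4 schedule_overlap + mtp.", passes all three checks.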

Pre-commit Checks

I have installed pre-commit by running pip install pre-commit or an equivalent command.
I have installed the hooks with pre-commit install.
I have run pre-commit run --all-files and fixed any reported issues.

If you are unsure how to set up pre-commit, see the pre-commit documentation.

Self Review

I have self-reviewed the code according to .agents/skills/code-review/references/custom-code-style.md, especially code written or assisted by AI.
I have rebased this PR onto the latest main branch.

Build and Test Coverage

Tests have been added or updated as needed.
CUDA: python setup.py build test has passed on a CUDA machine.
NPU: python setup.py build test has passed on an NPU machine.
MLU: python setup.py build test has passed on an MLU machine.

Reviewer Notes


gemini-code-assist

Code Review

This pull request updates Sequence::update_last_step_token to check current_max_tokens_capacity() instead of num_kv_blocks() when determining if a sequence's KV cache has been deallocated, accommodating composite KV managers that can have capacity without exposing local blocks. It also adds a corresponding unit test OverlapMTPReplacementKeepsCompositeKvBlocks to verify this behavior. Feedback was provided to annotate a constant argument with its parameter name in the new test code, in accordance with the repository style guide.


bugfix: fix the DeepSeek V4 precision regression with schedule_overlap + mtp.

a480dbc

JC-ut0 requested review from Clement-Wang26, DongheJin, DragonFive, JimHsiung, Kang-Meng, RobbieLeung, XuZhang99, liujinguang0125, liutongxuan, walsonyang, xiao-yu-chen, yingxudeng, yq33victor and zhang-minchao as code owners June 18, 2026 08:37

gemini-code-assist Bot reviewed Jun 18, 2026


Commented on tests/core/framework/batch/batch_test.cpp (outdated)

Apply suggestion from @gemini-code-assist[bot]

f139118

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

DongheJin previously approved these changes Jun 22, 2026


yingxudeng previously approved these changes Jun 22, 2026


Merge branch 'main' into fix_mtp_618_main

9a8e5eb

JC-ut0 dismissed stale reviews from yingxudeng and DongheJin via 9a8e5eb June 24, 2026 07:50

Kang-Meng approved these changes Jun 24, 2026


yingxudeng approved these changes Jun 24, 2026


JC-ut0 wants to merge 3 commits into jd-opensource:main from JC-ut0:fix_mtp_618_main


4 participants
