Skip to content

bugfix: fix DeepSeek V4 schedule_overlap + mtp.#1782

Open
JC-ut0 wants to merge 3 commits into
jd-opensource:mainfrom
JC-ut0:fix_mtp_618_main
Open

bugfix: fix DeepSeek V4 schedule_overlap + mtp.#1782
JC-ut0 wants to merge 3 commits into
jd-opensource:mainfrom
JC-ut0:fix_mtp_618_main

Conversation

@JC-ut0

@JC-ut0 JC-ut0 commented Jun 18, 2026

Copy link
Copy Markdown
Contributor

Description

This PR fixes a DeepSeek V4 precision regression when enable_schedule_overlap=true and MTP are enabled. The root cause was that Sequence::update_last_step_token() used num_kv_blocks()==0 to decide whether to skip last-step token replacement, which incorrectly treated composite KV layouts as KV-less sequences. The fix switches the guard to current_max_tokens_capacity()==0, so preempted sequences are still protected while composite KV sequences keep their accepted MTP tokens. The patch also adds a regression test covering the composite KV case and preserves full replacement of accepted MTP tokens in overlap mode.

Related Issues

Change Type

  • Bug fix
  • New feature
  • Performance improvement
  • Refactor
  • Documentation
  • Test
  • Build or CI

Pull Request Checklist

Thank you for contributing to xLLM. Before requesting review, please make sure the following items are complete.

PR Title and Commit Messages

  • The PR title and each commit message follow the xLLM commit format: <type>: <subject>.

Allowed types: feat, bugfix, docs, test, refactor, chore, style, revert, perf, model, build, release.
The subject should use clear English, start with a verb, include at least 4 words, and end with ..

Pre-commit Checks

  • I have installed pre-commit by running pip install pre-commit or an equivalent command.
  • I have installed the hooks with pre-commit install.
  • I have run pre-commit run --all-files and fixed any reported issues.

If you are unsure how to set up pre-commit, see the pre-commit documentation.

Self Review

  • I have self-reviewed the code according to .agents/skills/code-review/references/custom-code-style.md, especially code written or assisted by AI.
  • I have rebased this PR onto the latest main branch.

Build and Test Coverage

  • Tests have been added or updated as needed.
  • CUDA: python setup.py build test has passed on a CUDA machine.
  • NPU: python setup.py build test has passed on an NPU machine.
  • MLU: python setup.py build test has passed on an MLU machine.

Reviewer Notes

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request updates Sequence::update_last_step_token to check current_max_tokens_capacity() instead of num_kv_blocks() when determining if a sequence's KV cache has been deallocated, accommodating composite KV managers that can have capacity without exposing local blocks. It also adds a corresponding unit test OverlapMTPReplacementKeepsCompositeKvBlocks to verify this behavior. Feedback was provided to annotate a constant argument with its parameter name in the new test code, in accordance with the repository style guide.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment thread tests/core/framework/batch/batch_test.cpp Outdated
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
DongheJin
DongheJin previously approved these changes Jun 22, 2026
yingxudeng
yingxudeng previously approved these changes Jun 22, 2026
@JC-ut0 JC-ut0 dismissed stale reviews from yingxudeng and DongheJin via 9a8e5eb June 24, 2026 07:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants