bugfix: fix DeepSeek V4 schedule_overlap + mtp. #1782
Code Review
This pull request updates Sequence::update_last_step_token to check current_max_tokens_capacity() instead of num_kv_blocks() when determining if a sequence's KV cache has been deallocated, accommodating composite KV managers that can have capacity without exposing local blocks. It also adds a corresponding unit test OverlapMTPReplacementKeepsCompositeKvBlocks to verify this behavior. Feedback was provided to annotate a constant argument with its parameter name in the new test code, in accordance with the repository style guide.
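The guard change summarized above can be illustrated with a minimal sketch. This is not the xLLM implementation: `KvStateSketch`, its fields, and the capacity formula are hypothetical stand-ins, and only the names `num_kv_blocks()` and `current_max_tokens_capacity()` come from the PR. It assumes a composite KV manager can hold token capacity without exposing any local blocks.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a sequence's KV state (not the real xLLM class).
struct KvStateSketch {
  std::size_t local_blocks = 0;        // blocks visible via num_kv_blocks()
  std::size_t composite_capacity = 0;  // assumed: tokens held by a composite manager
  std::size_t block_size = 16;         // assumed block size, for illustration only

  std::size_t num_kv_blocks() const { return local_blocks; }

  // Assumed formula: local block capacity plus composite-managed capacity.
  std::size_t current_max_tokens_capacity() const {
    return local_blocks * block_size + composite_capacity;
  }
};

// Old guard: skips replacement whenever no local blocks are exposed.
inline bool skip_replacement_old(const KvStateSketch& kv) {
  return kv.num_kv_blocks() == 0;
}

// New guard: skips replacement only when the sequence has no capacity at all.
inline bool skip_replacement_new(const KvStateSketch& kv) {
  return kv.current_max_tokens_capacity() == 0;
}

// Walks through the three cases the PR description distinguishes.
inline void check_guard_examples() {
  KvStateSketch preempted;  // KV cache deallocated: both guards skip
  assert(skip_replacement_old(preempted));
  assert(skip_replacement_new(preempted));

  KvStateSketch composite;  // capacity without local blocks
  composite.composite_capacity = 64;
  assert(skip_replacement_old(composite));   // old guard wrongly skips
  assert(!skip_replacement_new(composite));  // new guard keeps the tokens
}
```

In this sketch the two guards agree for a preempted sequence and differ only for the composite case, which is the case the new unit test is described as covering.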
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Description
This PR fixes a DeepSeek V4 precision regression when enable_schedule_overlap=true and MTP are enabled. The root cause was that Sequence::update_last_step_token() used num_kv_blocks() == 0 to decide whether to skip last-step token replacement, which incorrectly treated composite KV layouts as KV-less sequences. The fix switches the guard to current_max_tokens_capacity() == 0, so preempted sequences are still protected while composite KV sequences keep their accepted MTP tokens. The patch also adds a regression test covering the composite KV case and preserves full replacement of accepted MTP tokens in overlap mode.
Related Issues
Change Type
Pull Request Checklist
Thank you for contributing to xLLM. Before requesting review, please make sure the following items are complete.
PR Title and Commit Messages
<type>: <subject>.
Pre-commit Checks
pre-commit by running pip install pre-commit or an equivalent command.
pre-commit install.
pre-commit run --all-files and fixed any reported issues.
Self Review
.agents/skills/code-review/references/custom-code-style.md, especially code written or assisted by AI.
main branch.
Build and Test Coverage
python setup.py build test has passed on a CUDA machine.
python setup.py build test has passed on an NPU machine.
python setup.py build test has passed on an MLU machine.
Reviewer Notes