Genesis 6582#371
Code Review
This pull request adds support for 4D query tensors (with sequence lengths up to 4) in the sliding window attention (SWA) paged decode kernel, updating the Triton kernel, PyTorch operator, and associated tests. The review feedback highlights critical issues in the Triton kernel implementation, specifically the incorrect reuse of query strides instead of output strides when calculating output pointers (which can cause memory corruption if the query is non-contiguous), and a potential division-by-zero risk when the sum of probabilities is zero.
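The stride issue described above can be illustrated without Triton. The following is a minimal sketch in plain Python; the shapes, the stride values, and the helper names `flat_offset` and `contiguous_strides` are hypothetical and are not taken from the PR. It assumes the query is a view into a larger fused buffer (so its token stride is larger than that of a contiguous tensor) while the output is contiguous.

```python
def flat_offset(index, strides):
    """Element offset of a multi-dimensional index for the given strides."""
    return sum(i * s for i, s in zip(index, strides))


def contiguous_strides(shape):
    """Row-major (C-contiguous) strides, in elements, for a shape."""
    strides, step = [], 1
    for dim in reversed(shape):
        strides.append(step)
        step *= dim
    return tuple(reversed(strides))


# Hypothetical layout: 2 tokens, 3 heads, head_dim 4.
shape = (2, 3, 4)
out_numel = 2 * 3 * 4

# Output tensor is contiguous.
o_strides = contiguous_strides(shape)  # (12, 4, 1)

# Assumed: the query is a slice of a fused QKV buffer, so consecutive
# tokens are 36 elements apart instead of 12.
q_strides = (36, 4, 1)

last_index = (1, 2, 3)
wrong_offset = flat_offset(last_index, q_strides)  # uses Q strides for O
right_offset = flat_offset(last_index, o_strides)  # uses O's own strides
```

Under these assumptions the offset computed from the query strides falls outside the 24-element output buffer, which is the kind of out-of-bounds write the review warns about. Passing the output tensor's own strides to the kernel avoids it.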
Claude Code Review
Verdict: Request changes -- Output strides incorrectly reuse query strides, which will corrupt output for any non-contiguous or differently-shaped output tensor.
Summary: Extends
Must fix
Suggestions (3)
Nits (2)
Notes
Claude Code Review
Verdict: Request changes -- The kernel passes Q strides for the output tensor, which will produce wrong results whenever Q and O have different layouts.
Summary: Adds multi-token (Q_SEQLEN > 1) support to the SWA paged decode kernel and reference implementation, generalizing query shape from
Must fix
Suggestions (4)
Nits (2)
Notes
Claude Code Review
Verdict: Request changes -- Output strides are passed as Q strides in the kernel call, which will corrupt writes whenever Q and O have different layouts.
Summary: Adds support for a small Q sequence length dimension (up to 4) in the SWA paged decode path, both in the Triton kernel and the reference operator. Tests are extended to cover both 3D and 4D query shapes.
Must fix
Suggestions (4)
Nits (3)
Notes
Claude Code Review
Verdict: Request changes -- The kernel does not handle the q-seqlen dimension within the attention computation (no per-token masking against kv positions), which will produce incorrect results for seq_len > 1.
Summary: Adds support for a query seq_len dimension (up to 4) to the SWA paged decode kernel and reference implementation, with an updated test matrix. The kernel reshapes Q to
Must fix
Suggestions (3)
Nits (3)
Notes
Pay attention to this issue: decoding N tokens should produce a causal mask of shape [N, kv_len], not [1, kv_len].
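The [N, kv_len] mask mentioned in this comment can be sketched in plain Python. This is an illustration only, not the kernel's code. It assumes the N new query tokens occupy the last N positions of the KV sequence, and that a sliding window of size `window` lets a query at position p see keys at positions p - window + 1 through p. The function name `nstep_swa_mask` and both conventions are hypothetical.

```python
def nstep_swa_mask(n_q, kv_len, window=None):
    """Boolean mask of shape [n_q, kv_len]; True means the key is visible.

    Assumption: query i sits at absolute position kv_len - n_q + i.
    Assumption: with a window, only the last `window` keys up to and
    including the query position are visible.
    """
    mask = []
    for i in range(n_q):
        q_pos = kv_len - n_q + i
        row = []
        for j in range(kv_len):
            visible = j <= q_pos  # causal: no attending to later keys
            if window is not None:
                visible = visible and j > q_pos - window
            row.append(visible)
        mask.append(row)
    return mask


causal_only = nstep_swa_mask(2, 4)
windowed = nstep_swa_mask(2, 4, window=2)
```

Each query row differs from the next by one position, so a single [1, kv_len] row reused for all N tokens would let earlier tokens attend to keys written by later ones.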
Claude Code Review
Verdict: Request changes -- MLU kernel masking and reshape logic for the new MTP path looks incorrect; please verify.
Summary: Adds support for 4D (MTP) query input
Must fix
Suggestions (3)
Nits (3)
Notes
Claude Code Review
Verdict: Request changes -- N-step SWA decode kernel has incorrect output strides for the 3D query path and a redundant double-where in the safe-divide.
Summary: Adds an n-step variant of paged SWA decode (query shape
Must fix
Suggestions (4)
Nits (4)
Notes
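The safe-divide concern can be shown with a single guard on the denominator. The sketch below is plain Python rather than Triton, and the function name `normalize` is hypothetical. It assumes the weights are already-masked, non-negative softmax numerators for one query row, so a fully masked row sums to zero.

```python
def normalize(weights):
    """Divide each weight by the row sum, guarding a zero sum once."""
    total = sum(weights)
    # Single guard: replace a zero denominator with 1.0 so the division
    # is defined. The numerators are all zero in that case, so the
    # result is a row of zeros and no second conditional is needed.
    safe_total = total if total > 0.0 else 1.0
    return [w / safe_total for w in weights]


fully_masked = normalize([0.0, 0.0])
regular = normalize([1.0, 3.0])
```

In a Triton kernel the same idea would likely be one `tl.where` applied to the denominator before the division; wrapping the quotient in a second `tl.where` adds nothing when the numerators are already zero.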
Claude Code Review
Verdict: Request changes -- The kernel's stride/shape handling for the n-step path has correctness concerns and the assertion
Summary: Adds an n-step paged-decode SWA path: a new
Must fix
Suggestions (4)
Nits (3)
Notes
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
This reverts commit dbea3c8.
Claude Code Review
Verdict: Request changes -- N-step SWA decode kernel has correctness bugs in masking and a misplaced division-by-zero guard, plus a hard-coded seq_len limit and typos.
Summary: Adds a new
Must fix
Suggestions (4)
Nits (3)
Notes
Claude Code Review
Verdict: Request changes -- N-step SWA decode kernel has incorrect mask arithmetic in the non-global window branch and a few correctness issues that will produce wrong outputs.
Summary: Adds an n-step variant of paged decode SWA that accepts a 4D query
Must fix
Suggestions (5)
Nits (3)
Notes
Claude Code Review
Verdict: Request changes -- N-step SWA decode kernel has a mask-layout bug and a typo that will produce wrong results for GQA.
Summary: Adds an n-step paged-decode SWA path: a new
Must fix
Suggestions (4)
Nits (3)
Notes
Claude Code Review
Verdict: Request changes -- N-step SWA decode kernel has a likely-incorrect Q reshape that will scramble per-token/per-head Q rows and break correctness.
Summary: Adds an n-step (multi-token) variant of paged decode SWA: a new
Must fix
Suggestions (4)
Nits (4)
Notes
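How a reshape can scramble per-token and per-head rows is easiest to see with labelled rows. The sketch below is plain Python and does not reproduce the PR's kernel. It assumes a query laid out as [seq_len, num_heads] rows (token-major) that must be regrouped per KV head for GQA; the sizes and the helper names `plain_reshape` and `group_by_kv_head` are hypothetical.

```python
def plain_reshape(rows, num_chunks):
    """Split a flat row list into equal chunks without reordering."""
    size = len(rows) // num_chunks
    return [rows[c * size:(c + 1) * size] for c in range(num_chunks)]


def group_by_kv_head(seq_len, num_heads, num_kv_heads):
    """Collect, for each KV head, the (token, head) rows that share it."""
    group = num_heads // num_kv_heads
    return [
        [(t, kv * group + g) for t in range(seq_len) for g in range(group)]
        for kv in range(num_kv_heads)
    ]


# Hypothetical sizes: 2 tokens, 4 query heads, 2 KV heads (group size 2).
seq_len, num_heads, num_kv_heads = 2, 4, 2

# Token-major row labels: (token, head).
rows = [(t, h) for t in range(seq_len) for h in range(num_heads)]

scrambled = plain_reshape(rows, num_kv_heads)
grouped = group_by_kv_head(seq_len, num_heads, num_kv_heads)
```

With these sizes the plain reshape puts all four heads of token 0 into the first chunk, including heads 2 and 3, which belong to the second KV head. The explicit grouping keeps heads 0 and 1 of both tokens together, which in tensor terms corresponds to permuting the token and head axes before reshaping.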
No description provided.