[BugFix] call ops.top_k_per_row_prefill in int8 indexer prefill #283

Open
Zymonody7 wants to merge 1 commit into MetaX-MACA:master from Zymonody7:fix/int8-indexer-topk-namespace

@Zymonody7

torch.ops.top_k_per_row_prefill resolves to an op namespace, which is not callable, so the int8 sparse-attn-indexer prefill path raised TypeError. Match the bf16/fp8 paths and the decode path below, all of which use ops.top_k_per_row_prefill.

Purpose

Fix a crash in the int8 sparse-attn-indexer prefill path.

int8.py:183 calls torch.ops.top_k_per_row_prefill(...), which resolves to an
op namespace (not callable), so any model that uses the FP8/int8 sparse attn
indexer (e.g. DeepSeek-V3.2) raises
TypeError: '_OpNamespace' object is not callable
when the prefill path runs.
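
The failure mode can be illustrated without PyTorch installed. The sketch below is a simplified stand-in, assuming only that the first attribute on the registry yields a namespace object and the second yields the callable op. The names `FakeOps`, `FakeNamespace` and `top_k_stub` (including its signature) are hypothetical and are not PyTorch's implementation.

```python
import types


class FakeNamespace(types.ModuleType):
    """Hypothetical op namespace: holds ops but is not callable itself."""

    def __init__(self, name, ops_in_namespace):
        super().__init__(name)
        self._ops = ops_in_namespace

    def __getattr__(self, op_name):
        # Only reached when normal attribute lookup fails.
        if op_name.startswith("__") or op_name not in self._ops:
            raise AttributeError(op_name)
        return self._ops[op_name]


class FakeOps(types.ModuleType):
    """Hypothetical registry: any first-level attribute is a namespace."""

    def __init__(self, registry):
        super().__init__("fake_ops")
        self._registry = registry

    def __getattr__(self, ns_name):
        if ns_name.startswith("__"):
            raise AttributeError(ns_name)
        return FakeNamespace(ns_name, self._registry.get(ns_name, {}))


def top_k_stub(values, k):
    """Placeholder op body with a made-up signature: the k largest values."""
    return sorted(values, reverse=True)[:k]


ops = FakeOps({"_C": {"top_k_per_row_prefill": top_k_stub}})

# Broken form: the op name is interpreted as a namespace, which is not callable.
try:
    ops.top_k_per_row_prefill([3, 1, 2], 2)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# Working form: namespace first, then the op.
result = ops._C.top_k_per_row_prefill([3, 1, 2], 2)
```

In this model the broken call fails before any op code runs, which matches the symptom described above: the error comes from calling the namespace object, not from the kernel.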

One-line fix: use ops.top_k_per_row_prefill(...) from vllm._custom_ops
(already imported at line 8), matching the bf16 path (bf16.py:181), the fp8
path (fp8.py:216) and the decode path in the same file (int8.py:254).

Test Plan

Minimal repro of the broken call form (no GPU needed):

python -c "import torch; torch.ops.top_k_per_row_prefill(1)"

Then grep for any other call made directly on a torch.ops namespace:

grep -rE "torch\.ops\.[a-z_0-9]+\(" vllm_metax/ | grep -v "torch.ops._"

to confirm this is the only raw-namespace call site of this class.
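
For environments without grep, the same check can be approximated in Python. This is a rough sketch rather than part of the PR: the helper names `scan_lines` and `find_raw_namespace_calls` are hypothetical, the first pattern is copied from the grep command, and unlike `grep -r` the directory walk only looks at `*.py` files.

```python
import re
from pathlib import Path

# Same pattern as the first grep: a call made directly on torch.ops.<name>.
RAW_CALL = re.compile(r"torch\.ops\.[a-z_0-9]+\(")
# Approximates the `grep -v` filter: skip lines that use a private namespace
# such as torch.ops._C.
PRIVATE_NS = re.compile(r"torch\.ops\._")


def scan_lines(lines):
    """Return (line_number, stripped_line) for each raw-namespace call."""
    hits = []
    for lineno, line in enumerate(lines, start=1):
        if RAW_CALL.search(line) and not PRIVATE_NS.search(line):
            hits.append((lineno, line.strip()))
    return hits


def find_raw_namespace_calls(root):
    """Walk root recursively and collect hits from every .py file."""
    results = []
    for path in sorted(Path(root).rglob("*.py")):
        text = path.read_text(encoding="utf-8", errors="replace")
        for lineno, line in scan_lines(text.splitlines()):
            results.append((str(path), lineno, line))
    return results


if __name__ == "__main__":
    for path, lineno, line in find_raw_namespace_calls("vllm_metax"):
        print(f"{path}:{lineno}: {line}")
```

An empty result for the package directory corresponds to the "no remaining raw-namespace call sites" outcome.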

Test Result

  • Before: TypeError: '_OpNamespace' object is not callable
  • After fix: int8 prefill resolves the same _C.top_k_per_row_prefill op as the bf16/fp8 paths.
  • grep: no remaining raw-namespace call sites.

(Optional) Documentation Update

None.


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.

@gemini-code-assist (bot) left a comment

Code Review

This pull request updates the sparse_attn_indexer_int8 function in vllm_metax/customized/layers/sparse_attn_indexer/int8.py to call ops.top_k_per_row_prefill directly instead of using torch.ops.top_k_per_row_prefill. There are no review comments, and I have no additional feedback to provide.
