[BugFix] call ops.top_k_per_row_prefill in int8 indexer prefill by Zymonody7 · Pull Request #283 · MetaX-MACA/vLLM-metax

Zymonody7 · 2026-06-10T12:44:06Z

torch.ops.top_k_per_row_prefill resolves to an op namespace, which is not callable, so the int8 sparse-attn-indexer prefill path raised TypeError. Match the bf16/fp8 paths and the decode path below, all of which use ops.top_k_per_row_prefill.

Purpose

Fix a crash in the int8 sparse-attn-indexer prefill path.

int8.py:183 calls torch.ops.top_k_per_row_prefill(...). A bare torch.ops.<name>
is an op namespace rather than an op, and a namespace is not callable, so any
model that uses the FP8/int8 sparse attn indexer (e.g. DeepSeek-V3.2) raises
TypeError: '_OpNamespace' object is not callable when the prefill path runs.
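
To illustrate why the call fails, here is a toy model of two-level attribute lookup in pure Python. It is a simplified sketch, not PyTorch's actual implementation: the names `OpNamespace`, `Ops`, `toy_ops` and `call_error` are hypothetical, and the "op" it returns is a placeholder that only echoes its arguments.

```python
import types


class OpNamespace(types.ModuleType):
    """Hypothetical stand-in for an op namespace; it defines no __call__."""

    def __init__(self, name):
        super().__init__("toy_ops." + name)
        self.name = name

    def __getattr__(self, op_name):
        # Reached only when normal attribute lookup fails.
        if op_name.startswith("__"):
            raise AttributeError(op_name)

        def op(*args):
            # Placeholder "kernel": report which op was reached and with what.
            return (self.name, op_name, args)

        return op


class Ops:
    """Hypothetical stand-in for the top-level registry object."""

    def __getattr__(self, name):
        if name.startswith("__"):
            raise AttributeError(name)
        # The first attribute level always yields a namespace, never an op.
        return OpNamespace(name)


toy_ops = Ops()


def call_error(fn, *args):
    """Call fn(*args) and return the TypeError message, or None on success."""
    try:
        fn(*args)
    except TypeError as exc:
        return str(exc)
    return None


# One level deep: the object is a namespace, so calling it fails.
print(call_error(toy_ops.top_k_per_row_prefill, 1))
# Two levels deep: namespace, then op, so the call succeeds.
print(toy_ops._C.top_k_per_row_prefill(1))
```

In this model the first print shows a "not callable" message and the second shows the placeholder result, which mirrors the difference between the broken one-level call and a namespaced op.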

One-line fix: use ops.top_k_per_row_prefill(...) from vllm._custom_ops
(already imported at line 8), matching the bf16 path (bf16.py:181), the fp8
path (fp8.py:216) and the decode path in the same file (int8.py:254).

Test Plan

Minimal repro of the broken call form (no GPU needed):

python -c "import torch; torch.ops.top_k_per_row_prefill(1)"

Then grep for any other call made directly on a torch.ops namespace, to
confirm this is the only raw-namespace call site of this class:

grep -rE "torch\.ops\.[a-z_0-9]+\(" vllm_metax/ | grep -v "torch.ops._"
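
The same search can be sketched in Python, for example to run it as a lint step. This is a minimal sketch that reuses the regex from the grep command above; the function names `find_raw_namespace_calls` and `scan_tree` are hypothetical, and it only scans `*.py` files, which is an assumption about where such calls live.

```python
import re
from pathlib import Path

# Same pattern as the grep command: torch.ops.<name> followed directly by "(".
RAW_NAMESPACE_CALL = re.compile(r"torch\.ops\.([a-z_0-9]+)\(")


def find_raw_namespace_calls(text):
    """Return (line_number, name) for each torch.ops.<name>( call in text."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in RAW_NAMESPACE_CALL.finditer(line):
            hits.append((lineno, match.group(1)))
    return hits


def scan_tree(root):
    """Map each .py file under root to its raw-namespace call sites."""
    results = {}
    for path in sorted(Path(root).rglob("*.py")):
        text = path.read_text(encoding="utf-8", errors="replace")
        hits = find_raw_namespace_calls(text)
        if hits:
            results[str(path)] = hits
    return results


# Illustrative input only; these lines are not copied from the repository.
sample = (
    "a = torch.ops.top_k_per_row_prefill(x)\n"
    "b = torch.ops._C.top_k_per_row_prefill(x)\n"
    "c = ops.top_k_per_row_prefill(x)\n"
)
print(find_raw_namespace_calls(sample))
```

Only the first sample line is reported: the second goes through a namespace before the op, and the third uses the `ops` wrapper. Calling `scan_tree("vllm_metax")` from the repository root would apply the same check to the whole package.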

Test Result

Before: TypeError: '_OpNamespace' object is not callable
After fix: int8 prefill resolves the same _C.top_k_per_row_prefill op as the bf16/fp8 paths.
grep: no remaining raw-namespace call sites.

(Optional) Documentation Update

None.

Essential Elements of an Effective PR Description Checklist

The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
The test plan, such as providing test command.
The test results, such as pasting the results comparison before and after, or e2e results
(Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.

gemini-code-assist

Code Review

This pull request updates the sparse_attn_indexer_int8 function in vllm_metax/customized/layers/sparse_attn_indexer/int8.py to call ops.top_k_per_row_prefill directly instead of using torch.ops.top_k_per_row_prefill. There are no review comments, and I have no additional feedback to provide.

gemini-code-assist Bot reviewed Jun 10, 2026

Zymonody7 wants to merge 1 commit into MetaX-MACA:master from Zymonody7:fix/int8-indexer-topk-namespace
