[BugFix] correct topk_softmax/topk_sigmoid routing weights at (128, 8)#284

Open: Zymonody7 wants to merge 2 commits into MetaX-MACA:master from Zymonody7:fix/moe-topk-128x8-routing-weights

mcoplib `_moe_C.topk_softmax` returns biased weights when a correction bias is passed, and `_moe_C.topk_sigmoid` returns softmax weights; both happen only at the (num_experts=128, topk=8) tile (issue #270). The expert ids are correct, so this PR recomputes the weights from `gating_output` at the selected ids and leaves the kernel untouched. The extra gather runs on gating tensors only, so its cost is negligible next to the expert MLPs.

Purpose

Work around #270: mcoplib _moe_C.topk_softmax returns biased weights when a
correction bias is passed, only at the (num_experts=128, topk=8) tile. Expert ids
are correct in all cases, so this patch recomputes the weights from gating_output
at the kernel-selected ids via the existing patch/bugfix mechanism. The kernel is
untouched; all other shapes take the original path; the extra gather runs on gating
tensors only.
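
A minimal, framework-free sketch of the recompute step described above. It assumes the usual convention that the correction bias influences only which experts are selected, while the returned weights are the unbiased scores gathered at those ids. The helper names (`select_topk_ids`, `recompute_topk_weights`) are hypothetical and are not the names used in the patch; the real code operates on batched tensors rather than Python lists.

```python
import math


def softmax(row):
    # Numerically stable softmax over one token's gating logits.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(row):
    return [1.0 / (1.0 + math.exp(-x)) for x in row]


def _scores(gating_row, scoring):
    return softmax(gating_row) if scoring == "softmax" else sigmoid(gating_row)


def select_topk_ids(gating_row, bias, k, scoring="softmax"):
    # Stand-in for the kernel's (correct) selection: the bias is added
    # to the scores only to rank the experts.
    biased = [s + b for s, b in zip(_scores(gating_row, scoring), bias)]
    return sorted(range(len(biased)), key=lambda i: biased[i], reverse=True)[:k]


def recompute_topk_weights(gating_row, topk_ids, scoring="softmax",
                           renormalize=False):
    # Gather the unbiased scores at the ids the kernel already selected.
    scores = _scores(gating_row, scoring)
    weights = [scores[i] for i in topk_ids]
    if renormalize:
        total = sum(weights)
        weights = [w / total for w in weights]
    return weights
```

With logits `[2.0, 1.0, 0.0, -1.0]` and a bias of `3.0` on expert 2, expert 2 is selected first, yet its recomputed weight is its plain softmax score (below 1) rather than the score plus bias.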

The patch also covers topk_sigmoid for older kernels: sigmoid is broken on
mcoplib 0.4.2 (per #270) but already correct on 0.4.5. On 0.4.5 the recomputed
weights are identical to the kernel output, so the recompute is harmless there.

Test Plan

C500, MACA 3.7.1.5, mcoplib 0.4.5+maca3.7.0.37.torch2.8, shapes
E∈{64,128,256} × K∈{4,8} × {bias, no-bias} (repro script attached in #270 thread).
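
For orientation, the grid above can be enumerated as in the sketch below. This is not the repro script from #270; `shape_grid` and `max_diff` are hypothetical helpers, and `max_diff` is assumed to mean the largest absolute element-wise difference between kernel weights and reference weights.

```python
from itertools import product

NUM_EXPERTS = (64, 128, 256)
TOPK = (4, 8)
USE_BIAS = (True, False)


def shape_grid():
    # One entry per (E, K, bias) combination in the test plan.
    return [
        {"num_experts": e, "topk": k, "bias": b}
        for e, k, b in product(NUM_EXPERTS, TOPK, USE_BIAS)
    ]


def max_diff(actual, expected):
    # Largest absolute element-wise difference between two flat weight lists.
    return max(abs(a - b) for a, b in zip(actual, expected))
```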

Test Result

Before: E=128 K=8 softmax bias=Y: max_diff = 3.03 ← broken (all other shapes ≤1e-7)
After: E=128 K=8 softmax bias=Y: max_diff = 0.0 (with both renormalize=False and renormalize=True)
Sanity: all other shapes are unchanged at ≤1e-7.

Also confirmed that the bug is not present on mcoplib 0.3.1+maca3.3.0.15 (there is no bias argument in that version).

(Optional) Documentation Update

None.


- Probe the kernel once at the first (128, 8) call and only recompute
  weights when it is actually wrong, so the workaround deactivates
  itself on fixed kernels (e.g. sigmoid on mcoplib >= 0.4.5) and on a
  future softmax fix.
- Add tests/kernels/moe/test_topk_routing_128x8.py covering the patched
  ops.topk_softmax / ops.topk_sigmoid entry points across shapes,
  renormalize and bias.
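
The probe-once behaviour in the first bullet could look roughly like the sketch below. It is an illustration under stated assumptions, not the code in this PR: `ProbedTopK`, `buggy_kernel`, `fixed_kernel`, and the tolerance are hypothetical, the kernels are pure-Python stand-ins, and the probe here reuses the first live (128, 8) input, whereas the actual patch may probe with a dedicated input.

```python
import math

TARGET_SHAPE = (128, 8)  # the only (num_experts, topk) tile affected


def softmax(row):
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def reference_weights(gating_row, ids):
    # Unbiased softmax scores gathered at the kernel-selected ids.
    scores = softmax(gating_row)
    return [scores[i] for i in ids]


def _select(gating_row, bias, k):
    scores = softmax(gating_row)
    biased = [s + b for s, b in zip(scores, bias)]
    ids = sorted(range(len(biased)), key=lambda i: biased[i], reverse=True)[:k]
    return scores, biased, ids


def buggy_kernel(gating_row, bias, k):
    # Stand-in for the broken tile: returns scores with the bias left in.
    _, biased, ids = _select(gating_row, bias, k)
    return [biased[i] for i in ids], ids


def fixed_kernel(gating_row, bias, k):
    # Stand-in for a corrected kernel: returns the unbiased scores.
    scores, _, ids = _select(gating_row, bias, k)
    return [scores[i] for i in ids], ids


class ProbedTopK:
    """Recompute weights only if a one-time probe finds the kernel wrong."""

    def __init__(self, kernel, tol=1e-5):
        self.kernel = kernel
        self.tol = tol
        self.needs_fix = None  # None means "not probed yet"

    def __call__(self, gating_row, bias, k):
        weights, ids = self.kernel(gating_row, bias, k)
        if (len(gating_row), k) != TARGET_SHAPE:
            return weights, ids  # all other shapes take the original path
        if self.needs_fix is None:
            expected = reference_weights(gating_row, ids)
            diff = max(abs(w - e) for w, e in zip(weights, expected))
            self.needs_fix = diff > self.tol
        if self.needs_fix:
            weights = reference_weights(gating_row, ids)
        return weights, ids
```

Wrapping `buggy_kernel` sets `needs_fix` to `True` on the first (128, 8) call, and every later call returns recomputed weights; wrapping `fixed_kernel` sets it to `False`, so the kernel output passes through unchanged.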