[BugFix] correct topk_softmax/topk_sigmoid routing weights at (128, 8) #284
Open
Zymonody7 wants to merge 2 commits into
mcoplib _moe_C.topk_softmax returns biased weights when a correction bias is passed, and _moe_C.topk_sigmoid returns softmax weights; both bugs occur only at the (num_experts=128, topk=8) tile (issue MetaX-MACA#270). The expert ids are correct, so this change recomputes the weights from gating_output at the selected ids and leaves the kernel untouched. The extra gather runs on the gating tensors only, so its cost is negligible next to the expert MLPs.
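A minimal plain-Python sketch of the per-token recompute, assuming the correction bias only influences which experts are selected, so the weights to return are the unbiased scores. `recompute_topk_weights` and `_sigmoid` are hypothetical names, not the mcoplib or `ops` API; a real implementation would work on batched tensors.

```python
import math
from typing import Sequence


def _sigmoid(x: float) -> float:
    # Numerically stable logistic function (avoids overflow in math.exp).
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def recompute_topk_weights(
    gating_row: Sequence[float],
    topk_ids: Sequence[int],
    scoring: str = "softmax",
    renormalize: bool = False,
) -> list[float]:
    """Recompute one token's routing weights at kernel-selected expert ids.

    Assumption: the correction bias only affects *which* experts are
    selected, so it is intentionally not an input here.
    """
    if scoring == "softmax":
        # Softmax needs the full row for its denominator (max-shifted for stability).
        m = max(gating_row)
        exps = [math.exp(x - m) for x in gating_row]
        total = sum(exps)
        weights = [exps[i] / total for i in topk_ids]
    elif scoring == "sigmoid":
        # Sigmoid is elementwise, so only the selected logits are needed.
        weights = [_sigmoid(gating_row[i]) for i in topk_ids]
    else:
        raise ValueError(f"unknown scoring function: {scoring!r}")
    if renormalize:
        s = sum(weights)
        weights = [w / s for w in weights]
    return weights
```

For example, with logits `[2.0, 1.0, 0.0, -1.0]`, ids `[0, 1]` and `renormalize=True`, the softmax weights are proportional to e² and e¹, i.e. about 0.731 and 0.269.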
- Probe the kernel once at the first (128, 8) call and only recompute weights when it is actually wrong, so the workaround deactivates itself on fixed kernels (e.g. sigmoid on mcoplib >= 0.4.5) and on a future softmax fix.
- Add tests/kernels/moe/test_topk_routing_128x8.py covering the patched ops.topk_softmax / ops.topk_sigmoid entry points across shapes, renormalize and bias.
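The self-disabling probe could look roughly like this sketch. The wrapper class, the kernel/reference callback signatures and the tolerance are all hypothetical. Keying the probe on whether a bias is passed is this sketch's own choice, not necessarily what the PR does.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple

# Hypothetical callback signatures (not the real mcoplib / ops API):
#   kernel(gating_row, topk, bias) -> (weights, ids)
#   reference(gating_row, ids)     -> weights recomputed from gating_row
Kernel = Callable[[Sequence[float], int, Optional[Sequence[float]]], Tuple[List[float], List[int]]]
Reference = Callable[[Sequence[float], Sequence[int]], List[float]]


class ProbedWeightFix:
    """Keep the kernel's ids; replace its weights only if a one-time probe at
    (num_experts=128, topk=8) shows they disagree with the reference."""

    def __init__(self, kernel: Kernel, reference: Reference, tol: float = 1e-5) -> None:
        self.kernel = kernel
        self.reference = reference
        self.tol = tol
        # Probe verdict keyed on "was a bias passed?"; a missing key means not probed yet.
        self.kernel_ok: Dict[bool, bool] = {}

    def __call__(
        self,
        gating_row: Sequence[float],
        topk: int,
        bias: Optional[Sequence[float]] = None,
    ) -> Tuple[List[float], List[int]]:
        weights, ids = self.kernel(gating_row, topk, bias)
        if (len(gating_row), topk) != (128, 8):
            return weights, ids  # every other shape takes the original kernel path
        key = bias is not None
        if key not in self.kernel_ok:
            # First (128, 8) call for this bias setting: compare once and cache.
            ref = self.reference(gating_row, ids)
            diff = max(abs(a - b) for a, b in zip(weights, ref))
            self.kernel_ok[key] = diff <= self.tol
            return (weights if self.kernel_ok[key] else ref), ids
        if not self.kernel_ok[key]:
            weights = self.reference(gating_row, ids)  # kernel ids kept, weights replaced
        return weights, ids
```

Keying the cached verdict on the bias flag matters in this sketch because the softmax bug only appears when a bias is passed; a single probe on a bias-free call would otherwise clear a kernel that is still wrong for biased calls.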
Purpose
Work around #270: mcoplib `_moe_C.topk_softmax` returns biased weights when a correction bias is passed, but only at the `(num_experts=128, topk=8)` tile. Expert ids are correct in all cases, so this patch recomputes the weights from `gating_output` at the kernel-selected ids via the existing `patch/bugfix` mechanism. The kernel is untouched, all other shapes take the original path, and the extra gather runs on gating tensors only.
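To make "biased weights" concrete, here is a hypothetical single-token sketch of the contract the patch assumes: experts are ranked by score plus correction bias, but the weight returned for each selected expert is its raw score. A kernel that hands back score + bias instead would show the #270 symptom. `select_then_weight` is an illustrative name only.

```python
from typing import List, Optional, Sequence, Tuple


def select_then_weight(
    scores: Sequence[float],
    k: int,
    bias: Optional[Sequence[float]] = None,
) -> Tuple[List[int], List[float]]:
    """Pick top-k experts by (score + bias) but weight them by the raw score.

    Assumption of this sketch: this is the intended contract, so returning
    score + bias as the weight is the bug being worked around.
    """
    keyed = [s + (bias[i] if bias is not None else 0.0) for i, s in enumerate(scores)]
    # Rank expert ids by biased score, highest first; the sort is stable,
    # so ties keep the lower id first.
    ids = sorted(range(len(scores)), key=lambda i: keyed[i], reverse=True)[:k]
    weights = [scores[i] for i in ids]  # unbiased weights at the selected ids
    return ids, weights
```

With scores `[0.5, 0.3, 0.2]`, bias `[0.0, 0.5, 0.0]` and k=1, expert 1 is selected (0.3 + 0.5 beats 0.5) and its weight is 0.3; a kernel with the bug would report roughly 0.8.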
The patch also covers `topk_sigmoid` for older kernels: sigmoid is broken on mcoplib 0.4.2 (per #270) but already correct on 0.4.5, where the recomputed weights are identical to the kernel output, so the patch is harmless there.
Test Plan
C500, MACA 3.7.1.5, mcoplib 0.4.5+maca3.7.0.37.torch2.8; shapes E∈{64,128,256} × K∈{4,8} × {bias, no-bias} (repro script attached in the #270 thread).
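The test-plan grid and the max_diff figures in the results can be sketched as a small sweep. `run_case` is a hypothetical callback (build inputs for one configuration, run kernel and reference, return their difference); reading max_diff as the largest absolute elementwise difference is an assumption, and the tolerance is illustrative.

```python
import itertools
from typing import Callable, List, Sequence, Tuple

# Test-plan grid: E in {64, 128, 256} x K in {4, 8} x {bias, no-bias}.
NUM_EXPERTS = (64, 128, 256)
TOPKS = (4, 8)
WITH_BIAS = (True, False)


def max_diff(a: Sequence[Sequence[float]], b: Sequence[Sequence[float]]) -> float:
    """Assumed metric: largest absolute elementwise difference over all tokens."""
    return max(abs(x - y) for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))


def sweep(
    run_case: Callable[[int, int, bool], float], tol: float = 1e-6
) -> List[Tuple[int, int, bool, float]]:
    """Run every grid configuration and return those whose diff exceeds tol.

    run_case(num_experts, topk, with_bias) is a hypothetical callback that builds
    inputs, runs kernel and reference, and returns their max_diff.
    """
    failures = []
    for e, k, b in itertools.product(NUM_EXPERTS, TOPKS, WITH_BIAS):
        diff = run_case(e, k, b)
        if diff > tol:
            failures.append((e, k, b, diff))
    return failures
```

With a stand-in `run_case` that returns a large difference only for `(128, 8, True)`, `sweep` reports exactly that one configuration, which is the pattern the unpatched kernel showed.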
Test Result
Before: E=128 K=8 softmax bias=Y: max_diff = 3.03 ← broken (all other shapes ≤1e-7)
After: E=128 K=8 softmax bias=Y: max_diff = 0.0 (renormalize=False/True both)
Sanity: all other shapes unchanged ≤1e-7.
Also confirmed not present on mcoplib 0.3.1+maca3.3.0.15 (no bias arg there).
(Optional) Documentation Update
None.