Merged
Changes from 3 commits
8 changes: 4 additions & 4 deletions .github/configs/amd-master.yaml
@@ -2149,13 +2149,13 @@ dsv4-fp4-mi355x-sglang:
- isl: 1024
osl: 1024
search-space:
- { tp: 8, dp-attn: true, conc-start: 64, conc-end: 2048 }
- { tp: 8, dp-attn: false, conc-start: 1 , conc-end: 32 }
- { tp: 4, dp-attn: true, conc-start: 64, conc-end: 2048 }
- { tp: 4, dp-attn: false, conc-start: 1 , conc-end: 32 }
- isl: 8192
osl: 1024
search-space:
- { tp: 8, dp-attn: true, conc-start: 64, conc-end: 2048 }
- { tp: 8, dp-attn: false, conc-start: 1, conc-end: 32 }
- { tp: 4, dp-attn: true, conc-start: 32, conc-end: 2048 }
- { tp: 4, dp-attn: false, conc-start: 1, conc-end: 32 }

# MTP variant of dsv4-fp4-mi355x-sglang. Mirrors the base search space and adds
# spec-decoding: mtp, which routes to dsv4_fp4_mi355x_sglang_mtp.sh (EAGLE
7 changes: 7 additions & 0 deletions perf-changelog.yaml
@@ -3822,3 +3822,10 @@
description:
- "Extend MiniMax-M3 MXFP8 H100/H200 non-MTP sweeps to concurrency 1 on the latency rows (H100: TP8; H200: TP4 and TP8) and add full TEP coverage from conc 1 to 256 (H100: TP8+EP8; H200: TP4+EP4 and TP8+EP8, incl. a new TP4+EP4 row for 8k1k). H200 TP8+EP8 upper bound moves 512->256 (high concurrency stays covered by the TP8+EP8 dp-attn DEP rows). DEP rows unchanged"
pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1761

- config-keys:
- dsv4-fp4-mi355x-sglang
description:
- "Switch fixed-seq-len search space from TP8 to TP4 for both isl=1024 and isl=8192 scenarios"
- "Expand isl=8192 coverage: add TP4 dp-attn sweep (conc 32–2048) and TP4 TP-only sweep (conc 1–32)"
pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1762
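
Each search-space row in amd-master.yaml describes a concurrency sweep either by a `conc-start`/`conc-end` pair or, presumably, by an explicit list. The following is a minimal sketch of a pre-merge sanity check for such rows. The helper `check_row` is hypothetical and not part of the repository, and the rule that `conc-list` must not be combined with the range keys is an assumption about the schema, not something this diff confirms.

```python
# Hypothetical helper, not part of the repository: sanity-check search-space rows.
# Assumption: a row uses EITHER conc-start + conc-end OR conc-list, never a mix.
RANGE_KEYS = {"conc-start", "conc-end"}
LIST_KEY = "conc-list"


def check_row(row):
    """Return a list of problems found in one search-space row (empty if none)."""
    problems = []
    has_list = LIST_KEY in row
    present_range = RANGE_KEYS & set(row)
    if has_list and present_range:
        # A list-style key combined with a range-style key is ambiguous.
        problems.append("mixes conc-list with " + ", ".join(sorted(present_range)))
    elif not has_list and present_range != RANGE_KEYS:
        # A range sweep needs both bounds.
        problems.append("missing " + ", ".join(sorted(RANGE_KEYS - present_range)))
    elif not has_list and row["conc-start"] > row["conc-end"]:
        problems.append("conc-start exceeds conc-end")
    return problems


rows = [
    {"tp": 4, "dp-attn": True, "conc-start": 32, "conc-end": 2048},
    {"tp": 4, "dp-attn": False, "conc-start": 1, "conc-end": 32},
    {"tp": 4, "dp-attn": True, "conc-list": 32, "conc-end": 2048},  # mixed keys
]
for row in rows:
    print(row, check_row(row))
```

Only the third row is reported, because it pairs `conc-list` with `conc-end`. A check of this kind could run in CI before the sweep is scheduled, so a mistyped key fails fast instead of producing an unintended sweep.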