Commit 5778199

disagg #1762: sweep conc 1,2,4,8,16 at both 1k1k and 8k1k
Widen the disagg sweep from conc 1 to conc 1,2,4,8,16 for both seq-len scenarios (1P TP8 + 1D TP8). The 8k1k conc-16 point keeps the multi-node eval marked (eval-conc=16) so lm-eval still validates the MoRI-IO disagg pipeline.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
1 parent 44ab389 commit 5778199

2 files changed

Lines changed: 6 additions & 6 deletions

.github/configs/amd-master.yaml

Lines changed: 5 additions & 5 deletions
@@ -3003,15 +3003,15 @@ minimaxm3-fp8-mi355x-vllm-disagg:
 dp-attn: false
 additional-settings:
 - "DECODE_NODES=1"
-# 8k1k conc-16 row (same 1P TP8 + 1D TP8 layout) exists so the multi-node
-# eval policy (8k1k + conc >= MIN_EVAL_CONC=16) marks an lm-eval — validates
-# the M3 MoRI-IO disagg pipeline's correctness end-to-end. The conc-1 1k1k
-# row above stays the latency smoke test.
+# 8k1k disagg sweep (same 1P TP8 + 1D TP8 layout) across conc 1,2,4,8,16. The
+# conc-16 point also makes the multi-node eval policy (8k1k + conc >= 16) mark
+# an lm-eval (eval-conc=16) — validating the M3 MoRI-IO disagg pipeline's
+# correctness end-to-end.
 - isl: 8192
 osl: 1024
 search-space:
 - spec-decoding: "none"
-conc-list: [ 16 ]
+conc-list: [ 1, 2, 4, 8, 16 ]
 prefill:
 num-worker: 1
 tp: 8

perf-changelog.yaml

Lines changed: 1 addition & 1 deletion
@@ -3850,5 +3850,5 @@
 - "Layered on the MoRI-IO patch-removal infra (#1585): uses benchmarks/multi_node/amd_utils with the runtime MoRI patches removed"
 - "Per-worker serve flags (models_vllm.yaml MiniMax-M3-MXFP8): --block-size 128 (MSA), --language-model-only, --kv-cache-dtype fp8, --attention-backend TRITON_ATTN, minimax_m3 parsers; no EP (TP8, MoE experts TP-sharded)"
 - "M3 disagg script points MODEL_PATH at the cluster's shared HF cache (/it-share/hf-hub-cache) where the ~414 GB MiniMax-M3-MXFP8 checkpoint is pre-staged, instead of the launcher default /it-share/data; scoped to M3 only (other disagg models keep /it-share/data)"
-- "Adds an 8k1k conc-16 row (same 1P TP8 + 1D TP8 layout) so the multi-node eval policy (8k1k + conc >= 16) marks an lm-eval, validating the disagg pipeline's correctness; the conc-1 1k1k row stays the latency smoke test (run with non-canary-full-sweep-enabled so the eval entry actually runs)"
+- "Sweeps conc 1,2,4,8,16 at both 1k1k and 8k1k (1P TP8 + 1D TP8). The 8k1k conc-16 point makes the multi-node eval policy (8k1k + conc >= 16) mark an lm-eval (eval-conc=16), validating the disagg pipeline's correctness; run with non-canary-full-sweep-enabled so the eval entry actually runs"
 pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1762
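
The interaction between the widened conc-list and the multi-node eval policy can be sketched roughly as below. This is only an illustration, not the repository's actual sweep generator: `SweepPoint` and `expand_sweep` are hypothetical names, and the rule "8k1k and conc >= MIN_EVAL_CONC" is taken from the comments in the diff, with 8k1k assumed to mean isl=8192, osl=1024 and 1k1k assumed to mean isl=1024, osl=1024.

```python
from dataclasses import dataclass

# Threshold named in the removed config comment (MIN_EVAL_CONC=16).
MIN_EVAL_CONC = 16


@dataclass(frozen=True)
class SweepPoint:
    """One benchmark point. Hypothetical type, for illustration only."""
    isl: int
    osl: int
    conc: int
    run_eval: bool


def expand_sweep(isl, osl, conc_list):
    """Expand one search-space entry into per-concurrency points.

    Assumed policy, paraphrased from the diff comments: only 8k1k
    points with conc >= MIN_EVAL_CONC are marked for lm-eval.
    """
    is_8k1k = (isl, osl) == (8192, 1024)
    return [
        SweepPoint(isl, osl, conc, is_8k1k and conc >= MIN_EVAL_CONC)
        for conc in conc_list
    ]


# The widened 8k1k row from this commit.
points = expand_sweep(8192, 1024, [1, 2, 4, 8, 16])
eval_concs = [p.conc for p in points if p.run_eval]
print(eval_concs)  # [16]
```

Under these assumptions, widening the list from [ 16 ] to [ 1, 2, 4, 8, 16 ] adds four benchmark-only points, and conc 16 remains the only point that triggers an lm-eval.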
