disagg #1762: sweep conc 1,2,4,8,16 at both 1k1k and 8k1k

functionstackx · claude · functionstackx · commit 5778199f50c8 · 2026-06-15T01:29:28.000-04:00
Widen the disagg sweep to conc 1,2,4,8,16 for both seq-len scenarios
(1P TP8 + 1D TP8); previously 1k1k ran only conc 1 and 8k1k only conc 16.
The 8k1k conc-16 point stays marked for the multi-node eval (eval-conc=16),
so lm-eval still validates the MoRI-IO disagg pipeline.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
diff --git a/.github/configs/amd-master.yaml b/.github/configs/amd-master.yaml
@@ -3003,15 +3003,15 @@ minimaxm3-fp8-mi355x-vllm-disagg:
           dp-attn: false
           additional-settings:
           - "DECODE_NODES=1"
-    # 8k1k conc-16 row (same 1P TP8 + 1D TP8 layout) exists so the multi-node
-    # eval policy (8k1k + conc >= MIN_EVAL_CONC=16) marks an lm-eval — validates
-    # the M3 MoRI-IO disagg pipeline's correctness end-to-end. The conc-1 1k1k
-    # row above stays the latency smoke test.
+    # 8k1k disagg sweep (same 1P TP8 + 1D TP8 layout) across conc 1,2,4,8,16. The
+    # conc-16 point also makes the multi-node eval policy (8k1k + conc >= 16) mark
+    # an lm-eval (eval-conc=16) — validating the M3 MoRI-IO disagg pipeline's
+    # correctness end-to-end.
     - isl: 8192
       osl: 1024
       search-space:
       - spec-decoding: "none"
-        conc-list: [ 16 ]
+        conc-list: [ 1, 2, 4, 8, 16 ]
         prefill:
           num-worker: 1
           tp: 8
diff --git a/perf-changelog.yaml b/perf-changelog.yaml
@@ -3850,5 +3850,5 @@
     - "Layered on the MoRI-IO patch-removal infra (#1585): uses benchmarks/multi_node/amd_utils with the runtime MoRI patches removed"
     - "Per-worker serve flags (models_vllm.yaml MiniMax-M3-MXFP8): --block-size 128 (MSA), --language-model-only, --kv-cache-dtype fp8, --attention-backend TRITON_ATTN, minimax_m3 parsers; no EP (TP8, MoE experts TP-sharded)"
     - "M3 disagg script points MODEL_PATH at the cluster's shared HF cache (/it-share/hf-hub-cache) where the ~414 GB MiniMax-M3-MXFP8 checkpoint is pre-staged, instead of the launcher default /it-share/data; scoped to M3 only (other disagg models keep /it-share/data)"
-    - "Adds an 8k1k conc-16 row (same 1P TP8 + 1D TP8 layout) so the multi-node eval policy (8k1k + conc >= 16) marks an lm-eval, validating the disagg pipeline's correctness; the conc-1 1k1k row stays the latency smoke test (run with non-canary-full-sweep-enabled so the eval entry actually runs)"
+    - "Sweeps conc 1,2,4,8,16 at both 1k1k and 8k1k (1P TP8 + 1D TP8). The 8k1k conc-16 point makes the multi-node eval policy (8k1k + conc >= 16) mark an lm-eval (eval-conc=16), validating the disagg pipeline's correctness; run with non-canary-full-sweep-enabled so the eval entry actually runs"
   pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1762
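
The eval-marking rule the config comment describes (8k1k + conc >= 16) can be sketched roughly as follows. This is an illustrative sketch only, not the repository's implementation: the function name `eval_points` and its signature are hypothetical, and the threshold name `MIN_EVAL_CONC` is taken from the removed comment. It shows why widening the 8k1k conc-list to 1,2,4,8,16 still yields exactly one eval point, while the 1k1k sweep yields none.

```python
# Hypothetical sketch of the multi-node eval policy described in the config
# comment. Names and structure are assumptions, not the repo's actual code.
MIN_EVAL_CONC = 16  # threshold named in the removed comment (MIN_EVAL_CONC=16)

# 8k1k scenario as it appears in the diff: isl 8192, osl 1024.
EVAL_ISL, EVAL_OSL = 8192, 1024


def eval_points(isl, osl, conc_list, min_eval_conc=MIN_EVAL_CONC):
    """Return the concurrency values that would qualify for an lm-eval run.

    A point qualifies only if the scenario is 8k1k and its concurrency is at
    or above the threshold.
    """
    if (isl, osl) != (EVAL_ISL, EVAL_OSL):
        return []
    return [c for c in conc_list if c >= min_eval_conc]


if __name__ == "__main__":
    sweep = [1, 2, 4, 8, 16]
    print(eval_points(8192, 1024, sweep))  # 8k1k: only conc 16 qualifies
    print(eval_points(1024, 1024, sweep))  # assuming 1k1k means isl=osl=1024
```

Under these assumptions, dropping 16 from the 8k1k conc-list would leave the disagg pipeline without any eval point, which is why the conc-16 entry is kept in the widened sweep.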