Commit 78659a1

fix(agentic): use checkpoint-compatible DEP8 MoE backend
1 parent c19b662 commit 78659a1

2 files changed

Lines changed: 9 additions & 2 deletions

benchmarks/multi_node/srt-slurm-recipes/vllm/deepseek-v4/agentic/GB200_VLLM_AGENTIC_SWEEP_NOTES.md

Lines changed: 9 additions & 1 deletion
@@ -770,7 +770,7 @@ the high-reuse steady state.
 Added a separate 1P/1D DEP8-prefill experiment using 16 inference GPUs. It
 keeps Dynamo KV routing, prefix caching, the 32K retention interval, KV event
 publication, and the validated TP8 decode path. Its prefill follows the repo's
-existing vLLM DEP pattern (`TP1 x DP8`, EP8, `deep_gemm_mega_moe`) and raises
+existing vLLM DEP pattern (`TP1 x DP8`, EP8) and raises
 the prefill batch-token ceiling from 16K to 32K. This tests whether eight-way
 attention parallelism can improve raw prefill throughput enough to offset the
 expected per-rank load-balance and cache-affinity penalty.
@@ -788,3 +788,11 @@ every discovered worker's vLLM running/waiting gauges between points. It
 requires three consecutive idle polls before continuing, waits up to 30
 minutes by default, and fails rather than contaminating the next result if the
 system cannot drain. It does not clear KV state.
+
+The first DEP8 bring-up (`27925198626`, Slurm `19547`) failed during model
+load with `KeyError: layers.0.ffn.experts.w13_input_scale`. EP filtering had
+already selected the expected 48/384 experts per rank. The failure came from
+the explicitly inherited `deep_gemm_mega_moe` loader, whose expected scale
+layout does not match the current v0.23 NVFP4 checkpoint. The override was
+removed so DEP8 uses the same checkpoint-compatible default MoE backend as the
+successful TEP8 recipes; no topology or cache setting changed.
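
The drain step described in the context lines of the second hunk (three consecutive idle polls, a 30-minute default ceiling, hard failure instead of a contaminated next point) could be sketched roughly as follows. This is an illustration only, not the sweep's actual code: `wait_for_drain`, `poll_busy`, and the 10-second poll interval are hypothetical names and values, and `poll_busy` is assumed to return the summed running and waiting gauges across all discovered workers.

```python
import time
from typing import Callable


def wait_for_drain(
    poll_busy: Callable[[], int],
    required_idle_polls: int = 3,      # from the notes: three consecutive idle polls
    timeout_s: float = 30 * 60,        # from the notes: 30 minutes by default
    interval_s: float = 10.0,          # assumption: the poll interval is not stated
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> None:
    """Return once `required_idle_polls` consecutive polls report zero
    running + waiting requests; raise TimeoutError if that never happens
    before the deadline. KV state is not touched here."""
    deadline = clock() + timeout_s
    idle_streak = 0
    while True:
        busy = poll_busy()
        # Any non-idle poll resets the streak, so the idle polls must be consecutive.
        idle_streak = idle_streak + 1 if busy == 0 else 0
        if idle_streak >= required_idle_polls:
            return
        if clock() >= deadline:
            # Fail instead of letting leftover work leak into the next point.
            raise TimeoutError(
                f"workers still busy after {timeout_s:.0f}s (last poll: {busy})"
            )
        sleep(interval_s)
```

Injecting `sleep` and `clock` keeps the gate testable without real waiting. Resetting the streak on any busy poll is what makes a single late-arriving request restart the count.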

benchmarks/multi_node/srt-slurm-recipes/vllm/deepseek-v4/agentic/disagg-gb200-1p1d-dep8-tp8-agentic.yaml

Lines changed: 0 additions & 1 deletion
@@ -95,7 +95,6 @@ backend:
 enable-expert-parallel: true
 enable-ep-weight-filter: true
 attention-config: '{"use_fp4_indexer_cache": true}'
-moe-backend: "deep_gemm_mega_moe"
 enforce-eager: true
 max-num-seqs: 256
 max-num-batched-tokens: 32768
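
For readers checking the 48/384 figure in the notes against the `enable-expert-parallel` and `enable-ep-weight-filter` settings retained here, a small sketch of the arithmetic and of a pre-load key check may help. It assumes experts are split into equal contiguous blocks per EP rank, which may not be how vLLM actually places them. `local_expert_ids` and `missing_keys` are hypothetical helpers, not vLLM APIs, and every checkpoint key in the usage lines is a placeholder except `layers.0.ffn.experts.w13_input_scale`, which is the key quoted in the error.

```python
def local_expert_ids(rank: int, ep_size: int = 8, num_experts: int = 384) -> range:
    """Expert ids owned by `rank`, assuming equal contiguous blocks per rank
    (an assumption for illustration; real placement may differ)."""
    if not 0 <= rank < ep_size:
        raise ValueError(f"rank {rank} outside [0, {ep_size})")
    if num_experts % ep_size != 0:
        raise ValueError("num_experts must divide evenly across EP ranks")
    per_rank = num_experts // ep_size  # 384 // 8 == 48
    return range(rank * per_rank, (rank + 1) * per_rank)


def missing_keys(expected: set, checkpoint_keys: set) -> set:
    """Keys a loader expects that the checkpoint does not provide."""
    return expected - checkpoint_keys


# Usage with placeholder data: a loader that expects an input-scale tensor
# the checkpoint does not contain would be flagged before model load.
checkpoint_keys = {"placeholder.weight", "placeholder.weight_scale"}
expected_by_loader = {"placeholder.weight", "layers.0.ffn.experts.w13_input_scale"}
print(len(local_expert_ids(0)))                       # 48
print(missing_keys(expected_by_loader, checkpoint_keys))
```

A check of this kind would report the mismatched scale layout as a set of absent keys up front, rather than as a `KeyError` partway through loading.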
