Commit da052a6

perf(agentic): probe GB200 frontier tails
1 parent 14f86cd commit da052a6

2 files changed

Lines changed: 39 additions & 0 deletions

.github/configs/nvidia-master.yaml

Lines changed: 30 additions & 0 deletions
@@ -12827,6 +12827,21 @@ dsv4-fp4-gb200-dynamo-vllm-agentic-3p2d-tep8-tp8:
   agentic-coding:
   - duration: 1800
     search-space:
+    # Ultra-high-interactivity probes below the historical c16 endpoint.
+    - spec-decoding: none
+      conc-list: [4, 8]
+      prefill:
+        num-worker: 3
+        tp: 8
+        ep: 8
+        dp-attn: false
+        additional-settings:
+        - "CONFIG_FILE=recipes/vllm/deepseek-v4/agentic/disagg-gb200-3p2d-tep8-tp8-agentic.yaml"
+      decode:
+        num-worker: 2
+        tp: 8
+        ep: 1
+        dp-attn: false
     - spec-decoding: none
       conc-list: [16, 24, 32, 40]
       prefill:
@@ -12901,6 +12916,21 @@ dsv4-fp4-gb200-dynamo-vllm-agentic-2p1d-dep8-dep8:
         tp: 8
         ep: 8
         dp-attn: true
+    # Exploratory tail beyond the measured c160 normalized-throughput peak.
+    - spec-decoding: none
+      conc-list: [192, 224, 256]
+      prefill:
+        num-worker: 2
+        tp: 8
+        ep: 8
+        dp-attn: true
+        additional-settings:
+        - "CONFIG_FILE=recipes/vllm/deepseek-v4/agentic/disagg-gb200-2p1d-dep8-dep8-agentic.yaml"
+      decode:
+        num-worker: 1
+        tp: 8
+        ep: 8
+        dp-attn: true
 
 # Split from dsr1-fp4-b200-dynamo-trt: agentic-coding scenario only.
 dsr1-fp4-b200-dynamo-trt-agentic:
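
To make the size of the two added search-space entries concrete, the following is a minimal Python sketch that tallies GPUs and new concurrency points per topology. The worker counts, `tp` values, and `conc-list` values are copied from the hunks above. The class names, the short topology labels, and above all the rule that each worker occupies exactly `tp` GPUs are assumptions made for illustration, not something this commit states.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    """One side (prefill or decode) of a disaggregated deployment."""
    num_worker: int
    tp: int

    def gpus(self) -> int:
        # Assumption: every worker spans exactly `tp` GPUs.
        return self.num_worker * self.tp


@dataclass(frozen=True)
class Probe:
    """One search-space entry added by the commit (hypothetical container)."""
    name: str
    conc_list: tuple
    prefill: Stage
    decode: Stage

    def total_gpus(self) -> int:
        return self.prefill.gpus() + self.decode.gpus()


# Values taken from the two added entries; the names are shorthand labels.
PROBES = (
    Probe("3p2d-tep8-tp8", (4, 8), Stage(3, 8), Stage(2, 8)),
    Probe("2p1d-dep8-dep8", (192, 224, 256), Stage(2, 8), Stage(1, 8)),
)


def summarize(probes):
    """Map each probe name to (total GPUs, number of new concurrency points)."""
    return {p.name: (p.total_gpus(), len(p.conc_list)) for p in probes}


if __name__ == "__main__":
    for name, (gpus, points) in summarize(PROBES).items():
        print(f"{name}: {gpus} GPUs, {points} new concurrency points")
```

Under that assumption the 3P/2D entry spans 40 GPUs and the 2P/1D entry spans 24, and the two entries together contribute five new concurrency points.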

benchmarks/multi_node/srt-slurm-recipes/vllm/deepseek-v4/agentic/GB200_VLLM_AGENTIC_SWEEP_NOTES.md

Lines changed: 9 additions & 0 deletions
@@ -1348,3 +1348,12 @@ The AIPerf submodule also merges `ajcasagrande/aiperf:ajc/agentx` at
 resulting merge commit retains both histories and supplies the latest AgentX
 warmup, timing, Weka-loader, metrics, Dynamo-session, and process-lifecycle
 fixes.
+
+## Exploratory Frontier Tails
+
+Five boundary probes extend the validated core sweep. The 3P/2D TEP8/TP8
+topology adds c4/c8 below its historical c16 endpoint to measure the
+ultra-high-interactivity limit. The 2P/1D DEP8/DEP8 topology adds
+c192/c224/c256 beyond the current c160 normalized-throughput leader to locate
+the actual one-decode saturation knee. These are isolated batches so they do
+not force reruns of already selected points when a boundary probe overloads.
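
The isolation argument in the added notes can be illustrated with a small Python sketch. It assumes, hypothetically, that each search-space entry runs as one batch and that an overload at any concurrency forces the whole batch to be rerun. The function name and the merged layout are illustrative and do not come from the repository.

```python
def points_to_rerun(batches, failed_conc):
    """Return every concurrency sharing a batch with the failed one.

    Assumption: a batch is rerun as a unit when one of its points overloads.
    """
    for conc_list in batches:
        if failed_conc in conc_list:
            return list(conc_list)
    return []


# Layout used by the commit: the c4/c8 probes sit in their own entry.
isolated = [[4, 8], [16, 24, 32, 40]]

# Hypothetical alternative: probes appended to the existing entry.
merged = [[4, 8, 16, 24, 32, 40]]

if __name__ == "__main__":
    print(points_to_rerun(isolated, 4))  # only the probe batch
    print(points_to_rerun(merged, 4))    # probes plus all core points
```

With the isolated layout a failure at c4 touches only c4 and c8, while the merged layout would also pull c16 through c40 back into the rerun set.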