Commit be92334

[AMD] server_vllm.sh: default PREFILL/DECODE_TP_SIZE to a full node
Mirror server_sglang.sh / server_atom.sh so the bench.sh GPU count never resolves to 0 if submit.sh did not export the per-worker TP size.
1 parent 6c0e812 commit be92334

1 file changed

Lines changed: 6 additions & 0 deletions

benchmarks/multi_node/amd_utils/server_vllm.sh

@@ -39,6 +39,12 @@ BENCH_MAX_CONCURRENCY="${BENCH_MAX_CONCURRENCY:-512}"
 DRY_RUN="${DRY_RUN:-0}"
 GPUS_PER_NODE="${GPUS_PER_NODE:-8}"
 
+# Per-worker TP size (PREFILL_NODES*PREFILL_TP/PREFILL_WORKERS), normally exported
+# by submit.sh; fall back to a full node so the bench.sh GPU count never resolves
+# to 0. Mirrors server_sglang.sh / server_atom.sh.
+PREFILL_TP_SIZE="${PREFILL_TP_SIZE:-$GPUS_PER_NODE}"
+DECODE_TP_SIZE="${DECODE_TP_SIZE:-$GPUS_PER_NODE}"
+
 ROUTER_PORT="${ROUTER_PORT:-30000}"
 SERVER_PORT="${SERVER_PORT:-2584}"
 ENGINE_ID="${ENGINE_ID:-${MODEL_NAME}-pd-run}"
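
For reviewers who want to see the effect of the new defaults in isolation, the following is a minimal sketch of the fallback behavior. It assumes the GPU count is derived by summing the two per-worker TP sizes; the gpu_count helper is hypothetical and is not taken from bench.sh, whose actual computation is not shown in this commit.

```shell
#!/usr/bin/env bash
# Illustrative sketch only: gpu_count is a hypothetical stand-in, not bench.sh code.
set -u

GPUS_PER_NODE="${GPUS_PER_NODE:-8}"

# Hypothetical GPU count: sum of the two TP sizes, treating missing values as 0.
gpu_count() {
    echo $(( ${1:-0} + ${2:-0} ))
}

# Simulate submit.sh not exporting the per-worker TP sizes.
unset PREFILL_TP_SIZE DECODE_TP_SIZE

# Without the fallback, both sizes are empty and the count resolves to 0.
before="$(gpu_count "${PREFILL_TP_SIZE:-}" "${DECODE_TP_SIZE:-}")"

# The fallback from this commit: ${VAR:-default} applies when VAR is unset or empty.
PREFILL_TP_SIZE="${PREFILL_TP_SIZE:-$GPUS_PER_NODE}"
DECODE_TP_SIZE="${DECODE_TP_SIZE:-$GPUS_PER_NODE}"
after="$(gpu_count "$PREFILL_TP_SIZE" "$DECODE_TP_SIZE")"

echo "without fallback: $before"
echo "with fallback:    $after"
```

Because the `:-` form also covers variables that are set but empty, an `export PREFILL_TP_SIZE=` in the environment would still receive the full-node default under this sketch.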
