Commit f5713be

arygupt and claude committed
chore: re-arm dsr1 powercheck — validate perfmon teardown fix (srt-slurm@b9526e5)
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
1 parent cd680da commit f5713be

1 file changed

Lines changed: 1 addition & 0 deletions


perf-changelog.yaml

@@ -4052,3 +4052,4 @@
 - "Exercises the runner-side wiring added to runners/launch_gb300-nv.sh: the dsr1 branch clones SemiAnalysisAI/srt-slurm@feat/inferencex-perfmon (NVIDIA/srt-slurm PR #35) instead of upstream sa-submission, recursively injects `monitoring:` into every recipes/<hw>/<seq>/*.yaml (find -type f, never a flat glob — the flat glob is what silently produced 0 power rows in sweep #26548110246), and stages the per-node perf_samples_*.csv to $GITHUB_WORKSPACE before `rm -rf outputs`, setting GPU_METRICS_CSV_GLOB for the Process-result step."
 - "Success criteria: job green AND the agg JSON patched with avg_power_w + per-stage prefill_avg_power_w/decode_avg_power_w + workers[] (role-labelled prefill/decode) from utils/aggregate_power.py. If those fields are absent the plumbing is not yet proven and the full dsr1-disagg-NVIDIA sweep stays gated. Remove this key (or keep as a GB300 power canary) once validated."
 pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1791
+# re-arm 2026-06-22b: validate perfmon teardown fix (srt-slurm@b9526e5) — probe for pre-existing CG job
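
The changelog entry above describes two checks: recipe files must be found recursively (a flat glob misses files nested under recipes/<hw>/<seq>/), and the aggregated JSON must carry the power fields before the full sweep is ungated. The Python sketch below illustrates both ideas only; it is not the code in runners/launch_gb300-nv.sh or utils/aggregate_power.py. The helper names, the example directory names, and the assumption that each `workers` entry is a dict with a `role` key are hypothetical.

```python
import tempfile
from pathlib import Path

# Field names are taken from the changelog entry; the JSON layout is assumed.
REQUIRED_FIELDS = (
    "avg_power_w",
    "prefill_avg_power_w",
    "decode_avg_power_w",
    "workers",
)


def find_recipes(root: Path, recursive: bool = True) -> list[Path]:
    """Return recipe YAML files under root.

    recursive=True behaves like `find root -type f -name '*.yaml'`.
    recursive=False behaves like a flat `root/*.yaml` glob and therefore
    misses recipes that live in nested <hw>/<seq>/ directories.
    """
    matches = root.rglob("*.yaml") if recursive else root.glob("*.yaml")
    return sorted(p for p in matches if p.is_file())


def missing_power_fields(agg: dict) -> list[str]:
    """List the success-criteria fields that are absent from an agg dict."""
    missing = [name for name in REQUIRED_FIELDS if name not in agg]
    # Assumed shape: each worker is a dict carrying a "role" label.
    roles = {worker.get("role") for worker in agg.get("workers", [])}
    for role in ("prefill", "decode"):
        if role not in roles:
            missing.append(f"workers[role={role}]")
    return missing


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp) / "recipes"
        nested = root / "example-hw" / "example-seq"  # hypothetical names
        nested.mkdir(parents=True)
        (nested / "example.yaml").write_text("name: example\n")
        print("flat glob:", len(find_recipes(root, recursive=False)))
        print("recursive:", len(find_recipes(root, recursive=True)))
    print("missing:", missing_power_fields({"avg_power_w": 1.0}))
```

With a single recipe nested two levels down, the flat glob finds nothing while the recursive search finds the file, which is the failure mode the entry attributes to the sweep with 0 power rows. An empty list from `missing_power_fields` would correspond to the stated success criteria being met.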
