[codex] Add golden AL distributions #147

Workflow file for this run

.github/workflows/codeowner-signoff-verify.yml at 586b963

	name: CODEOWNER Sign-off Verify

	# Independently re-verifies a CODEOWNER reviewer sign-off.
	#
	# A reviewer marks a PR ready by posting the sign-off checklist that starts with
	# "As a PR reviewer and CODEOWNER" (see e.g.
	# https://github.com/SemiAnalysisAI/InferenceX/pull/1792#issuecomment-4781799476).
	# That sign-off can be posted three different ways, and all three trigger here:
	# - a conversation comment -> issue_comment
	# - an Approve/Comment review summary -> pull_request_review
	# - an inline code-review comment -> pull_request_review_comment
	#
	# When the sign-off lands on an open PR, this workflow asks Claude to
	# independently confirm the claims that actually gate a merge:
	# 0. The sign-off author is a CODEOWNER for the files the PR changes.
	# 1. PR validation has at least one fully-green run on a commit in the PR.
	# 2. Evals pass on that same green run.
	# 3. The single-node recipe(s) are linked in the sign-off AND the linked
	# recipe PR/commit matches the major, deployment-defining server args of
	# this InferenceX PR.
	# If any of those is not to standard, Claude posts a single comment that
	# @-mentions the reviewer who signed off and explains exactly what is wrong.
	#
	# It can also be run manually via workflow_dispatch by passing the URL of the
	# sign-off comment/review to verify.

	on:
	issue_comment:
	types: [created, edited]
	pull_request_review:
	types: [submitted, edited]
	pull_request_review_comment:
	types: [created, edited]
	workflow_dispatch:
	inputs:
	comment_url:
	description: >-
	URL of the sign-off comment/review to verify, e.g.
	https://github.com/OWNER/REPO/pull/123#issuecomment-456 (also accepts
	#pullrequestreview-<id> and #discussion_r<id> review URLs).
	required: true
	type: string

	concurrency:
	group: codeowner-signoff-verify-${{ github.event.issue.number || github.event.pull_request.number || github.run_id }}
	cancel-in-progress: false

	jobs:
	# ---------------------------------------------------------------------------
	# Gate: only proceed for the sign-off checklist on an OPEN, non-draft PR.
	# Normalizes the three event shapes (issue_comment / pull_request_review /
	# pull_request_review_comment) into a single set of outputs, resolves an
	# immutable head SHA (TOCTOU-safe), and hands the verifier a ready-to-run
	# command for fetching the exact sign-off body.
	# ---------------------------------------------------------------------------
	gate:
	if: |
	github.event_name == 'workflow_dispatch' ||
	(github.event_name == 'issue_comment' && github.event.issue.pull_request &&
	contains(github.event.comment.body || '', 'As a PR reviewer and CODEOWNER')) ||
	(github.event_name == 'pull_request_review' &&
	contains(github.event.review.body || '', 'As a PR reviewer and CODEOWNER')) ||
	(github.event_name == 'pull_request_review_comment' &&
	contains(github.event.comment.body || '', 'As a PR reviewer and CODEOWNER'))
	runs-on: ubuntu-latest
	permissions:
	contents: read
	pull-requests: read
	outputs:
	proceed: ${{ steps.resolve.outputs.proceed }}
	pr-number: ${{ steps.resolve.outputs.pr-number }}
	head-sha: ${{ steps.resolve.outputs.head-sha }}
	signoff-author: ${{ steps.resolve.outputs.signoff-author }}
	signoff-kind: ${{ steps.resolve.outputs.signoff-kind }}
	signoff-fetch-cmd: ${{ steps.resolve.outputs.signoff-fetch-cmd }}
	steps:
	- name: Resolve PR state and sign-off metadata
	id: resolve
	uses: actions/github-script@3a2844b7e9c422d3c10d287c895573f7108da1b3 # v9.0.0
	with:
	script: |
	const owner = context.repo.owner;
	const repo = context.repo.repo;
	const repoFull = `${owner}/${repo}`;
	const eventName = context.eventName;

	// Normalize the event payloads (3 comment events + manual dispatch).
	let prNumber, author, refId, kind, fetchCmd;
	if (eventName === 'issue_comment') {
	prNumber = context.payload.issue.number;
	author = context.payload.comment.user.login;
	refId = context.payload.comment.id;
	kind = 'conversation comment';
	fetchCmd = `gh api repos/${repoFull}/issues/comments/${refId} --jq .body`;
	} else if (eventName === 'pull_request_review') {
	prNumber = context.payload.pull_request.number;
	author = context.payload.review.user.login;
	refId = context.payload.review.id;
	kind = 'review summary';
	fetchCmd = `gh api repos/${repoFull}/pulls/${prNumber}/reviews/${refId} --jq .body`;
	} else if (eventName === 'pull_request_review_comment') {
	prNumber = context.payload.pull_request.number;
	author = context.payload.comment.user.login;
	refId = context.payload.comment.id;
	kind = 'inline review comment';
	fetchCmd = `gh api repos/${repoFull}/pulls/comments/${refId} --jq .body`;
	} else if (eventName === 'workflow_dispatch') {
	// Parse a sign-off URL like
	// https://github.com/OWNER/REPO/pull/123#issuecomment-456
	// .../pull/123#pullrequestreview-456
	// .../pull/123#discussion_r456
	const url = (context.payload.inputs && context.payload.inputs.comment_url) || '';
	const prMatch = url.match(/\/pull\/(\d+)/);
	if (!prMatch) {
	core.setOutput('proceed', 'false');
	core.setFailed(`comment_url must contain /pull/<number>: ${url}`);
	return;
	}
	prNumber = parseInt(prMatch[1], 10);
	let m;
	if ((m = url.match(/#issuecomment-(\d+)/))) {
	refId = m[1];
	kind = 'conversation comment';
	fetchCmd = `gh api repos/${repoFull}/issues/comments/${refId} --jq .body`;
	const c = await github.rest.issues.getComment({ owner, repo, comment_id: parseInt(refId, 10) });
	author = c.data.user.login;
	} else if ((m = url.match(/#pullrequestreview-(\d+)/))) {
	refId = m[1];
	kind = 'review summary';
	fetchCmd = `gh api repos/${repoFull}/pulls/${prNumber}/reviews/${refId} --jq .body`;
	const r = await github.rest.pulls.getReview({ owner, repo, pull_number: prNumber, review_id: parseInt(refId, 10) });
	author = r.data.user.login;
	} else if ((m = url.match(/#discussion_r(\d+)/))) {
	refId = m[1];
	kind = 'inline review comment';
	fetchCmd = `gh api repos/${repoFull}/pulls/comments/${refId} --jq .body`;
	const rc = await github.rest.pulls.getReviewComment({ owner, repo, comment_id: parseInt(refId, 10) });
	author = rc.data.user.login;
	} else {
	core.setOutput('proceed', 'false');
	core.setFailed(`comment_url must end with #issuecomment-<id>, #pullrequestreview-<id>, or #discussion_r<id>: ${url}`);
	return;
	}
	} else {
	core.setOutput('proceed', 'false');
	return;
	}

	const { data: pr } = await github.rest.pulls.get({
	owner, repo, pull_number: prNumber,
	});

	// Event-triggered: only act on open, non-draft ("ready") PRs.
	// Manual dispatch is an explicit override, so allow any PR state.
	const ready = eventName === 'workflow_dispatch'
	? true
	: (pr.state === 'open' && pr.draft === false);
	if (!ready) {
	core.info(`PR #${prNumber} is not ready (state=${pr.state}, draft=${pr.draft}); skipping.`);
	}

	core.setOutput('proceed', ready ? 'true' : 'false');
	core.setOutput('pr-number', String(prNumber));
	core.setOutput('head-sha', pr.head.sha);
	core.setOutput('signoff-author', author);
	core.setOutput('signoff-kind', kind);
	core.setOutput('signoff-fetch-cmd', fetchCmd);

	verify:
	needs: gate
	if: ${{ needs.gate.outputs.proceed == 'true' }}
	runs-on: ubuntu-latest
	permissions:
	contents: read
	pull-requests: write
	issues: write
	actions: read
	steps:
	# SECURITY: this is a privileged job (write perms + secrets) triggered by
	# comment events, so it must NOT check out untrusted PR head code. If it
	# did, the setup step and the MCP server would execute attacker-controlled
	# files. Check out the trusted default branch; all PR content is accessed
	# read-only via the GitHub API (gh / MCP), never from the working tree.
	- name: Checkout repository (trusted default branch only)
	uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
	with:
	fetch-depth: 0
	ref: ${{ github.event.repository.default_branch }}
	token: ${{ secrets.CLAUDE_PAT }}

	- name: Setup MCP Server
	run: |
	pip3 install -r .claude/requirements-mcp.txt
	mkdir -p /tmp/inferencemax-mcp

	- name: Verify sign-off with Claude
	uses: anthropics/claude-code-action@v1
	env:
	GH_TOKEN: ${{ secrets.CLAUDE_PAT }}
	GITHUB_TOKEN: ${{ secrets.CLAUDE_PAT }}
	INFERENCEMAX_ROOT: ${{ github.workspace }}
	BASH_DEFAULT_TIMEOUT_MS: "1800000"
	BASH_MAX_TIMEOUT_MS: "3600000"
	with:
	anthropic_api_key: ${{ secrets.ANTHROPIC_API_KEY }}
	github_token: ${{ secrets.CLAUDE_PAT }}
	track_progress: false
	allowed_bots: ''
	additional_permissions: |
	actions: read
	settings: |
	{"fastMode": true}

	claude_args: |
	--model 'claude-opus-4-8'
	--mcp-config '{"mcpServers": {"fetch": {"command": "npx", "args": ["-y", "@anthropic-ai/mcp-server-fetch@latest"]}, "inferencemax-repos": {"command": "python3", "args": ["${{ github.workspace }}/.claude/mcp/server.py"], "env": {"INFERENCEMAX_ROOT": "${{ github.workspace }}"}}}}'
	--allowedTools "Read,Glob,Grep,WebFetch,mcp__github__,mcp__github_ci__,mcp__fetch__,mcp__inferencemax-repos__,Bash"
	prompt: |
	REPO: ${{ github.repository }}
	PR NUMBER: ${{ needs.gate.outputs.pr-number }}
	PR HEAD SHA: ${{ needs.gate.outputs.head-sha }}
	SIGN-OFF AUTHOR: ${{ needs.gate.outputs.signoff-author }}
	SIGN-OFF KIND: ${{ needs.gate.outputs.signoff-kind }}

	You are an automated merge-gate auditor for InferenceX.

	A CODEOWNER (`${{ needs.gate.outputs.signoff-author }}`) just posted the reviewer
	sign-off checklist (as a ${{ needs.gate.outputs.signoff-kind }}) that marks
	PR #${{ needs.gate.outputs.pr-number }} as ready to merge. Your job is to
	INDEPENDENTLY verify the checks below (0-3). Do not trust the reviewer's checkmarks
	— re-derive every conclusion from CODEOWNERS, CI runs, the PR diff, and the linked
	recipe yourself. Be rigorous and specific.

	Read the exact sign-off body first (especially its "Additional detail section").
	The sign-off was posted as a ${{ needs.gate.outputs.signoff-kind }}, so fetch it with:
	```bash
	${{ needs.gate.outputs.signoff-fetch-cmd }}
	```

	Get the PR metadata and the full diff (this is the "InferenceX PR recipe" you will
	compare against later):
	```bash
	gh pr view ${{ needs.gate.outputs.pr-number }} --repo ${{ github.repository }} --json title,headRefName,headRefOid,files,body
	gh pr diff ${{ needs.gate.outputs.pr-number }} --repo ${{ github.repository }}
	```
	Anchor everything to the pinned head SHA `${{ needs.gate.outputs.head-sha }}` (the
	commit that was signed off). First confirm the PR tip has not moved since the gate
	ran: if `headRefOid` from the command above differs from the pinned SHA, the head
	advanced mid-verification — assess the recipe at the PINNED SHA (e.g.
	`gh api repos/${{ github.repository }}/commits/${{ needs.gate.outputs.head-sha }}` and
	the files at that SHA), and note in your comment that a fresh sign-off is needed for
	the new commit. This keeps Check 3 (recipe) consistent with Checks 1-2.

	## Check 0 — The sign-off author is a CODEOWNER for the changed files
	The sign-off must come from a CODEOWNER for what the PR changes. Read
	`.github/CODEOWNERS` (in the checked-out default branch) and, for each changed file,
	find its owners via last-matching-pattern-wins (the LAST matching line wins; owners do
	not accumulate, so `.github/configs/nvidia-master.yaml @a @b` overrides `* @org/team`).
	Then decide:
	- A path whose winning (last-matching) line is a SPECIFIC line (named users/team): the signer
	`@${{ needs.gate.outputs.signoff-author }}` must be one of those owners (listed
	directly, or a member of that team).
	- A path whose ONLY owner is the broad catch-all (`* @org/team`): satisfied by ANY
	recognized CODEOWNER. So if the signer is listed anywhere in CODEOWNERS (e.g. they
	own one of the specific files in this PR), the catch-all paths are covered too —
	the catch-all is a default, not a per-file gatekeeper.
	- Be decisive and DO NOT hard-fail on unreadable team membership. The bot's token
	often can't read org team membership (403/404) — that is not a failure. If the
	signer is clearly a CODEOWNER (listed in CODEOWNERS for any changed path, or for a
	specific changed file), treat Check 0 as PASS. Do not write "if they're a member
	then OK, otherwise…" hedging.
	- FAIL only on a genuine mismatch: the signer is not a CODEOWNER for a SPECIFIC
	(non-catch-all) changed path — e.g. an AMD owner signing an NVIDIA-only change.
	Name the path and its real owners in one line.

	## Check 1 — A passing sweep + evals ran on a commit IN this PR
	The merge standard (and InferenceX's own reuse gate) requires a green full sweep —
	including evals — on a commit that is CURRENTLY part of this PR. A sweep that ran on a
	commit later rebased/force-pushed out does NOT count: at merge, `merge_with_reuse.sh`
	→ `validate_reusable_run` (in `utils/find_reusable_sweep_run.py`) rejects any source
	whose `head_sha` is not in `GET /pulls/<n>/commits`. So the whole question collapses
	to one fact: does a commit still in this PR carry green, executed sweep/eval checks?

	The cleanest way to answer is the check-runs attached to each in-PR commit — you do
	NOT need to list `run-sweep.yml` runs or parse reuse logs.
	- Get the PR's current commit SHAs:
	```bash
	gh api repos/${{ github.repository }}/pulls/${{ needs.gate.outputs.pr-number }}/commits \
	--paginate --jq '.[].sha'
	```
	- For each of those SHAs, list its check-runs and look at the sweep/eval jobs:
	```bash
	gh api repos/${{ github.repository }}/commits/<sha>/check-runs --paginate \
	--jq '.check_runs[] | {name, conclusion}'
	```
	The per-config benchmark/eval check-runs are named like `single-node 1k1k /`,
	`single-node 8k1k /`, and `eval /`. A commit satisfies validation only if the actual
	executed jobs — the `eval /` AND `single-node */` check-runs — have conclusion
	`success`. `skipped` does NOT count (it means a skip/reuse run that executed nothing
	on that commit); `failure` does not count.
	IMPORTANT: do NOT rely on `collect-evals` — it is an aggregator that can report
	`success` even when every underlying `eval /` job was `skipped` (it just aggregated
	an empty set). Always key off the per-config `eval /` and `single-node */` jobs.
	- PASS if ANY commit currently in the PR has green (success, non-skipped) `single-node */`
	AND `eval /` check-runs. Remember the run id behind a passing `eval /` check (its
	`details_url` contains `/actions/runs/<run_id>/...`) — Check 2 needs it.
	- Otherwise FAIL. State the ROOT ISSUE plainly and keep it actionable:
	"No passing sweep/eval was found on any commit in this PR."
	Do NOT write a confusing message like "it technically passed but the commit isn't in
	the PR" — a sweep that ran on a rebased-out commit is irrelevant to the reviewer, so
	don't lead with it. The fix the author needs is simply: run (or re-anchor via
	`/reuse-sweep-run`) a passing full sweep on a commit currently in this PR. You may
	add an offending run/SHA as a short supporting detail AFTER the root-issue line.

	## Check 2 — Evals actually pass (accuracy), on that in-PR commit's run
	For the commit that passed Check 1, confirm the eval numbers are real and meet the bar
	— not just that the job is green:
	- Take the run id behind the passing `eval /` check-run recorded in Check 1 (from its
	`details_url`) and download its eval results:
	```bash
	gh run download <RUN_ID> --repo ${{ github.repository }} -p 'eval_results_*' -D ./evals || \
	gh run download <RUN_ID> --repo ${{ github.repository }} -p 'eval_*' -D ./evals
	find ./evals -name '*.json' | head
	```
	- Read the aggregated eval JSON / the run's "Eval Summary" step summary and confirm
	accuracy is present and meets the expected bar for the model, and that the run used
	the same inference-engine image as this PR's config. FAIL if evals are
	skipped/failed/empty/below bar, or the image differs — say exactly which.

	## Check 3 — Recipe linked AND complete
	The InferenceX "recipe" for this PR = the files it changes under `benchmarks/`
	(especially `benchmarks/single_node/*/.sh`) plus its entry in
	`.github/configs/*-master.yaml`. The merge standard is: the community must be able to
	reproduce this benchmark from a public recipe.
	- (a) LINK PRESENT: The sign-off's "Additional detail section" MUST contain a link to
	the corresponding recipe — a PR/commit in
	`https://github.com/vllm-project/recipes` or
	`https://github.com/sgl-project/sglang` (cookbook under `docs_new`), or the
	published recipe page (`https://recipes.vllm.ai/` or
	`https://docs.sglang.io/cookbook/...`). If no such link is present, FAIL.
	- (b) MAJOR SERVER ARGS MATCH: Fetch the linked recipe (use the `fetch` MCP tool or
	WebFetch; for a recipe PR, read its diff via `gh pr diff` against that repo if
	accessible) and compare it to this PR's launch command. The recipe only needs to
	match the MAJOR, deployment-defining server args — NOT every flag, and explicitly
	NOT the knobs that are specific to InferenceX benchmark/harness tuning.
	MAJOR (must match — these define the model, parallelism, precision, and which
	kernels run, so they determine the perf profile):
	- model / model-path, hardware/SKU
	- parallelism: TP / EP / DP / PP and DP-attention flags
	(`--enable-dp-attention`, `--enable-dp-lm-head`, etc.)
	- quantization and kv-cache dtype
	- kernel-selection backends: `--attention-backend`, `--moe-runner-backend`,
	`--enable-flashinfer-allreduce-fusion` and similar
	- other flags that materially change the served model or its throughput
	INFERENCEX-SPECIFIC (do NOT require a match — list as informational only, never a
	failure): per-lane sweep tuning and harness plumbing such as
	`--scheduler-recv-interval`, `--chunked-prefill-size`, `--disable-piecewise-cuda-graph`,
	`SGLANG_RADIX_FORCE_MISS` and similar env toggles, concurrency / sequence-length
	sweep ranges, ports, result filenames, and image tag/version.
	FAIL only if a MAJOR arg in this PR is missing from (or contradicts) the recipe;
	list exactly those. Treat the InferenceX-specific diffs as expected and mention them
	only as a brief informational note, not as blockers. If a flag's effect is
	equivalent to a recipe default (e.g. quantization auto-detected from an FP4 model),
	say so and do not count it against the recipe.
	- Note: a bare "recipes are already similar to the official ones" claim WITHOUT a
	link does not pass this workflow's standard — a link is required.

	## Verdict and output
	Decide PASS only if Checks 0, 1, 2, and 3 ALL pass. Post EXACTLY ONE summary comment on
	PR #${{ needs.gate.outputs.pr-number }} using `gh pr comment`. Start the comment with
	the hidden marker so reruns are identifiable:
	`<!-- codeowner-signoff-verify sha=${{ needs.gate.outputs.head-sha }} -->`

	Before posting, list the PR's comments and, if a prior verification comment with this
	marker already exists for THIS head SHA, do not post a duplicate; edit that existing
	comment only if the conclusion changed.

	KEEP IT TIGHT — a busy reviewer should get it in ~15 seconds. Not a novel, not a
	single terse line either. Rules:
	- First line after the marker: the overall verdict (PASS, or one line naming what
	blocks merge).
	- Then ONE short line per check. For a passing check write `PASS — <brief reason>`;
	spend words only on the checks that fail.
	- State conclusions, don't narrate your process. No multi-paragraph explanations, no
	restating the checklist, no hedging ("if X then maybe Y" — make the call). Link the
	run/recipe instead of describing it.

	- If everything is to standard: post the verdict line + the four one-line PASS rows
	(with the green run URL). Do NOT @-mention anyone on a pass.
	- If anything is NOT to standard: the FIRST line (after the marker) must @-mention the
	sign-off author as `@${{ needs.gate.outputs.signoff-author }}` with the blocking
	summary. Then the per-check lines, each failing one led by its root issue (e.g.
	"No passing sweep/eval on any commit in this PR") with the supporting link after.

	Do not use emojis anywhere in the comment. Use only facts you verified. If a required
	artifact or run is inaccessible, say so explicitly rather than assuming pass.
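
The Check 1 rule in the prompt above (only the executed `single-node` and `eval /` check-runs count, `skipped` never counts, and `collect-evals` is ignored) can be sketched in JavaScript, the language of the gate job's script. This is a minimal sketch and not part of the workflow: `commitPassesValidation`, `findPassingCommit`, the prefix list, and the sample SHAs and check-run names are all hypothetical. It also assumes the stricter reading of the rule, namely that every matching check-run on a commit must have conclusion `success`.

```javascript
// Hypothetical prefixes, taken from the check-run names the prompt describes
// ("single-node 1k1k /", "single-node 8k1k /", "eval /"). Aggregators such as
// "collect-evals" match neither prefix, so they are never consulted.
const EXECUTED_PREFIXES = ['single-node ', 'eval /'];

// checkRuns: array of { name, conclusion } objects, the same shape the
// `--jq '.check_runs[] | {name, conclusion}'` filter produces.
function commitPassesValidation(checkRuns) {
  return EXECUTED_PREFIXES.every((prefix) => {
    const matching = checkRuns.filter((run) => run.name.startsWith(prefix));
    // At least one job of this kind must exist, and all of them must be
    // `success`; `skipped`, `failure`, and null (still running) do not count.
    return (
      matching.length > 0 &&
      matching.every((run) => run.conclusion === 'success')
    );
  });
}

// checkRunsBySha: object mapping each in-PR commit SHA to its check-runs.
// Returns the first SHA that satisfies the rule, or null if none does.
function findPassingCommit(checkRunsBySha) {
  for (const [sha, runs] of Object.entries(checkRunsBySha)) {
    if (commitPassesValidation(runs)) return sha;
  }
  return null;
}

// Made-up sample data: the first commit only has a skip/reuse run, so its
// green `collect-evals` aggregator must not make it pass.
const example = {
  aaa111: [
    { name: 'single-node 1k1k / config', conclusion: 'skipped' },
    { name: 'eval / config', conclusion: 'skipped' },
    { name: 'collect-evals', conclusion: 'success' },
  ],
  bbb222: [
    { name: 'single-node 1k1k / config', conclusion: 'success' },
    { name: 'single-node 8k1k / config', conclusion: 'success' },
    { name: 'eval / config', conclusion: 'success' },
  ],
};

console.log(findPassingCommit(example)); // prints "bbb222"
```

Keying the decision off name prefixes rather than a fixed list keeps the sketch independent of how many per-config jobs a sweep fans out into, which the prompt does not specify.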
