Recommendation
Fix the gpt-5-codex experiment arm — it 404s 100% of the time because Codex CLI 0.137.0 requests gpt-5-codex-alpha-2025-11-07, a model the api-proxy does not serve. Pin the arm to a provisioned codex model id (or drop the arm) so the Daily Cache Strategy Analyzer stops hard-failing whenever the experiment selects this variant.
Problem statement
The Daily Cache Strategy Analyzer runs an A/B experiment over model_size variants [gpt-5.4, gpt-5-codex]; every run that picks the gpt-5-codex arm fails after exhausting all retries.
Affected workflow and run IDs
- Workflow: .github/workflows/daily-cache-strategy-analyzer.md (engine codex, model ${{ needs.activation.outputs.model_size }}, experiment arm gpt-5-codex)
- Representative failed run: §27475817126, step "Execute Codex CLI": all 4 attempts (initial run + 3 retries) failed (exitCode=1 totalDuration=1m 7s)
Root cause
The proxy allowlist maps gpt-5-codex → copilot/gpt-5*codex* / openai/gpt-5*codex*, but Codex CLI 0.137.0 internally resolves the arm to the concrete model id gpt-5-codex-alpha-2025-11-07, which the api-proxy (172.30.0.30:10000/responses) rejects with 404 Not Found: Model not found gpt-5-codex-alpha-2025-11-07. The harness fallback re-runs with --model gpt-5-codex, but the CLI still emits the alpha id and 404s again — so retries cannot recover.
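To make the "retries cannot recover" argument concrete, here is a minimal Python simulation of the path described above. It is a sketch under stated assumptions, not the api-proxy or Codex CLI logic. The allowlist entries are assumed to behave like shell-style globs (checked with fnmatch), and the served-model set, the alias table, and all function names are hypothetical stand-ins. Under those assumptions the alpha id still matches openai/gpt-5*codex*, so the rejection comes from the model not being served. Because alias resolution is deterministic, all four attempts send the same id and get the same 404.

```python
from fnmatch import fnmatchcase

# Hypothetical stand-ins for the setup described in this issue; none of these
# tables come from the api-proxy or Codex CLI sources.
ALLOWLIST_PATTERNS = ["copilot/gpt-5*codex*", "openai/gpt-5*codex*"]
SERVED_MODELS = {"gpt-5.4"}  # assumption: the alpha id is not served upstream
CLI_ALIASES = {"gpt-5-codex": "gpt-5-codex-alpha-2025-11-07"}  # as seen in the 404s
MAX_RETRIES = 3  # "all 3 retries exhausted", i.e. 4 attempts in total


def resolve(requested: str) -> str:
    """Mimic the CLI turning the requested arm into a concrete model id."""
    return CLI_ALIASES.get(requested, requested)


def allowlisted(model_id: str, provider: str = "openai") -> bool:
    """Glob-match the provider-qualified id (assumed shell-style semantics)."""
    return any(fnmatchcase(f"{provider}/{model_id}", p) for p in ALLOWLIST_PATTERNS)


def proxy_status(model_id: str) -> int:
    """200 if the model is served, otherwise 404 "Model not found"."""
    return 200 if model_id in SERVED_MODELS else 404


def run_with_retries(arm: str) -> list[tuple[str, int]]:
    """Initial attempt plus MAX_RETRIES retries, stopping at the first success."""
    attempts: list[tuple[str, int]] = []
    for _ in range(1 + MAX_RETRIES):
        # The fallback re-runs with --model gpt-5-codex, but resolution is
        # deterministic, so every attempt sends the same unserved id.
        model_id = resolve(arm)
        status = proxy_status(model_id)
        attempts.append((model_id, status))
        if status == 200:
            break
    return attempts


if __name__ == "__main__":
    # True under the assumed glob semantics: the allowlist is not the blocker.
    print(allowlisted(resolve("gpt-5-codex")))  # True
    attempts = run_with_retries("gpt-5-codex")
    print(len(attempts), {status for _, status in attempts})  # 4 {404}
```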
Evidence (api-proxy 404, all retries exhausted)
{"type":"error","message":"unexpected status 404 Not Found: Model not found gpt-5-codex-alpha-2025-11-07, url: (172.30.0.30/redacted) ..."}
{"type":"turn.failed","error":{"message":"... Model not found gpt-5-codex-alpha-2025-11-07 ..."}}
WARN codex_models_manager::model_info: Unknown model gpt-5-codex is used. This will use fallback model metadata.
[codex-harness] all 3 retries exhausted — giving up (exitCode=1)
Note: Smoke Codex's agent step succeeds in the same window, so the proxy/codex path is healthy for provisioned ids — only the gpt-5-codex→alpha resolution is broken.
Proposed remediation (pick one)
- Explicitly set the model id the CLI sends to a provisioned codex model (e.g. -c model=<available-id>) instead of relying on the CLI's gpt-5-codex→alpha default.
- Provision gpt-5-codex-alpha-2025-11-07 upstream and add it to the api-proxy model allowlist/pricing table.
- Pin/upgrade the Codex CLI to a version whose gpt-5-codex alias resolves to a served model.
- Drop the gpt-5-codex arm from the experiment model_size variants until the model is available.
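The first and last remediation options can be combined into a preflight step that runs before an arm reaches the CLI. The step pins every arm to an explicit model id, drops any arm whose pinned id the proxy does not serve, and passes the pinned id with -c model=<id> so the CLI's own gpt-5-codex→alpha default never comes into play. This is a minimal Python sketch. PINNED_MODELS, usable_arms, codex_argv, the served-model set, and the prompt string are hypothetical, and <available-id> stays a placeholder because this issue does not name the provisioned codex id.

```python
import shlex

# Hypothetical pin table: experiment arm -> model id the CLI must send.
# "<available-id>" is a placeholder for whichever codex id is provisioned.
PINNED_MODELS = {
    "gpt-5.4": "gpt-5.4",
    "gpt-5-codex": "<available-id>",
}


def usable_arms(arms, served_models):
    """Keep only arms whose pinned id the proxy serves; unknown arms are dropped."""
    return [arm for arm in arms if PINNED_MODELS.get(arm) in served_models]


def codex_argv(arm, prompt):
    """Build a codex exec command line that pins the model id explicitly."""
    model_id = PINNED_MODELS[arm]
    return ["codex", "exec", "-c", f"model={model_id}", prompt]


if __name__ == "__main__":
    served = {"gpt-5.4"}  # assumption: no codex id is provisioned yet
    arms = usable_arms(["gpt-5.4", "gpt-5-codex"], served)
    print(arms)  # ['gpt-5.4']
    for arm in arms:
        print(shlex.join(codex_argv(arm, "analyze cache strategy")))
```

Dropping an unserved arm before the run replaces a retry loop that can only end in exitCode=1 with an explicit decision the experiment can record.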
Success criteria / verification
- No 404 ... Model not found gpt-5-codex-alpha-2025-11-07 entries in api-proxy logs for codex runs.
- The Daily Cache Strategy Analyzer's gpt-5-codex arm holds run_success_rate >= 0.90 (the experiment's own guardrail), or the arm is removed.
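Both criteria can be checked mechanically. This is a minimal Python sketch that assumes api-proxy log lines are available as plain strings (as in the evidence above) and per-run experiment outcomes as (arm, succeeded) pairs. These input shapes and all function names are hypothetical. Only the 404 message text and the 0.90 threshold come from this issue.

```python
from collections import defaultdict

# Signature and threshold taken from this issue; everything else is hypothetical.
ALPHA_404 = "Model not found gpt-5-codex-alpha-2025-11-07"
GUARDRAIL = 0.90  # the experiment's own run_success_rate guardrail


def alpha_404_count(log_lines):
    """Count log lines that carry both a 404 and the alpha-id 'Model not found' text."""
    return sum(1 for line in log_lines if "404" in line and ALPHA_404 in line)


def success_rates(runs):
    """Map each arm to its success rate, given (arm, succeeded) pairs."""
    totals = defaultdict(int)
    wins = defaultdict(int)
    for arm, succeeded in runs:
        totals[arm] += 1
        wins[arm] += int(succeeded)
    return {arm: wins[arm] / totals[arm] for arm in totals}


def criteria_met(log_lines, runs, arm="gpt-5-codex", arm_removed=False):
    """True when no alpha-id 404s remain and the arm meets the guardrail or was removed."""
    if alpha_404_count(log_lines) > 0:
        return False
    if arm_removed:
        return True
    rate = success_rates(runs).get(arm)
    return rate is not None and rate >= GUARDRAIL


if __name__ == "__main__":
    logs = ['{"type":"error","message":"unexpected status 404 Not Found: '
            'Model not found gpt-5-codex-alpha-2025-11-07"}']
    runs = [("gpt-5-codex", False), ("gpt-5.4", True)]
    print(alpha_404_count(logs))  # 1
    print(criteria_met(logs, runs))  # False
```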
6h Failure Investigation — full window context (2026-06-13 19:13Z, 31 runs)
Failure clusters
| Cluster | Workflow(s) | Run IDs | Class | Tracking |
| --- | --- | --- | --- | --- |
| Codex gpt-5-codex→alpha 404 | Daily Cache Strategy Analyzer | 27475817126 | config/model · P1 | this issue |
| Claude log-parser guardrail false-fail | Avenger ×4 | 27473035084, 27471579514, 27470367219, 27468988707 | bug · P1 | #39141 |
| upload_artifact 400 → safe_outputs fail | Smoke Copilot/Codex/Claude, Design Decision Gate, PR Sous Chef | 27471858644, 27471836485, 27471836462, 27471832454, 27471203716 | bug · P1 | #38998 (open, recurring) |
| Daily-AIC guardrail false-failure | PR Code Quality Reviewer ×6 | 27474598971, 27471832435, 27471799544, 27471709386, 27469570698, 27468597131 | noise (by-design) | #39079, #39077 (open) |
| Credit-limit test (intentional) | Daily Credit Limit Test | 27467921501 | expected | n/a |
| Copilot CLI: node missing in AWF chroot (exit 127) | Daily Issues Report Generator ×2 | 27472163434, 27470398413 | infra · P2 | roadmap |
| Copilot CLI tool-denial threshold | Daily Formal Spec Verifier | 27471596105 | bug | FIXED by f7fb96b / #39101 (run predates fix) |
| Daily SPDD Spec Planner exec fail | Daily SPDD Spec Planner | 27472157373 | unverified | roadmap |
| Doc build: Git LFS pointer not hydrated | Documentation Unbloat | 27473025009 | infra · P2 | roadmap |
| Smoke Gemini: model has no AI-credits pricing | Smoke Gemini | 27471836499 | config · P2 | roadmap |
| PR-queue GraphQL 502 Bad Gateway | PR Sous Chef ×2 | 27475734213, 27473979242 | transient | none |
| Release: Defender sig update hr=0x80070652 | Release | 27469593101 | transient (external) | none |
| Cancelled / superseded | Smoke CI ×4, Q | 27475351300, 27475289369, 27468719491, 27467752372, 27471149654 | not a failure | n/a |
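As a consistency check on the table, the per-cluster run counts can be summed against the 31-run window and grouped by the priority tag in the Class column. This is a small Python sketch. The counts are transcribed from the Run IDs column, and the "untagged" bucket is an illustrative grouping that mixes the noise, expected, transient, cancelled, already-fixed, and unverified clusters. It is not a classification made in this issue.

```python
from collections import Counter

# (cluster, class, number of run IDs), transcribed from the failure-cluster table.
CLUSTERS = [
    ("Codex gpt-5-codex→alpha 404", "config/model · P1", 1),
    ("Claude log-parser guardrail false-fail", "bug · P1", 4),
    ("upload_artifact 400 → safe_outputs fail", "bug · P1", 5),
    ("Daily-AIC guardrail false-failure", "noise (by-design)", 6),
    ("Credit-limit test (intentional)", "expected", 1),
    ("Copilot CLI: node missing in AWF chroot (exit 127)", "infra · P2", 2),
    ("Copilot CLI tool-denial threshold", "bug", 1),
    ("Daily SPDD Spec Planner exec fail", "unverified", 1),
    ("Doc build: Git LFS pointer not hydrated", "infra · P2", 1),
    ("Smoke Gemini: model has no AI-credits pricing", "config · P2", 1),
    ("PR-queue GraphQL 502 Bad Gateway", "transient", 2),
    ("Release: Defender sig update hr=0x80070652", "transient (external)", 1),
    ("Cancelled / superseded", "not a failure", 5),
]


def runs_by_priority(clusters):
    """Sum run counts per priority tag after the '· ' in Class, else 'untagged'."""
    totals = Counter()
    for _, cls, count in clusters:
        tag = cls.rsplit("· ", 1)[1] if "· " in cls else "untagged"
        totals[tag] += count
    return totals


if __name__ == "__main__":
    print(sum(count for _, _, count in CLUSTERS))  # 31
    print(dict(runs_by_priority(CLUSTERS)))  # {'P1': 10, 'untagged': 17, 'P2': 4}
```

Under this grouping, 10 of the 31 runs fall in P1 clusters and 4 in P2 clusters.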
Existing-issue correlation
- … failure (core.setFailed) — false-failure noise + auto-issue … #39079 / [perf-improvement] AIC Budget Crisis Day 5 — 6-agent cluster expanding, root fix urgently needed #39077 (AIC daily-guardrail false-failure + budget crisis): accounted for 6 of the 31 "failures" (PR Code Quality Reviewer). Non-actionable noise, already tracked. Keep open.
Fix roadmap
- P1: gpt-5-codex arm 404 (this issue); Claude log-parser false-failures ([aw-failures] P1: Claude log-parser guardrail fails SUCCESSFUL runs when stdio has no JSON entries — Avenger 4× false-failures/6 … #39141).
- P2: node missing in AWF chroot for Daily Issues Report Generator (exit 127, ×2); Documentation Unbloat checkout missing lfs: true (PDF slide is an unhydrated LFS pointer); Smoke Gemini gemini-3.1-flash-tts-preview has no AI-credits pricing → add to proxy pricing table or set apiProxy.defaultAiCreditsPricing.
References: §27475817126 · §27473035084 · §27471858644