Recommendation
Fix the Failure Investigator's deterministic pre-fetch so its failed_run_ids/failures capture run-level failures, not just agent-job failures — it currently returns empty while real in-window agentic failures exist, so the investigator would call noop and miss them.
Problem statement
In the 2026-06-13 ~08:00Z run (6h window), /tmp/gh-aw/agent/failure-investigator/prefetch.json reported:
{ "failed_run_ids": [], "failures": [] }
...but an independent gh run list over the same window found two failed agentic workflow runs plus two CI failures. Had the agent trusted the pre-fetch alone (as step 0 instructs: "Only call additional logs/list APIs when a required field is missing or stale"), it would have reported no failures and missed a P1.
Evidence — in-window agentic failures the pre-fetch missed
| Run |
Workflow |
Created |
Failing job |
Note |
| §27458229377 |
Daily Safe Outputs Git Simulator |
05:47Z |
push_repo_memory |
agent job succeeded; post-agent job failed → run = failure (tracked by #39024) |
| §27458341448 |
Avenger |
05:52Z |
agent |
transient network + test timeout; self-recovered next run |
Probable root cause
The pre-fetch appears to key on the agent job conclusion rather than the run-level conclusion. That has two gaps:
- Runs where
agent succeeds but a post-agent job (push_repo_memory, safe_outputs, conclusion) fails are dropped — exactly the Git Simulator case, which is also the highest-signal failure (a 5-day P1 streak).
- Runs that emit a valid
noop/safe-output but still exit the agent job as failure (Avenger) may also be filtered.
Net effect: a run whose GitHub conclusion is failure is not guaranteed to appear in failed_run_ids. This silently narrows discovery and makes a noop result look authoritative when it is not.
Proposed remediation
- Base
failed_run_ids on the workflow-run conclusion (failure | timed_out | startup_failure | cancelled) for agentic workflows in-window, independent of any single job's status.
- Annotate each entry with the failing job name(s) so clustering can distinguish agent-phase vs post-agent-phase (infra) failures.
- Keep the existing agent-job signal as an additional field, not the filter.
Success criteria / verification
- For a window containing a run like 27458229377 (agent OK,
push_repo_memory failed), the pre-fetch's failed_run_ids includes that run.
failed_run_ids count matches gh run list failure-conclusion agentic runs in the window.
- Each failure entry carries the failing job name(s).
References
Generated by 🔍 [aw] Failure Investigator (6h) · 233.2 AIC · ⌖ 14.2 AIC · ⊞ 5.1K · ◷
Resolution — verified fixed
Close this issue: the deterministic pre-fetch now keys on run-level conclusion and annotates failing job names, exactly as requested.
The 2026-06-13 13:17Z Failure Investigator run loaded /tmp/gh-aw/agent/failure-investigator/prefetch.json and it returned 20 failed_run_ids (not empty), each carrying the new failed_job_names and conclusion fields.
Success criteria — all met
- Run-level conclusion drives discovery — the payload now includes runs whose
agent job did not itself fail. Example: Smoke CI runs §27466680204 and §27466673731 appear with failed_job_names = [activation, agent, pre_activation, push_repo_memory, safe_outputs] — i.e. post-agent/non-agent job failures are now captured (the Git-Simulator class from this report).
- Failure entries carry failing job name(s) — every entry has a populated
failed_job_names array.
- Agent-job signal retained as an additional field —
agent_job_conclusion is present alongside the run-level conclusion, not used as the filter.
The original symptom (failed_run_ids: [] while in-window failures existed) does not reproduce. Closing as fixed; reopen if a future window shows a run with a failure/timed_out/startup_failure conclusion missing from the pre-fetch.
Verified by run §27467779429.
Generated by 🔍 [aw] Failure Investigator (6h) · 191.6 AIC · ⌖ 12.9 AIC · ⊞ 5.1K · ◷
Recommendation
Fix the Failure Investigator's deterministic pre-fetch so its
failed_run_ids/failurescapture run-level failures, not justagent-job failures — it currently returns empty while real in-window agentic failures exist, so the investigator would callnoopand miss them.Problem statement
In the 2026-06-13 ~08:00Z run (6h window),
/tmp/gh-aw/agent/failure-investigator/prefetch.jsonreported:{ "failed_run_ids": [], "failures": [] }...but an independent
gh run listover the same window found two failed agentic workflow runs plus two CI failures. Had the agent trusted the pre-fetch alone (as step 0 instructs: "Only call additional logs/list APIs when a required field is missing or stale"), it would have reported no failures and missed a P1.Evidence — in-window agentic failures the pre-fetch missed
push_repo_memoryagentjob succeeded; post-agent job failed → run =failure(tracked by #39024)agentProbable root cause
The pre-fetch appears to key on the
agentjob conclusion rather than the run-level conclusion. That has two gaps:agentsucceeds but a post-agent job (push_repo_memory,safe_outputs,conclusion) fails are dropped — exactly the Git Simulator case, which is also the highest-signal failure (a 5-day P1 streak).noop/safe-output but still exit theagentjob asfailure(Avenger) may also be filtered.Net effect: a run whose GitHub conclusion is
failureis not guaranteed to appear infailed_run_ids. This silently narrows discovery and makes anoopresult look authoritative when it is not.Proposed remediation
failed_run_idson the workflow-runconclusion(failure | timed_out | startup_failure | cancelled) for agentic workflows in-window, independent of any single job's status.Success criteria / verification
push_repo_memoryfailed), the pre-fetch'sfailed_run_idsincludes that run.failed_run_idscount matchesgh run listfailure-conclusion agentic runs in the window.References
Resolution — verified fixed
Close this issue: the deterministic pre-fetch now keys on run-level conclusion and annotates failing job names, exactly as requested.
The 2026-06-13 13:17Z Failure Investigator run loaded
/tmp/gh-aw/agent/failure-investigator/prefetch.jsonand it returned 20failed_run_ids(not empty), each carrying the newfailed_job_namesandconclusionfields.Success criteria — all met
agentjob did not itself fail. Example: Smoke CI runs §27466680204 and §27466673731 appear withfailed_job_names=[activation, agent, pre_activation, push_repo_memory, safe_outputs]— i.e. post-agent/non-agent job failures are now captured (the Git-Simulator class from this report).failed_job_namesarray.agent_job_conclusionis present alongside the run-levelconclusion, not used as the filter.The original symptom (
failed_run_ids: []while in-window failures existed) does not reproduce. Closing as fixed; reopen if a future window shows a run with afailure/timed_out/startup_failureconclusion missing from the pre-fetch.Verified by run §27467779429.