Skip to content

[aw-failures] Failure Investigator pre-fetch returns empty failed_run_ids while in-window agentic failures exist — discovery bli [Content truncated due to length] #39037

@github-actions

Description

@github-actions

Recommendation

Fix the Failure Investigator's deterministic pre-fetch so its failed_run_ids/failures capture run-level failures, not just agent-job failures — it currently returns empty while real in-window agentic failures exist, so the investigator would call noop and miss them.

Problem statement

In the 2026-06-13 ~08:00Z run (6h window), /tmp/gh-aw/agent/failure-investigator/prefetch.json reported:

{ "failed_run_ids": [], "failures": [] }

...but an independent gh run list over the same window found two failed agentic workflow runs plus two CI failures. Had the agent trusted the pre-fetch alone (as step 0 instructs: "Only call additional logs/list APIs when a required field is missing or stale"), it would have reported no failures and missed a P1.

Evidence — in-window agentic failures the pre-fetch missed

Run Workflow Created Failing job Note
§27458229377 Daily Safe Outputs Git Simulator 05:47Z push_repo_memory agent job succeeded; post-agent job failed → run = failure (tracked by #39024)
§27458341448 Avenger 05:52Z agent transient network + test timeout; self-recovered next run

Probable root cause

The pre-fetch appears to key on the agent job conclusion rather than the run-level conclusion. That has two gaps:

  1. Runs where agent succeeds but a post-agent job (push_repo_memory, safe_outputs, conclusion) fails are dropped — exactly the Git Simulator case, which is also the highest-signal failure (a 5-day P1 streak).
  2. Runs that emit a valid noop/safe-output but still exit the agent job as failure (Avenger) may also be filtered.

Net effect: a run whose GitHub conclusion is failure is not guaranteed to appear in failed_run_ids. This silently narrows discovery and makes a noop result look authoritative when it is not.

Proposed remediation

  1. Base failed_run_ids on the workflow-run conclusion (failure | timed_out | startup_failure | cancelled) for agentic workflows in-window, independent of any single job's status.
  2. Annotate each entry with the failing job name(s) so clustering can distinguish agent-phase vs post-agent-phase (infra) failures.
  3. Keep the existing agent-job signal as an additional field, not the filter.

Success criteria / verification

  1. For a window containing a run like 27458229377 (agent OK, push_repo_memory failed), the pre-fetch's failed_run_ids includes that run.
  2. failed_run_ids count matches gh run list failure-conclusion agentic runs in the window.
  3. Each failure entry carries the failing job name(s).

References

Generated by 🔍 [aw] Failure Investigator (6h) · 233.2 AIC · ⌖ 14.2 AIC · ⊞ 5.1K ·

  • expires on Jun 20, 2026, 12:10 AM UTC-08:00

Resolution — verified fixed

Close this issue: the deterministic pre-fetch now keys on run-level conclusion and annotates failing job names, exactly as requested.

The 2026-06-13 13:17Z Failure Investigator run loaded /tmp/gh-aw/agent/failure-investigator/prefetch.json and it returned 20 failed_run_ids (not empty), each carrying the new failed_job_names and conclusion fields.

Success criteria — all met

  1. Run-level conclusion drives discovery — the payload now includes runs whose agent job did not itself fail. Example: Smoke CI runs §27466680204 and §27466673731 appear with failed_job_names = [activation, agent, pre_activation, push_repo_memory, safe_outputs] — i.e. post-agent/non-agent job failures are now captured (the Git-Simulator class from this report).
  2. Failure entries carry failing job name(s) — every entry has a populated failed_job_names array.
  3. Agent-job signal retained as an additional fieldagent_job_conclusion is present alongside the run-level conclusion, not used as the filter.

The original symptom (failed_run_ids: [] while in-window failures existed) does not reproduce. Closing as fixed; reopen if a future window shows a run with a failure/timed_out/startup_failure conclusion missing from the pre-fetch.

Verified by run §27467779429.

Generated by 🔍 [aw] Failure Investigator (6h) · 191.6 AIC · ⌖ 12.9 AIC · ⊞ 5.1K ·

Metadata

Metadata

Type

No type
No fields configured for issues without a type.

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions