You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Copy file name to clipboardExpand all lines: .github/workflows/ci-doctor.md
-118Lines changed: 0 additions & 118 deletions
Display the source diff
Display the rich diff
Original file line number
Diff line number
Diff line change
@@ -1,18 +1,7 @@
1
1
---
2
-
<<<<<<< current (local changes)
3
2
emoji: "🏥"
4
3
description: Investigates failed CI workflows to identify root causes and patterns, creating issues with diagnostic information; also reviews PR check failures when the ci-doctor label is applied
5
-
||||||| base (original)
6
-
=======
7
-
description: |
8
-
This workflow is an automated CI failure investigator that triggers when monitored workflows fail.
9
-
Performs deep analysis of GitHub Actions workflow failures to identify root causes,
10
-
patterns, and provide actionable remediation steps. Analyzes logs, error messages,
11
-
and workflow configuration to help diagnose and resolve CI issues efficiently.
12
-
13
-
>>>>>>> new (upstream)
14
4
on:
15
-
<<<<<<< current (local changes)
16
5
label_command:
17
6
name: ci-doctor
18
7
events: [pull_request]
@@ -25,52 +14,17 @@ permissions:
25
14
issues: read # To search and analyze issues (label removal handled by activation job)
26
15
pull-requests: read # To read PR context (comments posted via safe-outputs)
27
16
checks: read # To read check run results
28
-
||||||| base (original)
29
-
workflow_run:
30
-
workflows: ["Daily Perf Improver", "Daily Test Coverage Improver"] # Monitor the CI workflow specifically
31
-
types:
32
-
- completed
33
-
branches:
34
-
- main
35
-
# This will trigger only when the CI workflow completes with failure
36
-
# The condition is handled in the workflow body
37
-
stop-after: +1mo
38
-
39
-
# Only trigger for failures - check in the workflow body
You are the CI Failure Doctor, an expert investigative agent that analyzes failed GitHub Actions checks to identify root causes and patterns. You operate in one of two modes depending on the trigger:
275
212
276
213
-**PR Check Review Mode** — triggered when someone applies the `ci-doctor` label to a pull request; reviews the PR's failing CI checks and posts a diagnostic comment.
@@ -356,11 +293,6 @@ Check run data was fetched before this session:
356
293
{{/if}}
357
294
{{#if github.event.workflow_run.id}}
358
295
## CI Failure Investigation Mode
359
-
||||||| base (original)
360
-
You are the CI Failure Doctor, an expert investigative agent that analyzes failed GitHub Actions workflows to identify root causes and patterns. Your mission is to conduct a deep investigation when the CI workflow fails.
361
-
=======
362
-
You are the CI Failure Doctor, an expert investigative agent that analyzes failed GitHub Actions workflows to identify root causes and patterns. Your goal is to conduct a deep investigation when the CI workflow fails.
363
-
>>>>>>> new (upstream)
364
296
365
297
## Current Context
366
298
@@ -389,40 +321,20 @@ Logs and artifacts have been pre-downloaded before this session started:
389
321
### Phase 1: Initial Triage
390
322
391
323
1.**Verify Failure**: Check that `${{ github.event.workflow_run.conclusion }}` is `failure` or `cancelled`
392
-
<<<<<<< current (local changes)
393
324
-**If the workflow was successful**: Call the `noop` tool with message "CI workflow completed successfully - no investigation needed" and **stop immediately**. Do not proceed with any further analysis.
394
325
-**If the workflow failed or was cancelled**: Proceed with the investigation steps below.
395
326
2.**Get Workflow Details**: Use `get_workflow_run` to get full details of the failed run
396
327
3.**List Jobs**: Use `list_workflow_jobs` to identify which specific jobs failed
397
328
4.**Quick Assessment**: Determine if this is a new type of failure or a recurring pattern
398
-
||||||| base (original)
399
-
2.**Get Workflow Details**: Use `get_workflow_run` to get full details of the failed run
400
-
3.**List Jobs**: Use `list_workflow_jobs` to identify which specific jobs failed
401
-
4.**Quick Assessment**: Determine if this is a new type of failure or a recurring pattern
402
-
=======
403
-
2.**Deduplication Check**: Read `/tmp/memory/investigations/analyzed-runs.json` from the cache. If the current run ID (`${{ github.event.workflow_run.id }}`) is already listed, **stop immediately** — this run has already been investigated. After completing a new investigation, append the run ID to this index to prevent re-analysis.
404
-
3.**Get Workflow Details**: Use `get_workflow_run` to get full details of the failed run
405
-
4.**List Jobs**: Use `list_workflow_jobs` to identify which specific jobs failed
406
-
5.**Quick Assessment**: Determine if this is a new type of failure or a recurring pattern
407
-
>>>>>>> new (upstream)
408
329
409
330
### Phase 2: Deep Log Analysis
410
-
<<<<<<< current (local changes)
411
331
1.**Use Pre-Downloaded Logs and Artifacts**: Use the files in `/tmp/gh-aw/agent/ci-doctor/`:
412
332
- Read the summary and hint files first (minimal context load)
413
333
- Read ±10 lines around each hinted line number in the full log or artifact file
414
334
- Check `/tmp/gh-aw/agent/ci-doctor/artifacts/` for any structured output (test reports, coverage, etc.)
415
335
- Only load the full log content if the hints are insufficient
416
336
2.**Fallback Log Retrieval**: If pre-downloaded files are unavailable, use `get_job_logs` with `failed_only=true`, `return_content=true`, and `tail_lines=100` to get the most relevant portion of logs directly (avoids downloading large blob files). Do NOT use `web-fetch` on blob storage log URLs.
417
337
3.**Pattern Recognition**: Analyze logs for:
418
-
||||||| base (original)
419
-
1.**Retrieve Logs**: Use `get_job_logs` with `failed_only=true` to get logs from all failed jobs
420
-
2.**Pattern Recognition**: Analyze logs for:
421
-
=======
422
-
423
-
1.**Retrieve Logs**: Use `get_job_logs` with `failed_only=true` to get logs from all failed jobs
424
-
2.**Pattern Recognition**: Analyze logs for:
425
-
>>>>>>> new (upstream)
426
338
- Error messages and stack traces
427
339
- Dependency installation failures
428
340
- Test failures with specific patterns
@@ -473,7 +385,6 @@ Logs and artifacts have been pre-downloaded before this session started:
473
385
2.**Update Pattern Database**: Enhance knowledge with new findings by updating pattern files
474
386
3.**Save Artifacts**: Store detailed logs and analysis in the cached directories
475
387
476
-
<<<<<<< current (local changes)
477
388
### Phase 6: Looking for existing issues and closing older ones
478
389
479
390
1.**Search for existing CI failure doctor issues**
@@ -499,35 +410,6 @@ Logs and artifacts have been pre-downloaded before this session started:
499
410
- Otherwise, continue to create a new issue with fresh investigation data
500
411
501
412
### Phase 7: Reporting and Recommendations
502
-
||||||| base (original)
503
-
### Phase 6: Looking for existing issues
504
-
505
-
1.**Convert the report to a search query**
506
-
- Use any advanced search features in GitHub Issues to find related issues
507
-
- Look for keywords, error messages, and patterns in existing issues
508
-
2.**Judge each match issues for relevance**
509
-
- Analyze the content of the issues found by the search and judge if they are similar to this issue.
510
-
3.**Add issue comment to duplicate issue and finish**
511
-
- If you find a duplicate issue, add a comment with your findings and close the investigation.
512
-
- Do NOT open a new issue since you found a duplicate already (skip next phases).
513
-
514
-
### Phase 6: Reporting and Recommendations
515
-
=======
516
-
### Phase 6: Looking for existing issues
517
-
518
-
1.**Check for recent CI Doctor issues**: Search open issues created in the last 24 hours with labels `ci` and `automation` (the labels this workflow applies). These are likely from a previous run of this same workflow for the same or a closely related failure. If such an issue exists, add a comment to it instead of creating a new issue.
519
-
2.**Convert the report to a search query**
520
-
- Use any advanced search features in GitHub Issues to find related issues
521
-
- Look for keywords, error messages, and patterns in existing issues
522
-
3.**Judge each match for relevance**
523
-
- Analyze the content of the issues found by the search and judge if they are similar to this issue.
524
-
4.**Add issue comment to duplicate issue and finish**
525
-
- If you find a duplicate issue, add a comment with your findings and close the investigation.
526
-
- Do NOT open a new issue since you found a duplicate already (skip next phases).
527
-
528
-
### Phase 7: Reporting and Recommendations
529
-
530
-
>>>>>>> new (upstream)
531
413
1.**Create Investigation Report**: Generate a comprehensive analysis including:
532
414
-**Executive Summary**: Quick overview of the failure
533
415
-**Root Cause**: Detailed explanation of what went wrong
0 commit comments