Agent Performance Report — Week of 2026-05-29 #35716

2026-05-29T13:52:09Z

github-actions[bot]
Bot May 29, 2026

Executive Summary

Agents analyzed: 14 active workflow groups (~236 total workflows)
Total outputs reviewed: 81+ issues created by agents this week
Quality score: 72/100 (↓2 from 74 last week)
Effectiveness score: 68/100 (↓4 from 72 last week)
Ecosystem health: 76/100 (↓6 from 82 — two sharp new regressions detected)
Top performers: copilot-swe-agent, spec-enforcer, Copilot cloud agent
Critical regressions this week: Agentic Commands (80%→25%), Content Moderation (75%→22%)
Dominant failure mode: under-creation (6 of 14 agents producing zero or near-zero output)

Performance Rankings

Top Performing Agents 🏆

spec-enforcer (Quality: 85/100, Effectiveness: 88/100)
- 90% success rate, 100% PR merge rate
- Clean behavioral profile — no anomalies detected
- Consistent throughput across all recent runs
copilot-swe-agent (Quality: 82/100, Effectiveness: 80/100)
- 67% merge rate across 15 runs, 8 PRs created this period
- Active on remove-yield-feature — building on previous campaign work
- Healthy throughput, no under/over-creation patterns
Copilot cloud agent (Quality: 80/100, Effectiveness: 80/100)
- 100% success rate (2/2 runs) — too early to score definitively
- Clean profile; strong early signal

Agents With Sharp Regressions 📉

Agentic Commands (was: 80% success → now: 25%, 3/12 runs)
- ⚠️ New regression this week — was flagged as "most stable PR-triggered workflow" last run
- Behavioral pattern: inconsistency
- Likely correlated with CJS shard 4 CI blocker (Step Name Alignment P1, filed 2026-05-29)
- Recommended action: Audit trigger conditions and check if new CI failures are blocking runs
Content Moderation (was: 75% success → now: 22%, 2/9 runs)
- ⚠️ New regression this week — was in Top Performers last run
- Behavioral pattern: inconsistency
- No clear single root cause identified; possible safe_outputs or AI engine dependency
- Recommended action: Review last 3 failure logs for common error signature

Persistently Failing Agents 🚫

Copilot CLI workflows (0% success, 10 runs, 5+ consecutive days)
- Tracked under [aw] Copilot CLI Deep Research Agent failed #35388 — platform-level; infra team required
- Pattern: under-creation (externally blocked)
- Do not assign campaigns to this cluster until [aw] Copilot CLI Deep Research Agent failed #35388 is resolved
Q (0% success, 11 runs)
- Pattern: under-creation
- Silent-skip behavior — no failure logs, no outputs
- Previously flagged in silent-skip cluster audit
AI Moderator (0% success, 9 runs)
- Pattern: under-creation
- Same silent-skip signature as Q cluster
PR Sous Chef (0% success, 5 runs)
- Pattern: under-creation — hard blocked by P0 [aw-failures] Failure Investigator (6h) — 4 failures, safe_outputs add_comment validation breaks 3 workflows (2026-05-27 19:24 → 2026-05-28 01 [Content truncated due to length] #35351
- Will remain at 0% until safe_outputs add_comment validation is resolved

Agents With Behavioral Problems ⚠️

failure-reporters (~70% success, 10 runs)
- Patterns: over-creation + repetition
- Creating ~20 issues/day with 60% duplicate rate — active noise in issue tracker
- Dedup gate has been recommended for multiple consecutive runs with no action yet
Daily Safe Output Tool Optimizer (~50% success, 4 runs)
- Patterns: scope-creep + inconsistency
- 115 turns / 14.9M tokens consumed — runaway reasoning loop ([aw] Daily Safe Output Tool Optimizer failed #35316)
- Token usage is wildly disproportionate to task scope
- Tracked — circuit breaker / hard turn cap needed
LintMonster (~30% success, 5 runs)
- Patterns: under-creation + inconsistency
- 2218+ finding backlog; resource/timeout failures ([lint-monster] [Lint] Break up long functions in pkg/workflow/ (2417 issues) #35368 epic, [aw] LintMonster failed #35370)
- Only ~5 issues surfaced out of 2218+ queued — severe throughput gap
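
The LintMonster throughput gap described above is the kind of problem bounded batching is meant to address. The following is a minimal sketch, not LintMonster's actual implementation: the function name `bounded_batches`, the batch size of 50, and the placeholder finding IDs are all assumptions made for illustration.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def bounded_batches(findings: Iterable[T], batch_size: int = 50) -> Iterator[List[T]]:
    """Yield consecutive batches holding at most `batch_size` findings each."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    it = iter(findings)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


# Hypothetical stand-in for the 2218+ queued findings; real IDs are not in the report.
backlog = [f"finding-{i}" for i in range(2218)]
batches = list(bounded_batches(backlog, batch_size=50))
print(len(batches), len(batches[-1]))  # 45 batches, the last holding 18 findings
```

Processing one batch per run would keep each run's work bounded, so a timeout costs at most one batch of progress rather than the whole backlog.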

Behavioral Pattern Analysis

Pattern Summary (from pattern-detector)

| Pattern | Count | Agents |
| --- | --- | --- |
| `under-creation` | 6 | Q, AI Moderator, Smoke CI, LintMonster, Copilot CLI workflows, PR Sous Chef |
| `inconsistency` | 5 | Content Moderation, Agentic Commands, CGO, LintMonster, Daily Safe Output Tool Optimizer |
| `over-creation` | 1 | failure-reporters |
| `repetition` | 1 | failure-reporters |
| `scope-creep` | 1 | Daily Safe Output Tool Optimizer |

Key Observations

Dominant failure mode is under-creation — 6 of 14 analyzed agents produce zero or near-zero output. Root causes vary: platform blocker (Copilot CLI), safe_outputs P0 (PR Sous Chef), silent-skip (Q, AI Moderator), infrastructure timeout (LintMonster).
Two healthy top performers show no patterns — copilot-swe-agent and spec-enforcer remain clean across all dimensions.
Two sharp regressions detected: Agentic Commands (80%→25%) and Content Moderation (75%→22%) both dropped more than 50 percentage points week-over-week; their inconsistency classification flags them for immediate investigation.
failure-reporters duplication is a chronic unresolved issue — this is the 3rd+ consecutive weekly report flagging the 60% dupe rate; escalation may be needed.
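
The week-over-week comparison behind the two regression findings can be checked with a few lines of arithmetic. This is a sketch under stated assumptions, not the analyzer's real logic: the `WeeklyStats` type, the `sharp_regressions` helper, and the 50-point threshold are hypothetical, and the third entry is an invented stable agent included only for contrast. The run counts for Agentic Commands (3/12) and Content Moderation (2/9) come from the report.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class WeeklyStats:
    name: str
    previous_rate: float  # last week's success rate, in percent
    successes: int        # successful runs this week
    runs: int             # total runs this week

    @property
    def current_rate(self) -> float:
        """This week's success rate in percent (0.0 when there were no runs)."""
        return 100.0 * self.successes / self.runs if self.runs else 0.0

    @property
    def drop_points(self) -> float:
        """Week-over-week drop in percentage points (positive means worse)."""
        return self.previous_rate - self.current_rate


def sharp_regressions(stats: List[WeeklyStats], threshold_points: float = 50.0) -> List[str]:
    """Return names of agents whose success rate fell by more than the threshold."""
    return [s.name for s in stats if s.drop_points > threshold_points]


agents = [
    WeeklyStats("Agentic Commands", 80.0, 3, 12),
    WeeklyStats("Content Moderation", 75.0, 2, 9),
    WeeklyStats("hypothetical-stable-agent", 90.0, 9, 10),  # illustrative only
]
print(sharp_regressions(agents))
```

Measuring the drop in percentage points rather than as a relative change keeps the threshold easy to read against the rates quoted in the report.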

Collaboration Analysis

Productive: copilot-swe-agent ↔ spec-enforcer — complementary outputs on code quality campaigns
Gap: Campaign Manager ↔ worker workflows — campaign-manager-latest.md is absent; coordination is one-directional
Conflict risk: failure-reporters flooding the issue tracker may mask real P0/P1 signals from other agents

Ecosystem Coverage & Health

Coverage Status

Compilation: 236/236 workflows have lock files (100% ✅)
Active (>0% success this week): ~8 workflow groups
Silent/blocked (0% success this week): ~6 workflow groups

Engine Distribution

Copilot: Majority of workflows — platform-level failure affecting multiple agents ([aw] Copilot CLI Deep Research Agent failed #35388)
Claude: spec-enforcer, Daily Safe Output Tool Optimizer — mixed performance
Codex: Included in Smoke CI cluster — 0% this week

Coverage Gaps

Security vulnerability tracking — no dedicated agent observed
Campaign orchestration — Campaign Manager absent from memory; coordination gap
CJS test health — no dedicated monitor catching the shard 4 failure before CI broke

Redundancy

Multiple failure-reporting workflows have overlapping scope; the duplication risk has already materialized (60% dupe rate)

Recommendations

High Priority 🔴

Investigate Agentic Commands + Content Moderation regressions (new this week)
- Both dropped more than 50 percentage points week-over-week, with an inconsistency pattern
- Likely correlated with CJS shard 4 CI blocker or Step Name Alignment failure (both P1, filed 2026-05-29)
- Action: Review last 3 failure logs for both; determine if Step Name Alignment fix resolves both regressions
Resolve safe_outputs P0 [aw-failures] Failure Investigator (6h) — 4 failures, safe_outputs add_comment validation breaks 3 workflows (2026-05-27 19:24 → 2026-05-28 01 [Content truncated due to length] #35351 (unblocks PR Sous Chef, Contribution Check, Sub-Issue Closer)
- Root cause: agent omits item_number when target: "*" is configured
- Fix: Add target resolution logic in affected workflow prompts
Add dedup gate to failure-reporters (3rd+ consecutive recommendation)
- 60% duplicate rate = ~12 duplicate issues/day polluting the tracker
- Add a check for open issues with a similar title before filing a new one
- Estimated effort: 1-2 hours; expected improvement: ~80% dupe reduction
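
The dedup gate recommended above for failure-reporters could look roughly like the sketch below. It is an illustration only: the `normalize` and `is_duplicate` helpers and the 0.9 similarity threshold are assumptions, and the list of open titles would in practice come from whatever issue query the workflow already has access to. Title similarity here uses `difflib.SequenceMatcher` from the Python standard library.

```python
import re
from difflib import SequenceMatcher
from typing import Iterable


def normalize(title: str) -> str:
    """Lowercase a title, drop issue references like '#123', and collapse whitespace."""
    title = title.lower()
    title = re.sub(r"#\d+", "", title)
    title = re.sub(r"\s+", " ", title)
    return title.strip()


def is_duplicate(new_title: str, open_titles: Iterable[str], threshold: float = 0.9) -> bool:
    """Return True if any open issue title is at least `threshold` similar to the new one."""
    candidate = normalize(new_title)
    return any(
        SequenceMatcher(None, candidate, normalize(existing)).ratio() >= threshold
        for existing in open_titles
    )


# Titles taken from issues referenced in this report.
open_titles = [
    "[aw] LintMonster failed #35370",
    "[aw] Daily Safe Output Tool Optimizer failed #35316",
]

print(is_duplicate("[aw] LintMonster failed", open_titles))  # True: skip filing
print(is_duplicate("[aw] Smoke CI failed", open_titles))     # False: file the issue
```

A threshold near 0.9 is deliberately conservative so that distinct workflows with similar title templates still get their own issue; the right value would need tuning against the real tracker.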

Medium Priority 🟡

Add hard turn/token cap to Daily Safe Output Tool Optimizer ([aw] Daily Safe Output Tool Optimizer failed #35316)
- 14.9M tokens / 115 turns for a routine task is unsustainable
- Add max_turns: 20 or equivalent circuit-breaker before next run
Shard LintMonster into bounded batches ([lint-monster] [Lint] Break up long functions in pkg/workflow/ (2417 issues) #35368 epic)
- 2218+ backlog with resource timeouts — process 50-100 findings per run max
- Prevents cascading timeouts and enables incremental progress
Audit Campaign Manager absence
- campaign-manager-latest.md not found — Campaign Manager may have stopped running
- Check last successful run; restore if needed for cross-orchestrator coordination
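
The hard turn/token cap recommended above for Daily Safe Output Tool Optimizer amounts to a small circuit breaker around the agent loop. The sketch below is a hypothetical illustration, not the workflow engine's actual mechanism: `RunBudget`, `BudgetExceeded`, and `simulate` are invented names, and the 500,000-token cap is an assumed value. Only `max_turns = 20` and the 115 turns / 14.9M tokens figure come from the report.

```python
from dataclasses import dataclass
from typing import Iterable


class BudgetExceeded(RuntimeError):
    """Raised when a run goes past its turn or token budget."""


@dataclass
class RunBudget:
    max_turns: int = 20          # value suggested in the report
    max_tokens: int = 500_000    # hypothetical cap, not from the report
    turns: int = 0
    tokens: int = 0

    def record_turn(self, tokens_used: int) -> None:
        """Account for one completed turn and trip the breaker if over budget."""
        self.turns += 1
        self.tokens += tokens_used
        if self.turns > self.max_turns:
            raise BudgetExceeded(f"turn cap exceeded after {self.turns} turns")
        if self.tokens > self.max_tokens:
            raise BudgetExceeded(f"token cap exceeded after {self.tokens} tokens")


def simulate(turn_costs: Iterable[int], budget: RunBudget) -> str:
    """Feed per-turn token costs into the budget and report how the run ended."""
    try:
        for cost in turn_costs:
            budget.record_turn(cost)
    except BudgetExceeded as exc:
        return str(exc)
    return "completed"


# Replay the reported runaway run: 115 turns averaging about 14.9M / 115 tokens each.
per_turn = 14_900_000 // 115
budget = RunBudget()
result = simulate([per_turn] * 115, budget)
print(budget.turns, result)
```

With these assumed limits the token cap trips on the fourth turn, long before the turn cap, which suggests that both limits are worth setting rather than relying on either one alone.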

Low Priority 🟢

Review Q and AI Moderator silent-skip behavior
- Zero outputs, no failure logs — trigger conditions or permissions may have changed
- Silent-skip cluster audit was filed last run; verify follow-through

Trends vs. Last Week

| Metric | Last Week | This Week | Change |
| --- | --- | --- | --- |
| Quality score | 74/100 | 72/100 | ↓2 |
| Effectiveness | 72/100 | 68/100 | ↓4 |
| Ecosystem health | 82/100 | 76/100 | ↓6 |
| Agents at 0% success | 4 | 6 | ↑2 |
| Active P0 issues | 1 | 1 | → |
| Active P1 issues | 3 | 5 | ↑2 |
| Compilation success | 100% | 100% | → |

Trend direction: Declining — two new regressions (Agentic Commands, Content Moderation) and two new P1 issues (CJS CI, Step Name Alignment) push overall health down. Core infrastructure (compilation, copilot-swe-agent, spec-enforcer) remains stable.

Actions Taken This Run

Generated this performance report discussion
Updated agent-performance-latest.md shared memory
Updated shared-alerts.md with new regression findings
No new improvement issues filed (regressions likely root-caused by existing P1 issues already tracked by Workflow Health Manager)

Next Steps

Resolve Step Name Alignment P1 — likely fixes Agentic Commands + Content Moderation regressions
Fix safe_outputs P0 [aw-failures] Failure Investigator (6h) — 4 failures, safe_outputs add_comment validation breaks 3 workflows (2026-05-27 19:24 → 2026-05-28 01 [Content truncated due to length] #35351 — unblocks 3+ workflows
Implement failure-reporters dedup gate (chronic, 3+ weeks unresolved)
Restore Campaign Manager if absent
Add token cap to Daily Safe Output Tool Optimizer ([aw] Daily Safe Output Tool Optimizer failed #35316)

Analysis period: 2026-05-22 to 2026-05-29
Next report: 2026-06-05
Run: §26640736780
Previous run: §26579184217

Generated by ⚡ Agent Performance Analyzer - Meta-Orchestrator · sonnet46 2M

expires on May 30, 2026, 1:52 PM UTC

2026-05-30T14:59:46Z

github-actions[bot]
Bot May 30, 2026
Author

This discussion was automatically closed because it expired on 2026-05-30T13:52:08.438Z.

Closed by Workflow

0 replies
