Agent Performance Report — Week of 2026-03-23 #22482

2026-03-23T17:48:47Z

github-actions[bot] (bot) · Mar 23, 2026

Analysis period: 2026-03-17 → 2026-03-23 | Workflow run: §23451246701

Executive Summary

Workflows monitored: 177 (23 distinct active in 7-day window)
Total runs analyzed: 85 (7-day), 30 (today)
Safe outputs this week: 5 (discussion × 1, comments × 2, issues/comments × 2)
Quality score (Q): 82/100 → 82/100 (stable)
Effectiveness score (E): 74/100 → 75/100 (↑1, marginal improvement)
Health score (H): 69 → 71 (↑2, from Workflow Health Manager)
Top performers: Issue Monster, Contribution Check, PR Triage Agent, Smoke Gemini
Ongoing P1: Smoke Update Cross-Repo PR (0% success, 8+ days, issue #22241)
In recovery (monitoring): Issue Triage Agent (partial), Daily Rendering Scripts Verifier (holding)

Performance Rankings

Top Performing Agents 🏆

Issue Monster — Quality: 92/100 | Effectiveness: 95/100
- 3 schedule runs this week, 100% success, 94 total turns
- High-volume agentic task execution (up to 54 turns in a single run)
- Consistent output cadence with no P1 issues
Contribution Check — Quality: 88/100 | Effectiveness: 88/100
- 89% success rate, improved from 67% (month prior)
- 14 turns per run, suggesting thorough analysis
- Trend: solidly improving and holding gains
PR Triage Agent — Quality: 85/100 | Effectiveness: 90/100
- 10/10 consecutive schedule successes — fully recovered ✅
- No safe output waste; clean signal-to-noise
Semantic Function Refactoring — Quality: 80/100 | Effectiveness: 80/100
- 44-turn deep run costing $1.09 (Claude engine), consistent with large-scope refactoring tasks
- 1,680,874 tokens consumed: highest token usage this week
- Successfully produced 2 safe outputs from a single run
Smoke Gemini — Quality: 82/100 | Effectiveness: 90/100
- 5/5 consecutive schedule successes — recovery solidified 🎉
- After an extended outage, now stable and promoted from "recovering" to "healthy"
Grumpy Code Reviewer 🔥 — Quality: 78/100 | Effectiveness: 95/100
- 7 PR-triggered runs, 100% success
- No turn data was recorded for these runs; this may reflect fast, lightweight executions on PR events, but turn counts alone cannot confirm it
Agent Performance Analyzer (self) — Quality: 84/100 | Effectiveness: 84/100
- 100% schedule success, 1 safe output (discussion) per run
- Consistent structured reporting, memory updates maintained

Agents Needing Attention 📉

Smoke Update Cross-Repo PR — Quality: N/A | Effectiveness: 10/100
- P1 ONGOING: 0% success rate, 10+ consecutive failures over 8+ days
- Issue #22241 is filed and updated
- Root cause under investigation: label resolution or PR state mismatch suspected
- No improvement from last week — escalate if no movement by 2026-03-30
Issue Triage Agent — Quality: 60/100 | Effectiveness: 50/100
- PARTIAL RECOVERY: 1 success on 2026-03-20 after 5+ consecutive failures
- No runs observed 2026-03-21 or 2026-03-22 (may be trigger-dependent)
- Status: monitoring; needs 3+ consecutive successes to confirm recovery
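The recovery rule used above (3+ consecutive successes before an agent is marked healthy) can be made mechanical. The sketch below is hypothetical: the function names, the status labels, and the choice to let skipped runs neither extend nor break a streak are assumptions, not the analyzer's actual logic. The required parameter also covers the stricter 5-success rule applied to Daily Rendering Scripts Verifier.

```python
from typing import Sequence

# Hypothetical threshold mirroring this report: 3 consecutive successes
# confirm recovery (pass required=5 for the "graduate" rule).
CONFIRM_RECOVERY = 3


def trailing_successes(conclusions: Sequence[str]) -> int:
    """Count consecutive 'success' conclusions at the end of a run history
    ordered oldest to newest. 'skipped' runs neither extend nor break the
    streak, since event-driven skips are not failures (assumption)."""
    streak = 0
    for conclusion in reversed(conclusions):
        if conclusion == "skipped":
            continue
        if conclusion != "success":
            break
        streak += 1
    return streak


def recovery_status(conclusions: Sequence[str], required: int = CONFIRM_RECOVERY) -> str:
    """Classify an agent as 'healthy', 'recovering', 'failing' or 'no data'."""
    if all(c == "skipped" for c in conclusions):
        return "no data"  # also covers an empty history
    streak = trailing_successes(conclusions)
    if streak >= required:
        return "healthy"
    if streak > 0:
        return "recovering"
    return "failing"


# Issue Triage Agent this week: one success after 5+ consecutive failures.
print(recovery_status(["failure"] * 5 + ["success"]))  # recovering
```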

Inactive / Skipped Agents

Many agents triggered by PR/issue events show "skipped" conclusions this week — this is expected behavior (conditions not met). Agents with all-skipped patterns include: PR Nitpick Reviewer, Scout, Q, /cloclo, Archie, Security Review Agent, Documentation Unbloat, ACE Editor Session, Resource Summarizer Agent, Mergefest, Plan Command, CI Failure Doctor. These are event-driven and skipping is not a failure.

Quality Analysis

Output Quality Distribution

Score Range	Count	Agents
Excellent (80–100)	6	Issue Monster, Contribution Check, PR Triage, Smoke Gemini, Agent Performance Analyzer, Semantic Function Refactoring
Good (60–79)	3	Grumpy Code Reviewer, Issue Triage Agent, Daily Rendering Scripts Verifier
Fair (40–59)	0	—
Poor (<40)	1	Smoke Update Cross-Repo PR (P1)
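The bands in this table reduce to a simple threshold mapping. A minimal sketch, assuming integer scores on a 0–100 scale; quality_band is a hypothetical name. It reports a missing score as "Unscored" rather than "Poor", whereas the table above files the one unscored agent (Smoke Update Cross-Repo PR, quality N/A) under Poor.

```python
from typing import Optional


def quality_band(score: Optional[int]) -> str:
    """Map a 0-100 quality score to the bands used in the distribution table."""
    if score is None:
        return "Unscored"  # e.g. an agent reported with Quality: N/A
    if not 0 <= score <= 100:
        raise ValueError(f"score out of range: {score}")
    if score >= 80:
        return "Excellent"
    if score >= 60:
        return "Good"
    if score >= 40:
        return "Fair"
    return "Poor"


print(quality_band(78))  # Good (Grumpy Code Reviewer)
```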

Common Quality Patterns

Positive patterns observed:

Structured markdown outputs with clear sections (headers, bullet lists, code blocks)
Safe output count per run is appropriate (1–2 items per workflow run)
Memory file updates are consistent across meta-orchestrators
Lockfile Statistics Analysis Agent producing concise, actionable safe outputs

Areas for improvement:

19 stale lock files (P2): claude-code-user-docs-review, code-scanning-fixer, constraint-solving-potd, daily-copilot-token-report, daily-malicious-code-scan, daily-repo-chronicle, delight, developer-docs-consolidator, github-mcp-structural-analysis, glossary-maintainer, go-pattern-detector, pdf-summary, portfolio-analyst, security-review, smoke-agent-public-approved, smoke-codex, smoke-workflow-call, super-linter, workflow-normalizer
- This is a different set from the previous run (20 stale last week → 19 this week)
- Suggests ongoing churn: agents edit workflow .md files without running make recompile afterward
- The rotating nature suggests this is systemic, not one-off
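Staleness can be approximated locally by comparing each workflow source with its compiled lock file. This sketch assumes the <name>.md → <name>.lock.yml layout and uses file modification times as a heuristic; find_stale_lock_files is a hypothetical helper. Git does not preserve modification times on checkout, so this suits a developer's working tree rather than CI, where recompiling and diffing is the reliable check.

```python
from pathlib import Path


def find_stale_lock_files(workflow_dir: Path) -> list[str]:
    """Return workflow names whose compiled lock file is missing or older
    than its .md source (mtime heuristic; assumed <name>.lock.yml naming)."""
    stale = []
    for source in sorted(workflow_dir.glob("*.md")):
        lock = source.with_name(source.stem + ".lock.yml")
        if not lock.exists() or lock.stat().st_mtime < source.stat().st_mtime:
            stale.append(source.stem)
    return stale


if __name__ == "__main__":
    # Run from the repository root of a local checkout.
    print(find_stale_lock_files(Path(".github/workflows")))
```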

Effectiveness Analysis

Task Completion Rates (7-day window)

Agent	Runs (7d)	Success Rate	Status
Issue Monster	3	100%	✅ Healthy
Grumpy Code Reviewer	7	100%	✅ Healthy
AI Moderator	4	100%	✅ Healthy
Contribution Check	1	100%	✅ Healthy
Smoke Codex	1	100%	✅ Healthy
Smoke Gemini	~5	100%	✅ Solidified recovery
PR Triage Agent	~10	100%	✅ Fully recovered
Smoke Update Cross-Repo PR	10+	0%	❌ P1
Issue Triage Agent	~1	~50%	⚠️ Partial recovery
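Because event-driven skips are not failures (see Inactive / Skipped Agents), a success rate for this table should exclude skipped runs from the denominator. A minimal sketch under that assumption; success_rate is a hypothetical helper, and run conclusions are assumed to be plain strings such as "success", "failure" and "skipped".

```python
from collections import Counter
from typing import Iterable, Optional


def success_rate(conclusions: Iterable[str]) -> Optional[float]:
    """Fraction of decided runs that succeeded. 'skipped' runs are left out
    of the denominator; returns None when no run was decided."""
    counts = Counter(conclusions)
    decided = sum(n for conclusion, n in counts.items() if conclusion != "skipped")
    if decided == 0:
        return None
    return counts["success"] / decided


print(success_rate(["success"] * 7))  # 1.0
print(success_rate(["skipped"] * 4))  # None
```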

Resource Efficiency (today's runs)

Agent	Tokens	Cost	Turns	Efficiency
Semantic Function Refactoring	1,680,874	$1.09	44	Claude deep work — expected
Issue Monster	846,286	—	20	High volume, appropriate
Agent Container Smoke Test	183,051	—	5	Efficient
The Great Escapi	77,607	—	2	Very efficient
Smoke Claude/Copilot	~0	—	5–7	Efficient smoke tests

Observation: Semantic Function Refactoring, at $1.09/run (Claude engine), is the highest-cost agent and the only one in the table with a reported cost. It performs substantive code refactoring, so the cost appears proportionate to the work.
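The per-run figures above can be normalized into per-turn and per-token costs, which makes week-over-week comparison easier once more agents report cost. A sketch using only the Semantic Function Refactoring numbers from this table; RunUsage and over_budget are hypothetical names, and the $2.00 limit is the per-run alert threshold proposed under Recommendations.

```python
from dataclasses import dataclass

COST_ALERT_USD = 2.00  # per-run alert threshold proposed in this report


@dataclass
class RunUsage:
    agent: str
    tokens: int
    cost_usd: float
    turns: int

    def cost_per_turn(self) -> float:
        return self.cost_usd / self.turns

    def cost_per_million_tokens(self) -> float:
        return self.cost_usd / (self.tokens / 1_000_000)


def over_budget(run: RunUsage, limit: float = COST_ALERT_USD) -> bool:
    """True when a single run's cost exceeds the alert threshold."""
    return run.cost_usd > limit


sfr = RunUsage("Semantic Function Refactoring", 1_680_874, 1.09, 44)
print(f"${sfr.cost_per_turn():.4f} per turn")  # $0.0248 per turn
print(f"${sfr.cost_per_million_tokens():.3f} per 1M tokens")  # $0.648 per 1M tokens
print(over_budget(sfr))  # False
```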

Behavioral Patterns

Productive Patterns ✅

Recovery consolidation: PR Triage (10/10 ✅), Smoke Gemini (5/5 solidified ✅), Daily Rendering Scripts Verifier (2/2 holding 🎉) — the recovery pipeline is working
Meta-orchestrator coordination: Workflow Health → Shared Alerts → Agent Performance → Campaign Manager chain is functioning, with shared memory being updated reliably each cycle
Issue Monster high turns: The 54-turn run on 2026-03-22 suggests complex multi-step orchestration; all runs succeeded, so this reads as productive behavior rather than waste
Stale lock file rotation: The rotating set of 19–20 stale lock files shows that agents are actively making code changes, a good signal for development velocity, even though the skipped recompiles are themselves a problem (see Problematic Patterns)

Problematic Patterns ⚠️

Smoke Update Cross-Repo PR silent failure (P1): The root cause is still unresolved after 8+ days. The workflow completes with a skipped conclusion rather than an explicit failure, which may point to a logic/condition issue rather than an infrastructure failure and is harder to debug from run logs alone.
Issue Triage Agent irregular runs: Inconsistent scheduling or trigger conditions prevent reliable recovery confirmation. A baseline run cadence needs to be established.
Stale lock file churn: Different workflows appear in the stale list each cycle, meaning agents are modifying .md workflow files without recompiling. Enforcing make recompile through a pre-commit hook or CI check would help.

Coverage Analysis

Coverage Map

Well-covered areas:

Code review: Grumpy Code Reviewer, PR Nitpick Reviewer, Security Review Agent
Issue management: Issue Monster, Issue Triage Agent, Auto-Triage Issues
PR management: PR Triage Agent, Smoke suite (Copilot/Claude/Codex/Gemini)
Meta-orchestration: Workflow Health Manager, Campaign Manager, Agent Performance Analyzer
Refactoring: Semantic Function Refactoring, Code Refiner, The Great Escapi

Coverage gaps:

No dedicated dependency/vulnerability scanning agent
Performance regression monitoring (beyond smoke tests)
Documentation freshness (Documentation Unbloat is event-triggered, not proactive)

Potential redundancy:

Multiple smoke test variants (Copilot, Claude, Codex, Gemini, ARM64, Agent, Multi PR, Create/Update Cross-Repo PR) — appropriate for matrix testing, not redundant
Issue Monster + Issue Triage Agent cover overlapping territory — but serve different functions (creation vs. triage)

Recommendations

High Priority

Resolve Smoke Update Cross-Repo PR P1 (issue #22241)
- Current status: 0% success, 10+ consecutive failures, 8+ days
- Next action: Inspect label state and PR open/merge conditions manually
- Escalation trigger: No fix by 2026-03-30
Confirm Issue Triage Agent recovery
- Monitor for 3+ consecutive schedule successes before marking healthy
- If no new runs by 2026-03-25, investigate trigger conditions
- Estimated effort: 1 hour investigation

Medium Priority

Address stale lock file churn (P2)
- 19 stale lock files this run — rotating set across runs
- Recommendation: Add a CI step or workflow that runs make recompile on .md changes
- This would eliminate the recurring P2 signal and reduce agent confusion
- Estimated effort: 2–3 hours
Optimize Semantic Function Refactoring cost tracking
- $1.09/run at 44 turns is appropriate for deep refactoring, but should be tracked weekly
- Recommend adding to cost dashboard once metrics collection is operational
- Set alert if per-run cost exceeds $2.00
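The CI enforcement recommended above for stale lock files could begin as a pre-merge check over a PR's changed files: flag any workflow .md source whose lock file was not updated in the same change. This is a hypothetical sketch that assumes workflows live directly in .github/workflows/ and compile to <name>.lock.yml; files in subdirectories are ignored. It can flag edits that leave the compiled output unchanged, so running make recompile in CI and failing on git diff --exit-code remains the more robust check.

```python
from pathlib import PurePosixPath
from typing import Iterable

WORKFLOW_DIR = PurePosixPath(".github/workflows")  # assumed repository layout


def missing_recompiles(changed_files: Iterable[str]) -> list[str]:
    """Return workflow .md sources changed without their <name>.lock.yml."""
    changed = {PurePosixPath(p) for p in changed_files}
    missing = []
    for path in sorted(changed):
        # Only top-level workflow sources are expected to have a lock file.
        if path.parent == WORKFLOW_DIR and path.suffix == ".md":
            lock = path.with_name(path.stem + ".lock.yml")
            if lock not in changed:
                missing.append(str(path))
    return missing


print(missing_recompiles([
    ".github/workflows/delight.md",
    ".github/workflows/smoke-codex.md",
    ".github/workflows/smoke-codex.lock.yml",
]))  # ['.github/workflows/delight.md']
```

In CI, the changed-file list could come from git diff --name-only against the PR's base branch, with the job failing whenever the returned list is non-empty.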

Low Priority

Add continuous metrics collection
- Current metrics/latest.json only has filesystem data (GitHub token unavailable during collection)
- Metrics Collector is running (8/8 ✅) but producing partial data
- Full metrics would enable better quantitative scoring

Trends

Metric	Previous	Current	Trend
Quality (Q)	82	82	→ Stable
Effectiveness (E)	74	75	↑ +1
Health (H)	69	71	↑ +2
P1 issues	1	1	→ Same (Smoke Update Cross-Repo PR)
Stale lock files	20	19	↓ -1
Recovering agents	4	2	↓ Improved (2 promoted to healthy)
Schedule success rate	~93%	~93%	→ Stable

Actions Taken This Run

Created this performance report discussion
Updated /tmp/gh-aw/repo-memory/default/agent-performance-latest.md
Updated /tmp/gh-aw/repo-memory/default/shared-alerts.md
No new improvement issues created (all P1 items already tracked)

Next Steps

Monitor Smoke Update Cross-Repo PR: escalate issue #22241 ("[health] Smoke Update Cross-Repo PR: 100% failure rate on schedule (6/6 consecutive failures)") if there is no resolution by 2026-03-30
Confirm Issue Triage Agent recovery with 3 consecutive successes
Continue monitoring Daily Rendering Scripts Verifier (holding at 2/2; needs 5+ consecutive successes to graduate)
Investigate stale lock file churn → propose CI enforcement of recompile

References: §23451246701 | §23408443798 | §23426422007

AI generated by Agent Performance Analyzer - Meta-Orchestrator

expires on Mar 24, 2026, 5:48 PM UTC

2026-03-24T18:58:54Z

github-actions[bot] (bot, author) · Mar 24, 2026

This discussion was automatically closed because it expired on 2026-03-24T17:48:47.012Z.

Closed by Workflow
