Persona Overview
developer.instructions(agentic-workflows)
Key Findings
Consistently high quality: All 6 scenarios scored ≥ 4.6/5.0 — the agent produces production-ready workflow configurations across diverse automation types
Trigger selection is a strength: Every scenario received 5/5 for trigger appropriateness, including the nuanced choice of workflow_run over pull_request for post-CI artifact analysis
Security practices are deeply embedded: strict mode, network firewall (AWF), minimal permissions, scoped bash tool lists, and the noop guard appear in every response
Prompt structure is consistent: All responses follow a Context → Numbered Steps → Output template → Error handling → Noop guard pattern
Minor gaps in tool scoping: Two scenarios had slightly broad bash glob patterns or top-level permission grants that could be tightened
Top Patterns
Trigger by workflow type: PR automation → pull_request + paths: + concurrency cancel-in-progress; Post-CI artifact work → workflow_run; Scheduled reports → schedule cron + workflow_dispatch for testing
Tools consistently scoped: GitHub toolsets named explicitly ([pull_requests, repos], [actions, issues], [default, discussions]); bash limited to named command families (go:*, git:*, curl, jq) never "*"
Safe-output hygiene: max: 1 + hide-older-comments: true / close-older-discussions: true applied universally to prevent accumulation; noop guard called out as required in every response
View High-Quality Responses (Top 3)
be-schema — DB Migration Reviewer (5.0/5.0)
Perfect across all dimensions. Standouts: (1) concurrency group keyed to PR number cancels stale runs on rapid pushes; (2) 7-category safety check taxonomy (transactions, destructive ops, rollback, FK indexes, NOT NULL without default, lock-hazardous ops, irreversible transforms); (3) add-labels: allowed: [needs-migration-review] as a strict guardrail — agent cannot apply arbitrary labels.
do-incident — Deployment Incident Reporter (5.0/5.0)
Correctly handles the hard cross-repo write case: scoped PAT (OPS_ISSUES_WRITE_TOKEN) with fine-grained issues:write only on the ops repo; close-older-key tied to run ID prevents duplicate incidents on retries; flake vs regression detection via querying recent run history; label creation shell scripts provided in setup checklist.
pm-digest — Weekly Product Digest (5.0/5.0)
Clean scheduled workflow design: close-older-discussions: true keeps the "Product Updates" category tidy automatically; three-phase gather/classify/draft structure prevents the common partial-data drafting failure; classification priority ordering (breaking → high → medium → low → internal) is explicit and deterministic.
View Areas for Improvement
fe-bundle — Bundle Size Analyzer (4.6/5.0)
bash: ["gh api:*", "gh run:*", ...] is a somewhat broad allowlist: a subcommand glob such as gh api:* lets the agent issue arbitrary API calls, not just the ones the workflow needs. Note that bash: ["gh:*"] is simpler to read but broader, not tighter, since it admits every gh subcommand. The more restrictive option is a list of individual commands, e.g. bash: ["unzip", "jq", "node", "mkdir", "cat"], plus a scoped github toolset for API access. The two-workflow architecture justification is excellent, but the bash allowlist lowers the security score.
qa-coverage — PR Coverage Guard (4.6/5.0)
permissions: pull-requests: write appears in the top-level job permissions rather than being managed exclusively by the safe-outputs subsystem. It may genuinely be required (git operations during worktree setup could need it), but the response does not say why, leaving it ambiguous whether safe-outputs alone would suffice. A comment next to the grant stating the reason, or a test run without it, would settle the question.
Recommendations
Document workflow_run as the canonical pattern for post-CI analysis in .github/aw/create-agentic-workflow.md. The agent correctly chose it over pull_request for the bundle-size scenario (artifact availability, fork safety), but this decision tree isn't visible to users without asking. A short "When to use workflow_run vs pull_request" section would surface this proactively.
Add a cross-repo write pattern guide to .github/aw/github-agentic-workflows.md. The incident reporter correctly used a fine-grained PAT + target-repo in create-issue, but this is a non-obvious pattern. A canonical example of scoped-PAT cross-repo issue/PR creation would reduce friction for DevOps scenarios.
Tighten bash tool scope guidance: The agent sometimes uses command-family globs (gh api:*) when a more restrictive allowlist (individual commands + scoped GitHub toolset) would be safer. A note in the documentation recommending specific commands over family globs would improve the security baseline of generated workflows.
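The three recommendations above, together with the Top Patterns list, amount to a small set of mechanical checks that can run over a workflow's frontmatter before it is merged. The script below is an added example written for this page, not code from the gh-aw project. It takes the frontmatter already parsed into a dict (for instance by a YAML loader) and flags broad bash globs, top-level write permissions alongside safe-outputs, safe-outputs without an accumulation guard, add-labels without an allowed list, and pull_request triggers without concurrency. The key names follow the snippets quoted on this page, but the overall structure of the dict and the prefix semantics assumed for bash patterns (gh api:* covers every gh api invocation, gh:* every gh subcommand, * everything) are assumptions for the example, not the project's schema. The two configs at the bottom are constructed, modelled on the fe-bundle and qa-coverage gaps.

```python
"""Lint a parsed gh-aw workflow frontmatter for the hygiene points on this page.

Example code for this page, not the project's own linter. The dict shape and
the bash-pattern semantics below are assumptions based on the quoted snippets.
"""

# safe-output type -> key that stops old outputs from piling up
# (page: max: 1 plus hide-older-comments / close-older-discussions / close-older-key)
GUARDS = {
    "add-comment": "hide-older-comments",
    "create-discussion": "close-older-discussions",
    "create-issue": "close-older-key",
}


def bash_pattern_breadth(pattern: str) -> str:
    """Classify one bash allowlist entry (assumed prefix semantics):
    "*" allows everything, "gh:*" every gh subcommand, "gh api:*" every
    gh api call, and a bare word only that exact command."""
    if pattern.strip() == "*":
        return "unrestricted"
    if pattern.endswith(":*"):
        return "family" if len(pattern[:-2].split()) == 1 else "subcommand-family"
    return "exact"


def lint(fm: dict) -> list[str]:
    """Return human-readable findings; an empty list means every check passed."""
    findings = []
    tools = fm.get("tools") or {}

    for pat in tools.get("bash") or []:
        kind = bash_pattern_breadth(pat)
        if kind == "unrestricted":
            findings.append(f'bash "{pat}": allows every command')
        elif kind == "family":
            findings.append(f'bash "{pat}": allows every subcommand of {pat[:-2]}; list individual commands instead')
        elif kind == "subcommand-family":
            findings.append(f'bash "{pat}": command-family glob; prefer exact commands plus a scoped github toolset')

    github = tools.get("github")
    if github is not None and not (github or {}).get("toolsets"):
        findings.append("tools.github: no explicit toolsets list")

    safe = fm.get("safe-outputs") or {}
    # A write grant is only suspicious when safe-outputs exists to do the writing.
    for scope, level in (fm.get("permissions") or {}).items():
        if level == "write" and safe:
            findings.append(f"permissions {scope}: write at top level while safe-outputs is configured; justify or drop it")

    for name, cfg in safe.items():
        cfg = cfg or {}
        if name in GUARDS:
            if cfg.get("max") is None:
                findings.append(f"safe-outputs {name}: no max")
            if not cfg.get(GUARDS[name]):
                findings.append(f"safe-outputs {name}: missing {GUARDS[name]}, outputs will accumulate")
        if name == "add-labels" and not cfg.get("allowed"):
            findings.append("safe-outputs add-labels: no allowed list, agent may apply arbitrary labels")

    # PyYAML (YAML 1.1) parses a bare `on:` key as the boolean True, so accept both.
    on = fm.get("on") or fm.get(True) or {}
    if "pull_request" in on and not fm.get("concurrency"):
        findings.append("on.pull_request without concurrency: stale runs are not cancelled on rapid pushes")
    return findings


# Constructed example modelled on the fe-bundle / qa-coverage gaps discussed above.
WEAK = {
    "on": {"pull_request": {"paths": ["src/**"]}},
    "permissions": {"contents": "read", "pull-requests": "write"},
    "tools": {"bash": ["gh api:*", "gh run:*", "unzip", "jq"], "github": {}},
    "safe-outputs": {"add-comment": {}, "add-labels": {}},
}

# The same workflow tightened along the lines the page recommends (constructed).
TIGHT = {
    "on": {"pull_request": {"paths": ["src/**"]}},
    "permissions": {"contents": "read"},
    "concurrency": {"group": "bundle-${{ github.event.pull_request.number }}", "cancel-in-progress": True},
    "tools": {
        "bash": ["unzip", "jq", "node", "mkdir", "cat"],
        "github": {"toolsets": ["pull_requests", "actions"]},
    },
    "safe-outputs": {
        "add-comment": {"max": 1, "hide-older-comments": True},
        "add-labels": {"allowed": ["needs-migration-review"]},
    },
}

if __name__ == "__main__":
    for label, fm in (("WEAK", WEAK), ("TIGHT", TIGHT)):
        print(f"{label}: {len(lint(fm))} finding(s)")
        for line in lint(fm):
            print("  -", line)
```

Run against the weak config it reports eight findings (two command-family globs, the unscoped github toolset, the top-level write grant, the unguarded add-comment, the unrestricted add-labels and the missing concurrency group); the tightened config reports none. To use it on a real file, load the frontmatter between the leading --- markers with a YAML parser and pass the resulting dict to lint(); the True-key fallback covers the YAML 1.1 `on` gotcha.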