[Pelis Agent Factory Advisor] Agentic Workflow Maturity Analysis & Recommendations (April 2026) #1743

2026-04-07T10:57:46Z

github-actions[bot]
bot Apr 7, 2026

📊 Executive Summary

This repository is an exceptionally mature agentic workflow operator: with 27 agentic .md workflow definitions and 18 traditional GitHub Actions workflows, it serves as the dogfood platform for AWF itself. Automation coverage spans security, testing, documentation, cost management, and issue lifecycle. The main opportunities are filling a few specific gaps (Codex cost visibility, container image scanning) and adding intelligence layers on top of existing automation (performance regression detection, PR code quality review).

🎓 Patterns Learned & Applied

The following patterns from the Pelis Agent Factory were observed and applied to this analysis:

| Pattern | Description | Used In This Repo? |
| --- | --- | --- |
| Analyzer → Optimizer chaining | `workflow_run` trigger links analyzer to optimizer | ✅ Claude & Copilot token chains |
| Triple-engine coverage | Same task run with Claude, Copilot, Codex for comparison | ✅ Secret Diggers |
| skip-if-match dedup guard | Prevents duplicate runs/issues via query check | ✅ Multiple workflows |
| Shared imports | `imports:` for reusable fragments (`mcp-pagination.md`, etc.) | ✅ Widely used |
| Cache-memory persistence | Cross-run state storage | ✅ `issue-duplication-detector`, `security-review` |
| Cross-repo dispatch | Issue triage across org repos | ✅ `firewall-issue-dispatcher` |
| Issue Monster assignment | Auto-assign issues to Copilot agents | ✅ `issue-monster` |
| CI Doctor | Automated failure investigation with issue creation | ✅ `ci-doctor` |
| Slash commands | `/plan` command on issues/discussions | ✅ `plan.md` |
| Expires-based maintenance | Auto-close old agent-created entities | ✅ `agentics-maintenance.yml` |

📋 Workflow Inventory

Agentic Workflows (27 total)

| Workflow | Purpose | Trigger | Assessment |
| --- | --- | --- | --- |
| `build-test` | Full multi-language build validation | PR | ✅ Well-configured, broad runtime matrix |
| `ci-cd-gaps-assessment` | Identify CI/CD coverage gaps | Daily | ✅ Good self-improvement loop |
| `ci-doctor` | Investigate CI failures automatically | `workflow_run` failure | ✅ Excellent coverage of monitored workflows |
| `claude-token-usage-analyzer` | Daily Claude cost analysis | Daily | ✅ Detailed, with `shared/reporting.md` |
| `claude-token-optimizer` | Optimization recommendations | After analyzer | ✅ Clean chaining via `workflow_run` |
| `cli-flag-consistency-checker` | Catch doc/code drift for CLI flags | Weekly | ✅ Domain-specific, actionable |
| `copilot-token-usage-analyzer` | Daily Copilot cost analysis | Daily | ✅ Matches Claude pattern |
| `copilot-token-optimizer` | Optimization recommendations | After analyzer | ✅ Clean chain pattern |
| `dependency-security-monitor` | CVE detection + patch PRs | Daily | ✅ Comprehensive with `expires` |
| `doc-maintainer` | Sync docs with code changes | Daily | ✅ 7-day lookback, PR output |
| `firewall-issue-dispatcher` | Cross-repo `awf` issue tracking | Every 6h | ✅ Cross-repo PAT, dedup-aware |
| `issue-duplication-detector` | Prevent duplicate issue filing | `issues.opened` | ✅ Cache-memory fingerprinting |
| `issue-monster` | Assign issues to Copilot | Hourly + `issues.opened` | ✅ Queue management, skip-if-draft |
| `pelis-agent-factory-advisor` | This workflow (meta-analysis) | Weekly | ✅ Self-reflecting |
| `plan` | Slash-command planning assistant | `/plan` | ✅ Discussions + sub-issues |
| `secret-digger-claude` | Red team: find leaked secrets | Hourly | ✅ Stress-tests AWF isolation |
| `secret-digger-codex` | Red team: find leaked secrets | Hourly at :10 | ✅ Multi-engine coverage |
| `secret-digger-copilot` | Red team: find leaked secrets | Hourly | ✅ Staggered cron timing |
| `security-guard` | PR security boundary review | PR | ✅ Claude engine, 15-turn budget |
| `security-review` | Daily threat modeling | Daily | ✅ Broad permissions, cache-memory |
| `smoke-chroot` | Validate chroot isolation | PR + 12h | ✅ Core security validation |
| `smoke-claude` | End-to-end Claude engine smoke | PR + schedule | ✅ Engine-specific validation |
| `smoke-codex` | End-to-end Codex engine smoke | PR + schedule | ✅ Engine-specific validation |
| `smoke-copilot` | End-to-end Copilot engine smoke | PR + schedule | ✅ Engine-specific validation |
| `smoke-services` | Host service port connectivity | PR + 12h | ✅ Tests `--allow-host-service-ports` |
| `test-coverage-improver` | Weekly test PR for security paths | Weekly | ✅ Focused on critical code paths |
| `update-release-notes` | Enrich release notes post-publish | `release.published` | ✅ Git diff-based analysis |

Standard (Non-Agentic) Workflows (18 total)

build.yml, codeql.yml, dependency-audit.yml, deploy-docs.yml, docs-preview.yml, link-check.yml, lint.yml, performance-monitor.yml, pr-title.yml, release.yml, test-action.yml, test-chroot.yml, test-coverage.yml, test-examples.yml, test-integration-suite.yml, test-integration.yml, agentics-maintenance.yml, copilot-setup-steps.yml

🚀 Recommendations

P0 — High Impact, Low Effort (Quick Wins)

1. Codex Token Usage Analyzer + Optimizer

What: Add codex-token-usage-analyzer.md and codex-token-optimizer.md mirroring the Claude/Copilot pair.

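Why: The three secret diggers each run hourly, one per engine, for ~72 runs/day in total. Roughly 24 of those are Codex runs (secret-digger-codex), before counting smoke-codex. Without cost visibility, Codex spend is a blind spot while Claude and Copilot are fully instrumented.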

How: Copy copilot-token-usage-analyzer.md, change engine filter to codex, adjust labels. Chain with an optimizer via workflow_run. The shared/reporting.md import is already reusable.

Effort: Low — ~30 min, straightforward template adaptation.

Example frontmatter:

description: Daily Codex token usage analysis across agentic workflow runs
on:
  schedule: daily
  workflow_dispatch:
engine: codex  # filter in prompt
imports:
  - shared/mcp-pagination.md
  - shared/reporting.md
safe-outputs:
  create-issue:
    title-prefix: "📊 Codex Token Usage Report"
    labels: [codex-token-usage-report]
    close-older-issues: true
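
The "engine filter" step can be pictured as a small aggregation over run records. The following is a minimal sketch only: the record fields (`workflow`, `engine`, `input_tokens`, `output_tokens`) and the sample numbers are hypothetical and do not reflect the actual data the existing analyzers read.

```python
from collections import defaultdict
from typing import Iterable, Mapping


def summarize_tokens(runs: Iterable[Mapping], engine: str) -> dict:
    """Sum input + output tokens per workflow for runs that used `engine`."""
    totals = defaultdict(int)
    for run in runs:
        if run.get("engine") != engine:
            continue  # the "engine filter": skip runs from other engines
        totals[run["workflow"]] += run.get("input_tokens", 0) + run.get("output_tokens", 0)
    return dict(totals)


# Illustrative records only; field names and values are assumptions.
sample_runs = [
    {"workflow": "secret-digger-codex", "engine": "codex", "input_tokens": 1200, "output_tokens": 300},
    {"workflow": "smoke-codex", "engine": "codex", "input_tokens": 800, "output_tokens": 100},
    {"workflow": "secret-digger-claude", "engine": "claude", "input_tokens": 5000, "output_tokens": 900},
    {"workflow": "secret-digger-codex", "engine": "codex", "input_tokens": 1000, "output_tokens": 250},
]
codex_totals = summarize_tokens(sample_runs, "codex")
```

Switching the second argument to `"copilot"` or `"claude"` would reuse the same logic, which is why the template adaptation is expected to be small.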

2. Performance Regression Detector (Agentic Layer)

What: Add performance-regression-detector.md that triggers after performance-monitor.yml completes, reads benchmark results, and creates issues when regressions exceed threshold.

Why: performance-monitor.yml runs benchmarks weekly but produces raw JSON — no intelligence layer detects regressions or alerts maintainers. The CI Doctor pattern (trigger on workflow_run) is already proven here.

How: Trigger on workflow_run: [Performance Monitor], download the benchmark-results artifact, compare to cached baseline, file issues on ≥10% regression.

Effort: Low-Medium — the workflow_run + artifact read pattern is used in token analyzers.

on:
  workflow_run:
    workflows: ["Performance Monitor"]
    types: [completed]
    branches: [main]
tools:
  github:
    toolsets: [default, actions]
  cache-memory: true  # store baseline
safe-outputs:
  create-issue:
    title-prefix: "[Perf Regression] "
    labels: [performance, regression]
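
The baseline comparison described in the How step could look roughly like the sketch below. It assumes the benchmark JSON reduces to a flat mapping of metric name to a "lower is better" number; the metric names are hypothetical, and the real artifact format of performance-monitor.yml may differ.

```python
def find_regressions(baseline: dict, current: dict, threshold: float = 0.10) -> list:
    """Return (metric, relative_change) pairs where the metric grew by >= threshold.

    Assumes higher values are worse (e.g. latency in ms). Metrics missing from
    `current`, or with a non-positive baseline, are skipped rather than guessed.
    """
    regressions = []
    for name, old in baseline.items():
        new = current.get(name)
        if new is None or old <= 0:
            continue
        change = (new - old) / old
        if change >= threshold:
            regressions.append((name, round(change, 3)))
    # Worst regression first, so an issue body can lead with it.
    return sorted(regressions, key=lambda item: item[1], reverse=True)


# Hypothetical metric names and values, for illustration only.
baseline = {"container_startup_ms": 2000.0, "proxy_latency_ms": 40.0}
current = {"container_startup_ms": 2300.0, "proxy_latency_ms": 41.0}
regressions = find_regressions(baseline, current)
```

In this sample only `container_startup_ms` crosses the 10% threshold (a 15% increase), so it would be the single entry reported.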

P1 — High Impact, Medium Effort

3. Container Image Security Scanner

What: A workflow using Trivy or Grype to scan the three AWF Docker images (squid, agent, api-proxy) published to GHCR for OS-level CVEs in container layers.

Why: CodeQL covers TypeScript/JS source, and dependency-security-monitor covers npm packages — but container image layers (Ubuntu 22.04 base, Squid packages, Node runtime) are not scanned. For a security product that ships container images, this is a meaningful gap. Container CVEs in base images won't appear in npm audit or CodeQL.

How: Use workflow_run: [Release] or a weekly schedule. Pull images from GHCR, run Trivy in SARIF mode, upload to GitHub Security tab. Alternatively create issues for CRITICAL/HIGH findings.

Effort: Medium — requires authenticated GHCR pull, Trivy setup, SARIF upload or issue creation.

on:
  workflow_run:
    workflows: ["Release"]
    types: [completed]
  schedule: weekly
safe-outputs:
  create-issue:
    title-prefix: "[Container CVE] "
    labels: [security, container]
    expires: 30d
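
If the issue-creation route is chosen over SARIF upload, the CRITICAL/HIGH filter could be a short pass over Trivy's JSON report. This sketch assumes the report was produced with `trivy image --format json` and relies on its `Results[].Vulnerabilities[]` layout; the sample report, package name, and CVE identifiers are placeholders, not real findings.

```python
def high_severity_findings(report: dict, severities=("CRITICAL", "HIGH")) -> list:
    """Collect findings at the given severities from a Trivy JSON report."""
    findings = []
    # "Results" and "Vulnerabilities" can be absent or null when nothing is found.
    for result in report.get("Results") or []:
        for vuln in result.get("Vulnerabilities") or []:
            if vuln.get("Severity") in severities:
                findings.append({
                    "target": result.get("Target"),
                    "id": vuln.get("VulnerabilityID"),
                    "package": vuln.get("PkgName"),
                    "severity": vuln.get("Severity"),
                })
    return findings


# Placeholder report; identifiers and package names are made up for illustration.
sample_report = {
    "Results": [
        {
            "Target": "example-image (ubuntu 22.04)",
            "Vulnerabilities": [
                {"VulnerabilityID": "CVE-0000-0001", "PkgName": "libexample", "Severity": "HIGH"},
                {"VulnerabilityID": "CVE-0000-0002", "PkgName": "libexample", "Severity": "LOW"},
            ],
        },
        {"Target": "node_modules", "Vulnerabilities": None},
    ]
}
findings = high_severity_findings(sample_report)
```

Each returned entry carries enough context (image target, package, identifier) to become one line in a "[Container CVE]" issue body.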

4. PR Code Quality Review Agent

What: A general-purpose code quality review agent that runs on PRs alongside security-guard, focusing on correctness, maintainability, and TypeScript patterns — not just security.

Why: security-guard (Claude) reviews security boundaries exclusively. build-test (Copilot) validates that tests pass. Neither reviews code quality: complexity, test coverage of new paths, TypeScript antipatterns, or architectural consistency. The reviewer gap is especially notable given this repo's critical security posture.

How: Trigger on pull_request with types [opened, synchronize, reopened], use the Claude engine for nuanced reasoning, and limit output to a single comment (add-comment with max: 1) to avoid noise. Use skip-if-match to avoid running on trivial or docs-only PRs.

Effort: Medium — prompt engineering to scope review to non-security quality concerns without overlap with security-guard.

engine:
  id: claude
  max-turns: 10
on:
  pull_request:
    types: [opened, synchronize, reopened]
safe-outputs:
  add-comment:
    max: 1
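
One way to decide whether a PR is "docs-only" is to classify its changed paths before the agent runs. The sketch below is an assumption about how such a pre-check might work, not a description of skip-if-match itself; the suffix and directory lists are hypothetical. Note that agentic workflow definitions are also `.md` files, so paths under `.github/` are deliberately not treated as docs.

```python
from pathlib import PurePosixPath

DOC_SUFFIXES = {".md", ".mdx", ".txt"}  # assumed documentation file types
DOC_DIRS = ("docs/",)                   # assumed documentation directory


def is_docs_only(changed_files) -> bool:
    """Return True only if every changed path looks like documentation."""
    files = list(changed_files)
    if not files:
        return False  # nothing changed: do not claim the PR is docs-only
    for path in files:
        if path.startswith(".github/"):
            return False  # workflow .md files change automation behavior
        is_doc = path.startswith(DOC_DIRS) or PurePosixPath(path).suffix.lower() in DOC_SUFFIXES
        if not is_doc:
            return False
    return True
```

A reviewer workflow could skip the run when this returns True and spend its turn budget only on PRs that touch code or workflow definitions.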

5. Stale Issue Manager

What: A weekly agentic workflow that identifies issues with no activity for 30+ days, posts contextual follow-up questions (not just "is this still relevant?"), and applies stale labels.

Why: agentics-maintenance.yml handles expires-tagged agent-created entities, but human-filed issues with no expires field can accumulate indefinitely. The issue-monster assigns issues, but if agents can't reproduce or clarify, issues stall silently.

How: Weekly schedule, github.toolsets: [issues], fetch issues with no activity > 30d, generate context-aware follow-up questions based on issue body, post comment, apply stale label. Use skip-if-match to avoid running when too many stale issues already have pending comments.

Effort: Medium — requires careful prompt to generate useful (not generic) follow-up questions.
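
The "no activity > 30d" selection can be sketched as a date filter. This assumes each issue is a dict with `number`, an ISO 8601 `updated_at` string such as the GitHub REST API returns, and `labels` simplified to a list of label names; the label name `stale` and the sample issues are illustrative.

```python
from datetime import datetime, timedelta, timezone


def find_stale(issues, now, max_idle_days=30, stale_label="stale"):
    """Return numbers of issues idle longer than max_idle_days and not yet labeled stale."""
    cutoff = now - timedelta(days=max_idle_days)
    stale = []
    for issue in issues:
        if stale_label in issue.get("labels", []):
            continue  # already flagged on an earlier run; avoid duplicate comments
        # Normalize the trailing "Z" so fromisoformat accepts it on older Pythons.
        updated = datetime.fromisoformat(issue["updated_at"].replace("Z", "+00:00"))
        if updated < cutoff:
            stale.append(issue["number"])
    return stale


# Illustrative issues only.
sample_issues = [
    {"number": 101, "updated_at": "2026-02-01T00:00:00Z", "labels": []},
    {"number": 102, "updated_at": "2026-04-01T00:00:00Z", "labels": ["bug"]},
    {"number": 103, "updated_at": "2026-01-15T00:00:00Z", "labels": ["stale"]},
]
stale_numbers = find_stale(sample_issues, now=datetime(2026, 4, 7, tzinfo=timezone.utc))
```

Skipping issues that already carry the stale label keeps the agent from re-commenting on the same threads every week.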

P2 — Medium Impact

6. Firewall Domain Whitelist Auditor

What: Monthly agent that audits domain whitelists in smoke test configurations and the --allow-domains examples in docs/README, verifying domains are reachable, still needed, and not overly permissive.

Why: As this codebase evolves, domain allowlists in smoke test .md files may include domains that are no longer needed, have moved, or have become overly broad (e.g., wildcard domains). A security-focused repo should continuously validate its own examples.

Effort: Low-Medium — bash DNS checks + GitHub content reads.
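
A rough sketch of the audit's mechanical part follows: flag wildcard entries and entries that no longer resolve. The `*.` wildcard syntax is an assumption about how allowlist entries are written, and the resolver is injectable so the check can be exercised without network access. Whether a domain is "still needed" remains a judgment for the agent.

```python
import socket


def audit_domains(domains, resolver=socket.gethostbyname):
    """Map each allowlist entry to a list of problems found (empty list means none)."""
    report = {}
    for domain in domains:
        problems = []
        host = domain
        if domain.startswith("*."):
            problems.append("wildcard")  # potentially overly permissive
            host = domain[2:]            # check the parent domain instead
        try:
            resolver(host)
        except OSError:  # socket.gaierror is a subclass of OSError
            problems.append("unresolvable")
        report[domain] = problems
    return report


def fake_resolver(host):
    """Stub resolver for offline use; treats one reserved example name as gone."""
    if host == "gone.example":
        raise OSError("name not known")
    return "192.0.2.1"  # documentation-only address


audit = audit_domains(["api.example.com", "*.example.org", "gone.example"], resolver=fake_resolver)
```

With the default resolver, the same function performs real DNS lookups, which matches the "bash DNS checks" effort estimate above.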

7. Breaking Change Detector

What: A PR-triggered agent that detects potentially breaking changes to the public CLI interface (src/cli.ts flag additions/removals/renames) and the Docker Compose API generated by src/docker-manager.ts, and adds a warning comment.

Why: AWF is consumed by other tools (gh-aw extension, CI pipelines). Unintentional breaking changes to CLI flags or Docker Compose structure could silently break consumers. security-guard doesn't cover this angle.

Effort: Medium — requires understanding of semver impact from diff analysis.
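
For the CLI half of this check, a first approximation is to diff the set of long flags that appear in src/cli.ts on the base and head commits. The sketch below makes two assumptions: flags appear literally as `--kebab-case` tokens in the source, and the `.option(...)` call style and the `--host-ports` name in the sample are hypothetical. A rename shows up as one removal plus one addition.

```python
import re

FLAG_PATTERN = re.compile(r"--[a-z][a-z0-9-]*")


def extract_flags(source: str) -> set:
    """Collect every long-form flag token found in the source text."""
    return set(FLAG_PATTERN.findall(source))


def diff_flags(base_source: str, head_source: str) -> dict:
    """Report flags removed from and added to the head version."""
    base, head = extract_flags(base_source), extract_flags(head_source)
    return {"removed": sorted(base - head), "added": sorted(head - base)}


# Hypothetical excerpts of a CLI definition before and after a PR.
base_cli = """
  .option('--allow-domains <domains>', 'domains to allow')
  .option('--allow-host-service-ports <ports>', 'ports to expose')
"""
head_cli = """
  .option('--allow-domains <domains>', 'domains to allow')
  .option('--host-ports <ports>', 'ports to expose')
"""
flag_diff = diff_flags(base_cli, head_cli)
```

Any non-empty `removed` list is a candidate breaking change worth a warning comment; additions alone are usually backward compatible.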

8. Issue Triage Enhancer

What: Complement issue-monster with a pre-assignment triage step that labels issues by category (bug/feature/docs/security), estimates complexity, and asks clarifying questions before Copilot picks them up.

Why: issue-monster assigns issues directly. Better triage before assignment means Copilot agents get better-scoped work items, reducing wasted agent turns.

Effort: Medium — two-phase pipeline, needs coordination with issue-monster via labels.
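
The label-based coordination could be as simple as a gate that issue-monster consults before assigning. In this sketch the category labels come from the list above, while the blocking label `needs-info` is a hypothetical name for "triage asked a clarifying question and is waiting for an answer".

```python
CATEGORY_LABELS = {"bug", "feature", "docs", "security"}
BLOCKING_LABELS = {"needs-info"}  # hypothetical label set by the triage step


def ready_for_assignment(labels) -> bool:
    """True once triage has categorized the issue and no clarification is pending."""
    label_set = set(labels)
    has_category = bool(label_set & CATEGORY_LABELS)
    is_blocked = bool(label_set & BLOCKING_LABELS)
    return has_category and not is_blocked
```

Untriaged issues (no category label) and issues awaiting clarification both stay out of the assignment queue until the triage step updates their labels.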

P3 — Nice to Have

9. AWF API Contract Drift Detector

What: Weekly check that src/types.ts interfaces haven't changed in ways that break the published API contract documented in docs, creating issues when drift is detected.

10. Contributor Onboarding Assistant

What: Triggered by pull_request from first-time contributors, explains relevant code patterns and points to CONTRIBUTING.md sections most relevant to their changes.

📈 Maturity Assessment

| Dimension | Current (1–5) | Target | Gap |
| --- | --- | --- | --- |
| Security automation | 5 | 5 | Add container image scanning |
| Test coverage | 4 | 5 | Test coverage improver exists; container CVEs uncovered |
| Cost management | 4 | 5 | Codex costs not tracked |
| Documentation | 5 | 5 | doc-maintainer + CLI consistency checker |
| Issue lifecycle | 5 | 5 | issue-monster + dedup + dispatcher |
| Release automation | 4 | 5 | Release exists; no post-release container scan |
| Observability | 4 | 5 | Performance benchmarks lack regression intelligence |
| PR quality | 3 | 4 | Security-guard excellent; general code review absent |

Overall maturity: 4.5/5 — One of the most comprehensively automated repositories in the AWF ecosystem. The gap is narrow but targeted at a security product's most critical blind spots.

🔄 Best Practice Comparison

What This Repo Does Exceptionally Well

Self-dogfooding: Running AWF to test AWF is the best possible integration test
Triple-engine red team: Running secret-diggers on Claude, Copilot, and Codex simultaneously with staggered cron slots (:00, :05, :10) is a sophisticated comparative testing pattern
Cost visibility: The analyzer → optimizer chain for two engines demonstrates operational maturity
Defensive skip-if-match: Nearly every recurring workflow has dedup guards preventing runaway costs
Shared imports: shared/ directory with mcp-pagination.md, reporting.md, secret-audit.md and version-reporting.md enables DRY workflow authoring
CI Doctor: Automated failure investigation with issue creation reduces toil significantly
Cross-repo dispatch: firewall-issue-dispatcher integrating gh-aw ↔ gh-aw-firewall is an advanced pattern

What to Improve

Codex cost blind spot: The only major gap in the excellent token management system
Container layer security: Trivy/Grype container scanning is the one security category not yet covered
Performance regression intelligence: Raw benchmark JSON exists but no automated analysis layer

📝 Notes & Tracking

Cache updated: /tmp/gh-aw/cache-memory/repo-analysis-2026-04-07.json with workflow inventory and gap analysis.

Items to track on next run:

Was codex-token-usage-analyzer added? (P0)
Was container image scanning implemented? (P1)
Has performance-monitor.yml been upgraded with a regression detector? (P0)
Has the ci-doctor monitored workflow list been updated to include any new workflows added since last analysis?

Run ID: 24077494180 | Date: 2026-04-07

Generated by Pelis Agent Factory Advisor · ● 629.9K · ◷

expires on Apr 14, 2026, 10:57 AM UTC

2026-04-14T12:57:00Z

github-actions[bot]
bot Apr 14, 2026
Author

This discussion was automatically closed because it expired on 2026-04-14T10:57:45.865Z.

Closed by Workflow
