Context
We recently filed a bug about agentic workflows failing with token limit overflow in repositories with comprehensive custom instructions. While investigating alternatives, we found that the Agent Governance Toolkit team built a reusable ai-agent-runner composite action that takes a fundamentally different approach to running AI agents in GitHub Actions — and does not hit the same token limits.
This raises an interesting question: what are the trade-offs between these two approaches, and why do they coexist?
The two approaches
Approach A: Copilot Agentic Workflows (gh aw)
Copilot agentic workflows use the Copilot CLI as a full agent runtime. The agent operates in an agentic loop with tool access — it can read files, search code, run commands, edit files, and commit changes. The workflow author writes a markdown prompt with YAML frontmatter, and the agent executes autonomously.
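For orientation, a workflow file of this kind couples YAML frontmatter with a markdown prompt body. The sketch below is illustrative only — the field names and values are assumptions, not a schema reference:

```yaml
# Hypothetical gh aw workflow file, e.g. .github/workflows/triage.md
# (frontmatter shown; exact fields vary by gh aw version — treat as a sketch).
on:
  issues:
    types: [opened]     # start the agent when a new issue is filed
permissions:
  contents: read
  issues: write         # let the agent label and comment on issues
# --- the markdown prompt body follows the frontmatter, e.g.:
# "Read the issue, try to reproduce it, and suggest appropriate labels."
```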
How instructions are loaded: The Copilot CLI auto-discovers and injects all files matching these patterns from the workspace:
.github/copilot-instructions.md
.github/instructions/**/*.instructions.md
.github/agents/**/*.agent.md
.github/skills/**/SKILL.md
There is no filtering mechanism. Every matching file is loaded into the prompt before the workflow even begins.
The problem: In our repo (microsoft/hve-core), the .github/ directory contains ~2.6 MB / ~440K tokens of instructions, agents, and skills. The CLI's 168K token limit is exceeded before the workflow prompt gets processed. The workflow prompt itself is ~5K tokens.
Approach B: Agent Governance Toolkit (ai-agent-runner)
The ai-agent-runner is a composite GitHub Action that calls LLM APIs directly (GitHub Models / Azure inference). It fetches PR/issue context via the GitHub API, builds a prompt from explicit inputs, makes a single LLM call, and posts results back as comments or reviews.
How instructions are loaded: Only what the workflow author passes via the custom-instructions input. Zero auto-discovery. The system prompt is built from:
A fixed preamble identifying the agent type and repository
The custom-instructions string (passed explicitly)
Fetched context (PR diff, issue body, etc.)
Token management: Diffs are truncated to ~24K characters (~6K tokens). The workflow author controls max-tokens for the response. There is no path where auto-discovered content can overflow the context window.
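A hedged sketch of what invoking the action from a workflow might look like. Only custom-instructions and max-tokens are inputs named in this write-up; the uses: reference and the trigger shown are assumptions:

```yaml
# Sketch only: the action path below is hypothetical.
name: pr-review
on:
  pull_request:
jobs:
  review:
    runs-on: ubuntu-latest
    steps:
      - name: Run single-turn reviewer
        uses: agent-governance-toolkit/ai-agent-runner@v1   # hypothetical ref
        with:
          custom-instructions: |      # the only instructions the agent sees
            Review this PR for security issues only. Be concise.
          max-tokens: 1024            # caps the response, not the input
```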
Comparison

| | gh aw (Copilot Agentic Workflows) | ai-agent-runner (Agent Governance Toolkit) |
| --- | --- | --- |
| Instruction source | Auto-discovered from .github/ artifacts | custom-instructions input only |
| As instructions grow | Degrades — more instructions = more likely to overflow | Unaffected — instructions are always explicit |
Why both exist
These tools solve different problems on the same surface (GitHub Actions):
gh aw is an autonomous developer. It operates in an agentic loop, can explore the codebase, make multi-file edits, run tests, and commit. It is the right tool when you need the agent to do work — implement a feature, fix a bug, refactor code. The auto-discovery of instructions makes sense conceptually: the agent should follow the repo's coding standards.
ai-agent-runner is a smart reviewer/commenter. It makes a single LLM call with curated context and posts the result. It is the right tool when you need an agent to analyze and respond — review a PR, triage an issue, scan for security patterns. It does not need the full instruction set because it is not writing code.
The tensions
1. Auto-discovery is the right idea with the wrong implementation
Auto-discovery of instructions is valuable — workflow authors should not have to manually replicate what VS Code already does in the IDE. The problem is that the current implementation loads everything unconditionally, without respect for applyTo patterns, relevance, or token budgets. The ai-agent-runner avoids this by not doing auto-discovery at all, but that is a workaround, not a solution.
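For context, VS Code instruction files already declare their scope via an applyTo glob in their frontmatter — metadata a smarter loader could honor. A typical file looks roughly like this (the path, glob, and guidance are examples):

```yaml
# Frontmatter of a hypothetical .github/instructions/typescript.instructions.md.
# An applyTo-aware loader could skip this file for workflows that never touch
# matching paths, instead of injecting it unconditionally.
applyTo: "src/**/*.ts"
# (the markdown body below the frontmatter carries the actual guidance, e.g.
#  "Prefer explicit return types on exported functions.")
```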
2. "What files the agent can access" vs. "What instructions the agent should follow"
gh aw conflates these two concerns through checkout: sparse-checkout. The only way to prevent instruction files from being auto-discovered is to prevent them from existing on disk — which also prevents the agent from reading them if needed during its work. The ai-agent-runner has no such conflation because it never reads from disk.
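The conflation shows up in the workaround itself. Sketched below as a plain actions/checkout step (exact behavior depends on git's non-cone sparse-checkout pattern rules), excluding instruction files at checkout keeps them out of auto-discovery only by making them unreadable for the rest of the run:

```yaml
# Workaround sketch: keep .github/instructions off disk entirely.
# Side effect: the agent can no longer consult these files mid-task.
- uses: actions/checkout@v4
  with:
    sparse-checkout: |
      /*
      !.github/instructions/
    sparse-checkout-cone-mode: false   # gitignore-style patterns
```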
3. Single-turn vs. agentic loop
The ai-agent-runner works within token limits precisely because it is single-turn — there is no tool-use loop, no intermediate context, no accumulated state. But this also limits what it can do. You cannot ask it to implement a feature. The question is whether agentic workflows can adopt some of the ai-agent-runner's explicit instruction management without sacrificing agentic capabilities.
4. Scaling with repository maturity
The ai-agent-runner scales indefinitely because it only loads what you give it. Agentic workflows scale inversely with repo maturity — the better-organized your repo's instructions, the more likely you are to hit the token ceiling. Enterprise repositories with multiple teams contributing standards will hit this wall early.
Questions for discussion
Should gh aw adopt explicit instruction filtering (like ai-agent-runner's custom-instructions input) alongside auto-discovery?
Could gh aw respect applyTo patterns from instruction frontmatter to load only relevant instructions per workflow?
Is there value in a hybrid: auto-discover instructions but with a token budget, truncating or prioritizing by relevance?
Should the ai-agent-runner approach be considered a first-class pattern for "lightweight agent" use cases in agentic workflows?
For teams using both tools today, what patterns have you found for deciding when to use which?
Related

- gh aw (our filed issue)
- ai-agent-runner action source