Context
We recently filed a bug about agentic workflows failing with token limit overflow in repositories with comprehensive custom instructions. While investigating alternatives, we found that the Agent Governance Toolkit team built a reusable ai-agent-runner composite action that takes a fundamentally different approach to running AI agents in GitHub Actions — and does not hit the same token limits.
This raises an interesting question: what are the trade-offs between these two approaches, and why do they coexist?
The two approaches
Approach A: Copilot Agentic Workflows (gh aw)
Copilot agentic workflows use the Copilot CLI as a full agent runtime. The agent operates in an agentic loop with tool access — it can read files, search code, run commands, edit files, and commit changes. The workflow author writes a markdown prompt with YAML frontmatter, and the agent executes autonomously.
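For orientation, a workflow file of this kind couples YAML frontmatter with a markdown prompt body. The sketch below is illustrative only — the field names and values are assumptions, not a schema reference:

```yaml
# Hypothetical gh aw workflow file, e.g. .github/workflows/triage.md
# (frontmatter shown; exact fields vary by gh aw version — treat as a sketch).
on:
  issues:
    types: [opened]     # start the agent when a new issue is filed
permissions:
  contents: read
  issues: write         # let the agent label and comment on issues
# --- the markdown prompt body follows the frontmatter, e.g.:
# "Read the issue, try to reproduce it, and suggest appropriate labels."
```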
How instructions are loaded: The Copilot CLI auto-discovers and injects all files matching these patterns from the workspace:
.github/copilot-instructions.md
.github/instructions/**/*.instructions.md
.github/agents/**/*.agent.md
.github/skills/**/SKILL.md
There is no filtering mechanism. Every matching file is loaded into the prompt before the workflow even begins.
The problem: In our repo (microsoft/hve-core), the .github/ directory contains ~2.6 MB / ~440K tokens of instructions, agents, and skills. The CLI's 168K token limit is exceeded before the workflow prompt gets processed. The workflow prompt itself is ~5K tokens.
Approach B: Agent Governance Toolkit (ai-agent-runner)
The ai-agent-runner is a composite GitHub Action that calls LLM APIs directly (GitHub Models / Azure inference). It fetches PR/issue context via the GitHub API, builds a prompt from explicit inputs, makes a single LLM call, and posts results back as comments or reviews.
How instructions are loaded: Only what the workflow author passes via the custom-instructions input. Zero auto-discovery. The system prompt is built from:
A fixed preamble identifying the agent type and repository
The custom-instructions string (passed explicitly)
Fetched context (PR diff, issue body, etc.)
Token management: Diffs are truncated to ~24K characters (~6K tokens). The workflow author controls max-tokens for the response. There is no path where auto-discovered content can overflow the context window.
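A hedged sketch of what invoking the action from a workflow might look like. Only custom-instructions and max-tokens are inputs named in this write-up; the uses: reference and the trigger shown are assumptions:

```yaml
# Sketch only: the action path below is hypothetical.
name: pr-review
on:
  pull_request:
jobs:
  review:
    runs-on: ubuntu-latest
    steps:
      - name: Run single-turn reviewer
        uses: agent-governance-toolkit/ai-agent-runner@v1   # hypothetical ref
        with:
          custom-instructions: |      # the only instructions the agent sees
            Review this PR for security issues only. Be concise.
          max-tokens: 1024            # caps the response, not the input
```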
Comparison

| | gh aw (Copilot Agentic Workflows) | ai-agent-runner (Agent Governance Toolkit) |
| --- | --- | --- |
| Instruction source | Auto-discovered from .github/ artifacts | custom-instructions input only |
| As instructions grow | Degrades — more instructions = more likely to overflow | Unaffected — instructions are always explicit |
Why both exist
These tools solve different problems on the same surface (GitHub Actions):
gh aw is an autonomous developer. It operates in an agentic loop, can explore the codebase, make multi-file edits, run tests, and commit. It is the right tool when you need the agent to do work — implement a feature, fix a bug, refactor code. The auto-discovery of instructions makes sense conceptually: the agent should follow the repo's coding standards.
ai-agent-runner is a smart reviewer/commenter. It makes a single LLM call with curated context and posts the result. It is the right tool when you need an agent to analyze and respond — review a PR, triage an issue, scan for security patterns. It does not need the full instruction set because it is not writing code.
The tensions
1. Auto-discovery is the right idea with the wrong implementation
Auto-discovery of instructions is valuable — workflow authors should not have to manually replicate what VS Code already does in the IDE. The problem is that the current implementation loads everything unconditionally, without respect for applyTo patterns, relevance, or token budgets. The ai-agent-runner avoids this by not doing auto-discovery at all, but that is a workaround, not a solution.
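For context, VS Code instruction files already declare their scope via an applyTo glob in their frontmatter — metadata a smarter loader could honor. A typical file looks roughly like this (the path, glob, and guidance are examples):

```yaml
# Frontmatter of a hypothetical .github/instructions/typescript.instructions.md.
# An applyTo-aware loader could skip this file for workflows that never touch
# matching paths, instead of injecting it unconditionally.
applyTo: "src/**/*.ts"
# (the markdown body below the frontmatter carries the actual guidance, e.g.
#  "Prefer explicit return types on exported functions.")
```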
2. "What files the agent can access" vs. "What instructions the agent should follow"
gh aw conflates these two concerns through checkout: sparse-checkout. The only way to prevent instruction files from being auto-discovered is to prevent them from existing on disk — which also prevents the agent from reading them if needed during its work. The ai-agent-runner has no such conflation because it never reads from disk.
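The conflation shows up in the workaround itself. Sketched below as a plain actions/checkout step (exact behavior depends on git's non-cone sparse-checkout pattern rules), excluding instruction files at checkout keeps them out of auto-discovery only by making them unreadable for the rest of the run:

```yaml
# Workaround sketch: keep .github/instructions off disk entirely.
# Side effect: the agent can no longer consult these files mid-task.
- uses: actions/checkout@v4
  with:
    sparse-checkout: |
      /*
      !.github/instructions/
    sparse-checkout-cone-mode: false   # gitignore-style patterns
```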
3. Single-turn vs. agentic loop
The ai-agent-runner works within token limits precisely because it is single-turn — there is no tool-use loop, no intermediate context, no accumulated state. But this also limits what it can do. You cannot ask it to implement a feature. The question is whether agentic workflows can adopt some of the ai-agent-runner's explicit instruction management without sacrificing agentic capabilities.
4. Scaling with repository maturity
The ai-agent-runner scales indefinitely because it only loads what you give it. Agentic workflows scale inversely with repo maturity — the better-organized your repo's instructions, the more likely you are to hit the token ceiling. Enterprise repositories with multiple teams contributing standards will hit this wall early.
Questions for discussion
Should gh aw adopt explicit instruction filtering (like ai-agent-runner's custom-instructions input) alongside auto-discovery?
Could gh aw respect applyTo patterns from instruction frontmatter to load only relevant instructions per workflow?
Is there value in a hybrid: auto-discover instructions but with a token budget, truncating or prioritizing by relevance?
Should the ai-agent-runner approach be considered a first-class pattern for "lightweight agent" use cases in agentic workflows?
For teams using both tools today, what patterns have you found for deciding when to use which?
Related

- gh aw (our filed issue)
- ai-agent-runner action source