-
-
Notifications
You must be signed in to change notification settings - Fork 3.7k
feat: add external agent support for code reviews #1061
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: add external agent support for code reviews #1061
Conversation
|
Important Review skippedAuto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the You can disable this status message by setting the ✨ Finishing touches🧪 Generate unit tests (beta)
Comment |
|
Great idea - a few things! I am about to switch all of the workflows to not use workflow.xml to the sharded steps. So for any command we want to give to an external agent, it sounds like we will just want a clear separate text file prompt that is self contained also in a single file to give to the agent? This could be used for much more than just the code review. When installing, the user already has a multi-select for Tools they have installed. This could be just a single question then in bmm if they want to allow supported installed tools do external command executions. Also saying it this way, as this could really be used for much more than code review, pretty much review of anything could potentially produce better results. What combinations of tools and OS was this tested on? Random idea - what if we had a folder of named prompts to pass to other agents, maybe call them external workflows. The could all register as installed tool external slash commands. so if the folder had external-code-review.md, for each tool installed it could create a slash command. /bmad-bmm-ext-copilot-code-review. While this could result in a lot for users to remember, all they would need to know is the tool and autocomplete will show them the options. so if in claude, and started to type /copilot - the autocomplete would show all of the possible commands. |
|
The choice of alt agent should belong with the user. I love this capability in principle. But I also wonder, how many people would frown upon the idea of BMAD deciding to run some agentic software in YOLO mode on the computer without asking? What if we defined a location like |
|
it sounds like we will just want a clear separate text file prompt that is self contained also in a single file to give to the agent? It's already a plaintext prompt, I'll break it out into a text file. When installing, the user already has a multi-select for Tools they have installed. This could be just a single question then in bmm if they want to allow supported installed tools do external command executions. I'll see if I can get it integrated to the installer as a choice. I need to update that anyway to work with v15. The choice of alt agent should belong with the user. Absolutely. It's defined in a config file right now. Could use a selector in the installer, perhaps. What if we defined a location like .bmad/alt-agent.[md|sh]? Right now it's defined in config.yaml, this could be moved to its own file fairly easily. how many people would frown upon the idea of BMAD deciding to run some agentic software in YOLO mode on the computer without asking? This is an excellent point and one I agree with. By its nature a code review should be non-destructive - and I'll add to the prompt emphasis that the external agent should not write/change/delete any file. However to run as an external agent in a headless context the YOLO mode is necessary. Code reviews should be the safest application of this, but it also assumes the user is running in a fairly hardened/backed up environment and the LLM is not going to go off the rails and delete everything it sees. Neither of these are safe assumptions! https://bsky.app/profile/brokentoys.social/post/3m7h4ul6dsc2u this could really be used for much more than code review, pretty much review of anything could potentially produce better results. The problem here is that external agents have to run in yolo mode with this approach - for anything that has a write application, it would require a dozen warnings and they would not be enough and someone WILL come screaming into the support discord that we destroyed their computer. What combinations of tools and OS was this tested on? Just mac os, claude for the primary agent, and codex, gemini and claude for external agents. This is primarily crafted to deal with a frustration I had with invoking repeated code reviews in codex struggling with codex's often-broken slash command support. I would not be opposed to having this functionality be in an optional module. I don't know if a core workflow change can be done via a modular approach currently. I'll update shortly with discussed changes and updates to maintain functionality with current version. |
Adds support for delegating adversarial code reviews to external CLI agents (Codex, Gemini, or Claude) when available. This provides independent, unbiased code reviews from a different AI model. Changes: - Add invoke-bash and set-var tags to workflow.xml execution engine - Add external_review_agents configuration to install-config.yaml - Rewrite code-review workflow to detect and invoke external agents - Cache agent detection in config.yaml to avoid repeated CLI checks - Add fallback to built-in review if external agents unavailable/fail - Update checklist to reflect new external agent workflow External agent invocation: - Codex: codex exec --full-auto "prompt" - Gemini: gemini -p "prompt" --yolo - Claude: claude -p "prompt" --dangerously-skip-permissions
- Add installer prompt for external review agent selection (Codex, Gemini, Claude, None) - Extract adversarial review prompt to external-agent-prompt.md for easier maintenance - Simplify detection logic: check for "none" first, then verify CLI availability - Add read-only sandbox flag to Codex invocation for safety - Update workflow.yaml to reference new config variable
b4c185c to
dcba8e5
Compare
|
Changes in most recent commit:
|
Allows users to customize the config location via bmad_folder setting.
|
Also a note: I've started using a Codex skill for claude code - Claude will use that automatically instead of the specified invocation (which is fine, if a user installs a skill for this they already have opinonated preferences). Wondering if it would be helpful to embed? (Probably not since it's Claude Code specific) |
Add instructions to detect and terminate any stale external agent review processes when starting a new code review session in Claude Code.
|
Added an attempt at a fix for a recently introduced claude code annoyance: it will now sometimes decide to launch bash tasks (like these code reviews) as background processes, and then reconnect to them for the next code review, assume that they are a new code review instead of the one previously completed, and happily report that the code review is only reporting old issues. Well of course it is, claude, you read the old review. Sigh. Anyway this should hopefully fix it (it's difficult to test because it only happens occasionally) |
|
I also don't think this behavior will persist long term because it eats your context alive monitoring the results of a shell over... and over... and over... and over... |
Change cleanup logic from "check and maybe kill" to "kill unconditionally". The previous approach allowed agents to read stale output from lingering external agent processes, causing context pollution and reuse of old data. New behavior: - Kill ALL background shells as the FIRST action - Do NOT read output from stale shells (BashOutput forbidden) - Do NOT check what processes are doing - just terminate them - Only proceed to Step 1 after all shells are dead
|
sigh. another fix enforcing this behaviour because claude code thinks it knows best. "Wait! Let me use this data before I kill it!" |
|
@sjennings thanks for the updates and sorry for delay in further response, I have been traveling and availability has been limited. I like the idea of this being an installer question configuration btw. I will look at all of the PR updates soon and try this out also - might be another day or two just due to the travel and getting situated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR adds support for delegating code reviews to external AI CLI tools (Codex, Gemini, or Claude) to provide independent reviews from different AI models. The implementation includes agent detection, mandatory delegation when available, and fallback to built-in review capabilities.
Key Changes:
- Added
invoke-bashandset-varexecution capabilities to the workflow engine - Implemented external agent detection and selection logic with runtime configuration
- Created external agent invocation system with specific CLI commands for each agent
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 19 comments.
Show a summary per file
| File | Description |
|---|---|
| workflow.yaml | Adds external_review_agent configuration variable and external-agent-prompt.md file reference |
| instructions.xml | Major rewrite adding external agent detection, CLI invocation logic, and fallback mechanisms with mandatory delegation rules |
| external-agent-prompt.md | New template file defining the prompt to be sent to external agents with placeholders for review context |
| checklist.md | Updates validation checklist to include external agent detection steps and verification points |
| module.yaml | Adds external_review_agent configuration option with single-select menu for agent selection |
| workflow.xml | Documents new invoke-bash and set-var XML tags for workflow execution |
Comments suppressed due to low confidence (1)
src/modules/bmm/workflows/4-implementation/code-review/instructions.xml:175
- Step 2 is described as "Build review attack plan" but when an external agent is used, this step's actions are performed but then ignored since the external agent does its own planning. This creates wasted work. Consider making Step 2 conditional - only execute it when using built-in review ({{use_external_agent}} == false), or move the external agent detection and invocation to happen before Step 2.
<step n="2" goal="Build review attack plan">
<action>Extract ALL Acceptance Criteria from story</action>
<action>Extract ALL Tasks/Subtasks with completion status ([x] vs [ ])</action>
<action>From Dev Agent Record → File List, compile list of claimed changes</action>
<action>Create review plan:
1. **AC Validation**: Verify each AC is actually implemented
2. **Task Audit**: Verify each [x] task is really done
3. **Code Quality**: Security, performance, maintainability
4. **Test Quality**: Real tests vs placeholder bullshit
</action>
</step>
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| <invoke-bash cmd="codex exec --sandbox read-only --full-auto "$(cat '{{external_prompt_file}}')"" timeout="300000" /> | ||
| </check> | ||
| <check if="{{external_agent_cmd}} == 'gemini'"> | ||
| <critical>GEMINI: Use gemini -p with prompt from file and --yolo</critical> | ||
| <invoke-bash cmd="gemini -p "$(cat '{{external_prompt_file}}')" --yolo" timeout="300000" /> | ||
| </check> | ||
| <check if="{{external_agent_cmd}} == 'claude'"> | ||
| <critical>CLAUDE: Use claude -p with prompt from file</critical> | ||
| <invoke-bash cmd="claude -p "$(cat '{{external_prompt_file}}')" --dangerously-skip-permissions" timeout="300000" /> |
Copilot
AI
Dec 12, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The timeout value is set to 300000 milliseconds (5 minutes). For complex codebases with comprehensive reviews, this may be insufficient and could cause premature timeout failures, triggering unnecessary fallback to built-in review. Consider increasing the timeout or making it configurable, especially since the PR mentions that "long prompts are expected and supported".
|
|
||
| <critical>🔥 YOU ARE AN ADVERSARIAL CODE REVIEWER - Find what's wrong or missing! 🔥</critical> | ||
| <!-- ================================================================ --> | ||
| <!-- CLAUDE CODE CLEANUP - MANDATORY KILL OF ALL BACKGROUND SHELLS --> |
Copilot
AI
Dec 12, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The heading comment states "CLAUDE CODE CLEANUP" which is misleading since this is a general code review workflow that could be executed by any agent (Codex, Gemini, Claude, or others). The heading should be more generic, such as "SHELL CLEANUP" or "BACKGROUND PROCESS CLEANUP" to accurately reflect that this is an environmental cleanup step, not Claude-specific functionality.
| <!-- CLAUDE CODE CLEANUP - MANDATORY KILL OF ALL BACKGROUND SHELLS --> | |
| <!-- SHELL CLEANUP - MANDATORY KILL OF ALL BACKGROUND SHELLS --> |
|
|
||
| <!-- External agent prompt is loaded from external-agent-prompt.md --> | ||
| <set-var name="external_prompt_file" value="{installed_path}/external-agent-prompt.md" /> | ||
| <action>Load {{external_prompt_file}} content into {{external_prompt}}</action> |
Copilot
AI
Dec 12, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The action loads content into a variable but this variable is never actually used. The invoke-bash commands on lines 210, 214, and 218 use $(cat '{{external_prompt_file}}') to read the file directly, bypassing the loaded variable. Either remove this unused action or modify the bash commands to use the pre-loaded variable content.
| <action>Load {{external_prompt_file}} content into {{external_prompt}}</action> |
| <check if="{{bash_exit_code}} != 0 OR {{bash_stdout}} is empty"> | ||
| <output>⚠️ External agent CLI failed (exit code: {{bash_exit_code}}), falling back to built-in review</output> | ||
| <output>Error: {{bash_stderr}}</output> | ||
| <set-var name="use_external_agent" value="false" /> | ||
| <set-var name="external_agent_failed" value="true" /> | ||
| </check> |
Copilot
AI
Dec 12, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The error handling only checks for non-zero exit code or empty stdout, but doesn't handle cases where the CLI tool hangs, produces partial output, or outputs to stderr instead of stdout. Consider adding more robust error detection, such as checking stderr for error messages or validating that the output contains the expected format markers.
| <invoke-bash cmd="codex exec --sandbox read-only --full-auto "$(cat '{{external_prompt_file}}')"" timeout="300000" /> | ||
| </check> | ||
| <check if="{{external_agent_cmd}} == 'gemini'"> | ||
| <critical>GEMINI: Use gemini -p with prompt from file and --yolo</critical> | ||
| <invoke-bash cmd="gemini -p "$(cat '{{external_prompt_file}}')" --yolo" timeout="300000" /> | ||
| </check> | ||
| <check if="{{external_agent_cmd}} == 'claude'"> | ||
| <critical>CLAUDE: Use claude -p with prompt from file</critical> | ||
| <invoke-bash cmd="claude -p "$(cat '{{external_prompt_file}}')" --dangerously-skip-permissions" timeout="300000" /> |
Copilot
AI
Dec 12, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The commands use $(cat '{{external_prompt_file}}') which reads the entire file content into a command-line argument. For large prompts (as mentioned in the prohibited rationalizations), this could hit shell command-line length limits (typically 128KB-2MB depending on the system). Consider using file redirection or piping the file directly to the CLI if the tools support stdin, e.g., codex exec --full-auto < "{{external_prompt_file}}" or passing the file path as an argument if the CLIs support it.
| <invoke-bash cmd="codex exec --sandbox read-only --full-auto "$(cat '{{external_prompt_file}}')"" timeout="300000" /> | |
| </check> | |
| <check if="{{external_agent_cmd}} == 'gemini'"> | |
| <critical>GEMINI: Use gemini -p with prompt from file and --yolo</critical> | |
| <invoke-bash cmd="gemini -p "$(cat '{{external_prompt_file}}')" --yolo" timeout="300000" /> | |
| </check> | |
| <check if="{{external_agent_cmd}} == 'claude'"> | |
| <critical>CLAUDE: Use claude -p with prompt from file</critical> | |
| <invoke-bash cmd="claude -p "$(cat '{{external_prompt_file}}')" --dangerously-skip-permissions" timeout="300000" /> | |
| <invoke-bash cmd="codex exec --sandbox read-only --full-auto < '{{external_prompt_file}}'" timeout="300000" /> | |
| </check> | |
| <check if="{{external_agent_cmd}} == 'gemini'"> | |
| <critical>GEMINI: Use gemini -p with prompt from file and --yolo</critical> | |
| <invoke-bash cmd="gemini -p '{{external_prompt_file}}' --yolo" timeout="300000" /> | |
| </check> | |
| <check if="{{external_agent_cmd}} == 'claude'"> | |
| <critical>CLAUDE: Use claude -p with prompt from file</critical> | |
| <invoke-bash cmd="claude -p '{{external_prompt_file}}' --dangerously-skip-permissions" timeout="300000" /> |
| STORY CONTEXT: {{story_path}} | ||
| FILES TO REVIEW: {{comprehensive_file_list}} | ||
| ACCEPTANCE CRITERIA: {{acceptance_criteria_list}} | ||
| TASKS: {{task_list}} | ||
|
|
||
| OUTPUT FORMAT: | ||
|
|
||
| ## HIGH SEVERITY | ||
|
|
||
| - [file:line] Issue description | Suggested fix | ||
|
|
||
| ## MEDIUM SEVERITY | ||
|
|
||
| - [file:line] Issue description | Suggested fix | ||
|
|
||
| ## LOW SEVERITY | ||
|
|
||
| - [file:line] Issue description | Suggested fix |
Copilot
AI
Dec 12, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The external prompt file contains template variables ({{story_path}}, {{comprehensive_file_list}}, {{acceptance_criteria_list}}, {{task_list}}) that need to be populated before being passed to the external agent. However, there's no action to interpolate these variables before the file is read by the cat command. The external agent will receive literal "{{variable}}" strings instead of actual values, making the prompt incomplete and ineffective.
| <critical>🚨 FIRST ACTION REQUIRED: Kill all background shells 🚨</critical> | ||
| <critical>Before ANY other workflow step, you MUST terminate all background processes.</critical> | ||
| <critical>DO NOT use BashOutput to read from stale shells - this causes context pollution.</critical> | ||
| <critical>DO NOT try to "check what they're doing" - just kill them.</critical> | ||
|
|
||
| <check if="BMAD is running as Claude Code (KillShell tool is available)"> | ||
| <action>IMMEDIATELY use KillShell to terminate ALL background shells - no exceptions</action> | ||
| <action>Do NOT read output from these shells first - kill them blind</action> | ||
| <action>Do NOT rationalize keeping them alive - they contain stale data</action> | ||
| <action>If multiple shells exist, kill ALL of them one by one</action> | ||
| <output>🔪 Killed all background shells to prevent stale data contamination</output> | ||
| </check> | ||
| <critical>Only proceed to Step 1 AFTER all background shells are terminated.</critical> |
Copilot
AI
Dec 12, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lines 14-26 describe killing background shells which is specific to Claude Code's KillShell tool. However, this code review workflow could be executed by any AI agent (including Codex or Gemini as mentioned in the PR). These instructions and the conditional check for "Claude Code" will be confusing or irrelevant when executed by other agents. Consider either removing this section or making it more generic/conditional based on the executing agent.
| <critical>🚨 FIRST ACTION REQUIRED: Kill all background shells 🚨</critical> | |
| <critical>Before ANY other workflow step, you MUST terminate all background processes.</critical> | |
| <critical>DO NOT use BashOutput to read from stale shells - this causes context pollution.</critical> | |
| <critical>DO NOT try to "check what they're doing" - just kill them.</critical> | |
| <check if="BMAD is running as Claude Code (KillShell tool is available)"> | |
| <action>IMMEDIATELY use KillShell to terminate ALL background shells - no exceptions</action> | |
| <action>Do NOT read output from these shells first - kill them blind</action> | |
| <action>Do NOT rationalize keeping them alive - they contain stale data</action> | |
| <action>If multiple shells exist, kill ALL of them one by one</action> | |
| <output>🔪 Killed all background shells to prevent stale data contamination</output> | |
| </check> | |
| <critical>Only proceed to Step 1 AFTER all background shells are terminated.</critical> | |
| <check if="BMAD is running as Claude Code (KillShell tool is available)"> | |
| <critical>🚨 FIRST ACTION REQUIRED: Kill all background shells 🚨</critical> | |
| <critical>Before ANY other workflow step, you MUST terminate all background processes.</critical> | |
| <critical>DO NOT use BashOutput to read from stale shells - this causes context pollution.</critical> | |
| <critical>DO NOT try to "check what they're doing" - just kill them.</critical> | |
| <action>IMMEDIATELY use KillShell to terminate ALL background shells - no exceptions</action> | |
| <action>Do NOT read output from these shells first - kill them blind</action> | |
| <action>Do NOT rationalize keeping them alive - they contain stale data</action> | |
| <action>If multiple shells exist, kill ALL of them one by one</action> | |
| <output>🔪 Killed all background shells to prevent stale data contamination</output> | |
| <critical>Only proceed to Step 1 AFTER all background shells are terminated.</critical> | |
| </check> |
|
@sjennings, @bmadcode here is a simple thought: What do we call a piece knowledge, user- or project-specific, about executing code reviews in ways that a user knows and approves of? Hint: 7 letters, starts with the C, ends with 'ontext'. What if this wants to be a workflow a la 'setup-reviews', that guides the user through creation of
|
|
@sjennings I love what you've done with the place. Don't mind me stealing half of this diff to use for work? :) |
|
Feel free! It was your suggestion to dive in and then submit a PR that prompted my sharing it. As for the changes in the Copilot review I'll look into them more closely later. |
|
Here is my version of a small step in the same general direction: |
|
@sjennings |
- Rename misleading "CLAUDE CODE CLEANUP" comment to generic "SHELL CLEANUP" - Remove unused external_prompt variable loading action - Increase external agent timeout from 5 to 10 minutes for complex reviews - Fix Review Method output to show actual method used instead of "OR" - Restore web_bundle: false setting needed by web bundler
External agents find ALL possible issues, but not all are relevant. This new step 4 evaluates findings against project context: - Actually Important: AC violations, bugs, real security issues - Context-Dependent: Performance, validation strictness (asks user) - Theoretical/Nitpicking: Micro-optimizations, style preferences (skips) This enables pragmatic code review that focuses on what matters.
|
Closing for now since @alexeyv is working on an alternate implementation |
Summary
Adds support for delegating adversarial code reviews to external CLI agents (Codex, Gemini, or Claude) when available. This provides independent, unbiased code reviews from a different AI model.
Key Features
Changes
invoke-bashandset-varexecution tags to the workflow engineexternal_review_agentsconfiguration section with preferred agent selectionExternal Agent Invocation Commands
codex exec --full-auto "prompt"gemini -p "prompt" --yoloclaude -p "prompt" --dangerously-skip-permissionsTest plan