feat: add external agent support for code reviews #1061

sjennings · 2025-12-08T01:19:55Z

Summary

Adds support for delegating adversarial code reviews to external CLI agents (Codex, Gemini, or Claude) when available. This provides independent, unbiased code reviews from a different AI model.

Key Features

External Agent Detection: Auto-detects available CLI tools (codex, gemini, claude) and caches results in config.yaml
Mandatory Delegation: When an external agent is available, the workflow MUST delegate to it (with explicit prohibited rationalizations to prevent skipping)
Fallback Support: Falls back to built-in review only if all external agents fail detection or invocation
Config Caching: Detection results are cached to avoid repeated CLI checks on subsequent runs

Changes

workflow.xml: Added invoke-bash and set-var execution tags to the workflow engine
install-config.yaml: Added external_review_agents configuration section with preferred agent selection
code-review/instructions.xml: Complete rewrite to support external agent detection, invocation, and fallback
code-review/workflow.yaml: Simplified to use runtime variables for agent detection
code-review/checklist.md: Updated to reflect new external agent workflow

External Agent Invocation Commands

Agent	Command
Codex	`codex exec --full-auto "prompt"`
Gemini	`gemini -p "prompt" --yolo`
Claude	`claude -p "prompt" --dangerously-skip-permissions`

Test plan

Install BMAD to a project with codex CLI available
Run code-review workflow and verify it detects and invokes codex
Verify config.yaml is updated with detection results
Test with gemini/claude CLI if available
Test fallback to built-in review when no external agents available

coderabbitai · 2025-12-08T01:20:01Z

Important

Review skipped

Auto reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

✨ Finishing touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

bmadcode · 2025-12-08T13:53:36Z

Great idea - a few things!

I am about to switch all of the workflows to not use workflow.xml to the sharded steps. So for any command we want to give to an external agent, it sounds like we will just want a clear separate text file prompt that is self contained also in a single file to give to the agent? This could be used for much more than just the code review.

When installing, the user already has a multi-select for Tools they have installed. This could be just a single question then in bmm if they want to allow supported installed tools do external command executions. Also saying it this way, as this could really be used for much more than code review, pretty much review of anything could potentially produce better results.

What combinations of tools and OS was this tested on?

Random idea - what if we had a folder of named prompts to pass to other agents, maybe call them external workflows. The could all register as installed tool external slash commands. so if the folder had external-code-review.md, for each tool installed it could create a slash command. /bmad-bmm-ext-copilot-code-review. While this could result in a lot for users to remember, all they would need to know is the tool and autocomplete will show them the options. so if in claude, and started to type /copilot - the autocomplete would show all of the possible commands.

alexeyv · 2025-12-08T14:31:17Z

The choice of alt agent should belong with the user.

I love this capability in principle. But I also wonder, how many people would frown upon the idea of BMAD deciding to run some agentic software in YOLO mode on the computer without asking?

What if we defined a location like .bmad/alt-agent.[md|sh]? If it exists, run review through it. If it doesn't, stop and ask. We would then take all this code from this PR, put it into examples of alt agent. And we could also have a "there is no alt agent" version of alt-agent.md For when you don't want to be asked, and you don't want to be delegating it to some other LLM.

sjennings · 2025-12-08T15:08:54Z

it sounds like we will just want a clear separate text file prompt that is self contained also in a single file to give to the agent?

It's already a plaintext prompt, I'll break it out into a text file.

When installing, the user already has a multi-select for Tools they have installed. This could be just a single question then in bmm if they want to allow supported installed tools do external command executions.

I'll see if I can get it integrated to the installer as a choice. I need to update that anyway to work with v15.

The choice of alt agent should belong with the user.

Absolutely. It's defined in a config file right now. Could use a selector in the installer, perhaps.

What if we defined a location like .bmad/alt-agent.[md|sh]?

Right now it's defined in config.yaml, this could be moved to its own file fairly easily.

how many people would frown upon the idea of BMAD deciding to run some agentic software in YOLO mode on the computer without asking?

This is an excellent point and one I agree with. By its nature a code review should be non-destructive - and I'll add to the prompt emphasis that the external agent should not write/change/delete any file. However to run as an external agent in a headless context the YOLO mode is necessary. Code reviews should be the safest application of this, but it also assumes the user is running in a fairly hardened/backed up environment and the LLM is not going to go off the rails and delete everything it sees. Neither of these are safe assumptions! https://bsky.app/profile/brokentoys.social/post/3m7h4ul6dsc2u

this could really be used for much more than code review, pretty much review of anything could potentially produce better results.

The problem here is that external agents have to run in yolo mode with this approach - for anything that has a write application, it would require a dozen warnings and they would not be enough and someone WILL come screaming into the support discord that we destroyed their computer.

What combinations of tools and OS was this tested on?

Just mac os, claude for the primary agent, and codex, gemini and claude for external agents. This is primarily crafted to deal with a frustration I had with invoking repeated code reviews in codex struggling with codex's often-broken slash command support.

I would not be opposed to having this functionality be in an optional module. I don't know if a core workflow change can be done via a modular approach currently.

I'll update shortly with discussed changes and updates to maintain functionality with current version.

Adds support for delegating adversarial code reviews to external CLI agents (Codex, Gemini, or Claude) when available. This provides independent, unbiased code reviews from a different AI model. Changes: - Add invoke-bash and set-var tags to workflow.xml execution engine - Add external_review_agents configuration to install-config.yaml - Rewrite code-review workflow to detect and invoke external agents - Cache agent detection in config.yaml to avoid repeated CLI checks - Add fallback to built-in review if external agents unavailable/fail - Update checklist to reflect new external agent workflow External agent invocation: - Codex: codex exec --full-auto "prompt" - Gemini: gemini -p "prompt" --yolo - Claude: claude -p "prompt" --dangerously-skip-permissions

- Add installer prompt for external review agent selection (Codex, Gemini, Claude, None) - Extract adversarial review prompt to external-agent-prompt.md for easier maintenance - Simplify detection logic: check for "none" first, then verify CLI availability - Add read-only sandbox flag to Codex invocation for safety - Update workflow.yaml to reference new config variable

sjennings · 2025-12-08T17:02:06Z

Changes in most recent commit:

Now fully compatible with alpha 15
Installer support. User can select from Codex, Gemini, Claude Code or None. None is listed as default.
Prompt used is now in a markdown file in the workflow directory.
Added language to prompt to try to ensure read-only operations.
The Codex external agent invocation now uses a read-only sandbox. (Need to research if this is as easily supported in Gemini and Claude Code and test both.)

Allows users to customize the config location via bmad_folder setting.

sjennings · 2025-12-08T18:27:46Z

Also a note: I've started using a Codex skill for claude code - Claude will use that automatically instead of the specified invocation (which is fine, if a user installs a skill for this they already have opinonated preferences). Wondering if it would be helpful to embed? (Probably not since it's Claude Code specific)


---
name: codex
description: Use when the user asks to run Codex CLI (codex exec, codex resume) or references OpenAI Codex for code analysis, refactoring, or automated editing
---

# Codex Skill Guide

## Running a Task

1. Ask the user (via `AskUserQuestion`) which model to run (`gpt-5.1-codex-max`, `gpt-5.1-codex-mini` or `gpt-5.1`)) AND which reasoning effort to use (`high`, `medium`, or `low`) in a **single prompt with two questions**.
2. Select the sandbox mode required for the task; default to `--sandbox read-only` unless edits or network access are necessary.
3. Assemble the command with the appropriate options:
   - `-m, --model <MODEL>`
   - `--config model_reasoning_effort="<high|medium|low>"`
   - `--sandbox <read-only|workspace-write|danger-full-access>`
   - `--full-auto`
   - `-C, --cd <DIR>`
   - `--skip-git-repo-check`
4. Always use --skip-git-repo-check.
5. When continuing a previous session, use `codex exec --skip-git-repo-check resume --last` via stdin. When resuming don't use any configuration flags unless explicitly requested by the user e.g. if he species the model or the reasoning effort when requesting to resume a session. resume syntax: `echo "your prompt here" | codex exec --skip-git-repo-check resume --last 2>/dev/null`. All flags have to be inserted between exec and resume.
6. **IMPORTANT**: By default, append `2>/dev/null` to all `codex exec` commands to suppress thinking tokens (stderr). Only show stderr if the user explicitly requests to see thinking tokens or if debugging is needed.
7. Run the command, capture stdout/stderr (filtered as appropriate), and summarize the outcome for the user.
8. **After Codex completes**, inform the user: "You can resume this Codex session at any time by saying 'codex resume' or asking me to continue with additional analysis or changes."

### Quick Reference

| Use case | Sandbox mode | Key flags |
| --- | --- | --- |
| Read-only review or analysis | `read-only` | `--sandbox read-only 2>/dev/null` |
| Apply local edits | `workspace-write` | `--sandbox workspace-write --full-auto 2>/dev/null` |
| Permit network or broad access | `danger-full-access` | `--sandbox danger-full-access --full-auto 2>/dev/null` |
| Resume recent session | Inherited from original | `echo "prompt" \| codex exec --skip-git-repo-check resume --last 2>/dev/null` (no flags allowed) |
| Run from another directory | Match task needs | `-C <DIR>` plus other flags `2>/dev/null` |

## Following Up

- After every `codex` command, immediately use `AskUserQuestion` to confirm next steps, collect clarifications, or decide whether to resume with `codex exec resume --last`.
- When resuming, pipe the new prompt via stdin: `echo "new prompt" | codex exec resume --last 2>/dev/null`. The resumed session automatically uses the same model, reasoning effort, and sandbox mode from the original session.
- Restate the chosen model, reasoning effort, and sandbox mode when proposing follow-up actions.

## Error Handling

- Stop and report failures whenever `codex --version` or a `codex exec` command exits non-zero; request direction before retrying.
- Before you use high-impact flags (`--full-auto`, `--sandbox danger-full-access`, `--skip-git-repo-check`) ask the user for permission using AskUserQuestion unless it was already given.
- When output includes warnings or partial results, summarize them and ask how to adjust using `AskUserQuestion`.
`

Add instructions to detect and terminate any stale external agent review processes when starting a new code review session in Claude Code.

sjennings · 2025-12-11T00:54:42Z

Added an attempt at a fix for a recently introduced claude code annoyance: it will now sometimes decide to launch bash tasks (like these code reviews) as background processes, and then reconnect to them for the next code review, assume that they are a new code review instead of the one previously completed, and happily report that the code review is only reporting old issues. Well of course it is, claude, you read the old review. Sigh. Anyway this should hopefully fix it (it's difficult to test because it only happens occasionally)

sjennings · 2025-12-11T00:58:09Z

I also don't think this behavior will persist long term because it eats your context alive monitoring the results of a shell over... and over... and over... and over...

Change cleanup logic from "check and maybe kill" to "kill unconditionally". The previous approach allowed agents to read stale output from lingering external agent processes, causing context pollution and reuse of old data. New behavior: - Kill ALL background shells as the FIRST action - Do NOT read output from stale shells (BashOutput forbidden) - Do NOT check what processes are doing - just terminate them - Only proceed to Step 1 after all shells are dead

sjennings · 2025-12-11T01:07:04Z

sigh. another fix enforcing this behaviour because claude code thinks it knows best. "Wait! Let me use this data before I kill it!"

bmadcode · 2025-12-11T23:04:42Z

@sjennings thanks for the updates and sorry for delay in further response, I have been traveling and availability has been limited. I like the idea of this being an installer question configuration btw.

I will look at all of the PR updates soon and try this out also - might be another day or two just due to the travel and getting situated.

Copilot

Pull request overview

This PR adds support for delegating code reviews to external AI CLI tools (Codex, Gemini, or Claude) to provide independent reviews from different AI models. The implementation includes agent detection, mandatory delegation when available, and fallback to built-in review capabilities.

Key Changes:

Added invoke-bash and set-var execution capabilities to the workflow engine
Implemented external agent detection and selection logic with runtime configuration
Created external agent invocation system with specific CLI commands for each agent

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 19 comments.

Show a summary per file

File	Description
workflow.yaml	Adds external_review_agent configuration variable and external-agent-prompt.md file reference
instructions.xml	Major rewrite adding external agent detection, CLI invocation logic, and fallback mechanisms with mandatory delegation rules
external-agent-prompt.md	New template file defining the prompt to be sent to external agents with placeholders for review context
checklist.md	Updates validation checklist to include external agent detection steps and verification points
module.yaml	Adds external_review_agent configuration option with single-select menu for agent selection
workflow.xml	Documents new invoke-bash and set-var XML tags for workflow execution

Comments suppressed due to low confidence (1)

src/modules/bmm/workflows/4-implementation/code-review/instructions.xml:175

Step 2 is described as "Build review attack plan" but when an external agent is used, this step's actions are performed but then ignored since the external agent does its own planning. This creates wasted work. Consider making Step 2 conditional - only execute it when using built-in review ({{use_external_agent}} == false), or move the external agent detection and invocation to happen before Step 2.

  <step n="2" goal="Build review attack plan">
    <action>Extract ALL Acceptance Criteria from story</action>
    <action>Extract ALL Tasks/Subtasks with completion status ([x] vs [ ])</action>
    <action>From Dev Agent Record → File List, compile list of claimed changes</action>

    <action>Create review plan:
      1. **AC Validation**: Verify each AC is actually implemented
      2. **Task Audit**: Verify each [x] task is really done
      3. **Code Quality**: Security, performance, maintainability
      4. **Test Quality**: Real tests vs placeholder bullshit
    </action>
  </step>

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

src/modules/bmm/workflows/4-implementation/code-review/instructions.xml

Copilot · 2025-12-12T19:50:08Z

src/modules/bmm/workflows/4-implementation/code-review/instructions.xml

+        <invoke-bash cmd="codex exec --sandbox read-only --full-auto &quot;$(cat '{{external_prompt_file}}')&quot;" timeout="300000" />
+      </check>
+      <check if="{{external_agent_cmd}} == 'gemini'">
+        <critical>GEMINI: Use gemini -p with prompt from file and --yolo</critical>
+        <invoke-bash cmd="gemini -p &quot;$(cat '{{external_prompt_file}}')&quot; --yolo" timeout="300000" />
+      </check>
+      <check if="{{external_agent_cmd}} == 'claude'">
+        <critical>CLAUDE: Use claude -p with prompt from file</critical>
+        <invoke-bash cmd="claude -p &quot;$(cat '{{external_prompt_file}}')&quot; --dangerously-skip-permissions" timeout="300000" />


The timeout value is set to 300000 milliseconds (5 minutes). For complex codebases with comprehensive reviews, this may be insufficient and could cause premature timeout failures, triggering unnecessary fallback to built-in review. Consider increasing the timeout or making it configurable, especially since the PR mentions that "long prompts are expected and supported".

Copilot · 2025-12-12T19:50:08Z

src/modules/bmm/workflows/4-implementation/code-review/instructions.xml


-  <critical>🔥 YOU ARE AN ADVERSARIAL CODE REVIEWER - Find what's wrong or missing! 🔥</critical>
+  <!-- ================================================================ -->
+  <!-- CLAUDE CODE CLEANUP - MANDATORY KILL OF ALL BACKGROUND SHELLS   -->


The heading comment states "CLAUDE CODE CLEANUP" which is misleading since this is a general code review workflow that could be executed by any agent (Codex, Gemini, Claude, or others). The heading should be more generic, such as "SHELL CLEANUP" or "BACKGROUND PROCESS CLEANUP" to accurately reflect that this is an environmental cleanup step, not Claude-specific functionality.

Suggested change

Copilot · 2025-12-12T19:50:09Z

src/modules/bmm/workflows/4-implementation/code-review/instructions.xml

+
+      <!-- External agent prompt is loaded from external-agent-prompt.md -->
+      <set-var name="external_prompt_file" value="{installed_path}/external-agent-prompt.md" />
+      <action>Load {{external_prompt_file}} content into {{external_prompt}}</action>


The action loads content into a variable but this variable is never actually used. The invoke-bash commands on lines 210, 214, and 218 use $(cat '{{external_prompt_file}}') to read the file directly, bypassing the loaded variable. Either remove this unused action or modify the bash commands to use the pre-loaded variable content.

Suggested change

<action>Load {{external_prompt_file}} content into {{external_prompt}}</action>

src/modules/bmm/workflows/4-implementation/code-review/workflow.yaml

Copilot · 2025-12-12T19:50:12Z

src/modules/bmm/workflows/4-implementation/code-review/instructions.xml

+      <check if="{{bash_exit_code}} != 0 OR {{bash_stdout}} is empty">
+        <output>⚠️ External agent CLI failed (exit code: {{bash_exit_code}}), falling back to built-in review</output>
+        <output>Error: {{bash_stderr}}</output>
+        <set-var name="use_external_agent" value="false" />
+        <set-var name="external_agent_failed" value="true" />
+      </check>


The error handling only checks for non-zero exit code or empty stdout, but doesn't handle cases where the CLI tool hangs, produces partial output, or outputs to stderr instead of stdout. Consider adding more robust error detection, such as checking stderr for error messages or validating that the output contains the expected format markers.

Copilot · 2025-12-12T19:50:12Z

src/modules/bmm/workflows/4-implementation/code-review/instructions.xml

+        <invoke-bash cmd="codex exec --sandbox read-only --full-auto &quot;$(cat '{{external_prompt_file}}')&quot;" timeout="300000" />
+      </check>
+      <check if="{{external_agent_cmd}} == 'gemini'">
+        <critical>GEMINI: Use gemini -p with prompt from file and --yolo</critical>
+        <invoke-bash cmd="gemini -p &quot;$(cat '{{external_prompt_file}}')&quot; --yolo" timeout="300000" />
+      </check>
+      <check if="{{external_agent_cmd}} == 'claude'">
+        <critical>CLAUDE: Use claude -p with prompt from file</critical>
+        <invoke-bash cmd="claude -p &quot;$(cat '{{external_prompt_file}}')&quot; --dangerously-skip-permissions" timeout="300000" />


The commands use $(cat '{{external_prompt_file}}') which reads the entire file content into a command-line argument. For large prompts (as mentioned in the prohibited rationalizations), this could hit shell command-line length limits (typically 128KB-2MB depending on the system). Consider using file redirection or piping the file directly to the CLI if the tools support stdin, e.g., codex exec --full-auto < "{{external_prompt_file}}" or passing the file path as an argument if the CLIs support it.

Suggested change

<invoke-bash cmd="codex exec --sandbox read-only --full-auto "$(cat '{{external_prompt_file}}')"" timeout="300000" />

</check>

<check if="{{external_agent_cmd}} == 'gemini'">

<critical>GEMINI: Use gemini -p with prompt from file and --yolo</critical>

<invoke-bash cmd="gemini -p "$(cat '{{external_prompt_file}}')" --yolo" timeout="300000" />

</check>

<check if="{{external_agent_cmd}} == 'claude'">

<critical>CLAUDE: Use claude -p with prompt from file</critical>

<invoke-bash cmd="claude -p "$(cat '{{external_prompt_file}}')" --dangerously-skip-permissions" timeout="300000" />

<invoke-bash cmd="codex exec --sandbox read-only --full-auto < '{{external_prompt_file}}'" timeout="300000" />

</check>

<check if="{{external_agent_cmd}} == 'gemini'">

<critical>GEMINI: Use gemini -p with prompt from file and --yolo</critical>

<invoke-bash cmd="gemini -p '{{external_prompt_file}}' --yolo" timeout="300000" />

</check>

<check if="{{external_agent_cmd}} == 'claude'">

<critical>CLAUDE: Use claude -p with prompt from file</critical>

<invoke-bash cmd="claude -p '{{external_prompt_file}}' --dangerously-skip-permissions" timeout="300000" />

Copilot · 2025-12-12T19:50:13Z

src/modules/bmm/workflows/4-implementation/code-review/external-agent-prompt.md

+STORY CONTEXT: {{story_path}}
+FILES TO REVIEW: {{comprehensive_file_list}}
+ACCEPTANCE CRITERIA: {{acceptance_criteria_list}}
+TASKS: {{task_list}}
+
+OUTPUT FORMAT:
+
+## HIGH SEVERITY
+
+- [file:line] Issue description | Suggested fix
+
+## MEDIUM SEVERITY
+
+- [file:line] Issue description | Suggested fix
+
+## LOW SEVERITY
+
+- [file:line] Issue description | Suggested fix


The external prompt file contains template variables ({{story_path}}, {{comprehensive_file_list}}, {{acceptance_criteria_list}}, {{task_list}}) that need to be populated before being passed to the external agent. However, there's no action to interpolate these variables before the file is read by the cat command. The external agent will receive literal "{{variable}}" strings instead of actual values, making the prompt incomplete and ineffective.

Copilot · 2025-12-12T19:50:13Z

src/modules/bmm/workflows/4-implementation/code-review/instructions.xml

+  <critical>🚨 FIRST ACTION REQUIRED: Kill all background shells 🚨</critical>
+  <critical>Before ANY other workflow step, you MUST terminate all background processes.</critical>
+  <critical>DO NOT use BashOutput to read from stale shells - this causes context pollution.</critical>
+  <critical>DO NOT try to "check what they're doing" - just kill them.</critical>
+
+  <check if="BMAD is running as Claude Code (KillShell tool is available)">
+    <action>IMMEDIATELY use KillShell to terminate ALL background shells - no exceptions</action>
+    <action>Do NOT read output from these shells first - kill them blind</action>
+    <action>Do NOT rationalize keeping them alive - they contain stale data</action>
+    <action>If multiple shells exist, kill ALL of them one by one</action>
+    <output>🔪 Killed all background shells to prevent stale data contamination</output>
+  </check>
+  <critical>Only proceed to Step 1 AFTER all background shells are terminated.</critical>


Lines 14-26 describe killing background shells which is specific to Claude Code's KillShell tool. However, this code review workflow could be executed by any AI agent (including Codex or Gemini as mentioned in the PR). These instructions and the conditional check for "Claude Code" will be confusing or irrelevant when executed by other agents. Consider either removing this section or making it more generic/conditional based on the executing agent.

Suggested change

<critical>🚨 FIRST ACTION REQUIRED: Kill all background shells 🚨</critical>

<critical>Before ANY other workflow step, you MUST terminate all background processes.</critical>

<critical>DO NOT use BashOutput to read from stale shells - this causes context pollution.</critical>

<critical>DO NOT try to "check what they're doing" - just kill them.</critical>

<check if="BMAD is running as Claude Code (KillShell tool is available)">

<action>IMMEDIATELY use KillShell to terminate ALL background shells - no exceptions</action>

<action>Do NOT read output from these shells first - kill them blind</action>

<action>Do NOT rationalize keeping them alive - they contain stale data</action>

<action>If multiple shells exist, kill ALL of them one by one</action>

<output>🔪 Killed all background shells to prevent stale data contamination</output>

</check>

<critical>Only proceed to Step 1 AFTER all background shells are terminated.</critical>

<check if="BMAD is running as Claude Code (KillShell tool is available)">

<critical>🚨 FIRST ACTION REQUIRED: Kill all background shells 🚨</critical>

<critical>Before ANY other workflow step, you MUST terminate all background processes.</critical>

<critical>DO NOT use BashOutput to read from stale shells - this causes context pollution.</critical>

<critical>DO NOT try to "check what they're doing" - just kill them.</critical>

<action>IMMEDIATELY use KillShell to terminate ALL background shells - no exceptions</action>

<action>Do NOT read output from these shells first - kill them blind</action>

<action>Do NOT rationalize keeping them alive - they contain stale data</action>

<action>If multiple shells exist, kill ALL of them one by one</action>

<output>🔪 Killed all background shells to prevent stale data contamination</output>

<critical>Only proceed to Step 1 AFTER all background shells are terminated.</critical>

</check>

src/modules/bmm/workflows/4-implementation/code-review/instructions.xml

alexeyv · 2025-12-12T23:02:52Z

@sjennings, @bmadcode here is a simple thought:

What do we call a piece knowledge, user- or project-specific, about executing code reviews in ways that a user knows and approves of? Hint: 7 letters, starts with the C, ends with 'ontext'.

What if this wants to be a workflow a la 'setup-reviews', that guides the user through creation of docs/how-to-review.yaml? That how-to-review may say things like "Use Bash to call CodeRabbit", "Use the Task tool to conduct adversarial review", "Use Bash to call codex and pass it this prompt:..." We really don't care what it says. We just suggest to the user what it might say and ask questions. As long as the output is shaped as a list(-s) of review findings

code-review then becomes an orchestrator:

assemble a diff of changes under review
If how-to-review exists. read and follow it.
If it doesn't, either
- abort and tell the user to run setup-reviews, or
- Fall back to what code-review is currently doing, plus maybe the adversarial bit2qw
Put all findings together, deduplicate them, assess their severity and what's real and what's not, ordered by severity, and present to the user. Back to what it already knows how to do: dealing with the findings.

alexeyv · 2025-12-12T23:12:50Z

@sjennings I love what you've done with the place. Don't mind me stealing half of this diff to use for work? :)

sjennings · 2025-12-12T23:21:55Z

Feel free! It was your suggestion to dive in and then submit a PR that prompted my sharing it.

As for the changes in the Copilot review I'll look into them more closely later.

alexeyv · 2025-12-12T23:28:17Z

Here is my version of a small step in the same general direction:
#1114

alexeyv · 2025-12-12T23:31:40Z

@sjennings
If we end up going with how-to-review.yaml, better part of what you have here becomes the things setup-review suggests.
/creative raving mode

- Rename misleading "CLAUDE CODE CLEANUP" comment to generic "SHELL CLEANUP" - Remove unused external_prompt variable loading action - Increase external agent timeout from 5 to 10 minutes for complex reviews - Fix Review Method output to show actual method used instead of "OR" - Restore web_bundle: false setting needed by web bundler

External agents find ALL possible issues, but not all are relevant. This new step 4 evaluates findings against project context: - Actually Important: AC violations, bugs, real security issues - Context-Dependent: Performance, validation strictness (asks user) - Theoretical/Nitpicking: Micro-optimizations, style preferences (skips) This enables pragmatic code review that focuses on what matters.

sjennings · 2025-12-20T17:08:48Z

Closing for now since @alexeyv is working on an alternate implementation

Scott Jennings added 2 commits December 8, 2025 10:36

sjennings force-pushed the feat/external-agent-code-review branch from b4c185c to dcba8e5 Compare December 8, 2025 16:57

fix: use bmad_folder variable instead of hardcoded .bmad path

c0a49bc

Allows users to customize the config location via bmad_folder setting.

feat: add cleanup for lingering external agent processes

4c6b492

Add instructions to detect and terminate any stale external agent review processes when starting a new code review session in Claude Code.

alexeyv and others added 2 commits December 10, 2025 19:07

Merge branch 'main' into feat/external-agent-code-review

f7311f8

Merge branch 'main' into feat/external-agent-code-review

7864606

Merge branch 'main' into feat/external-agent-code-review

a88bb43

muratkeremozcan requested a review from Copilot December 12, 2025 19:40

Copilot started reviewing on behalf of muratkeremozcan December 12, 2025 19:41 View session

Copilot AI reviewed Dec 12, 2025

View reviewed changes

Scott Jennings added 2 commits December 13, 2025 07:17

sjennings closed this Dec 20, 2025

	<!-- CLAUDE CODE CLEANUP - MANDATORY KILL OF ALL BACKGROUND SHELLS -->
	<!-- SHELL CLEANUP - MANDATORY KILL OF ALL BACKGROUND SHELLS -->

Uh oh!

feat: add external agent support for code reviews #1061

feat: add external agent support for code reviews #1061

Conversation

sjennings commented Dec 8, 2025

Summary

Key Features

Changes

External Agent Invocation Commands

Test plan

Uh oh!

coderabbitai bot commented Dec 8, 2025

Review skipped

Uh oh!

bmadcode commented Dec 8, 2025

Uh oh!

alexeyv commented Dec 8, 2025

Uh oh!

sjennings commented Dec 8, 2025

Uh oh!

sjennings commented Dec 8, 2025

Uh oh!

sjennings commented Dec 8, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

sjennings commented Dec 11, 2025

Uh oh!

sjennings commented Dec 11, 2025

Uh oh!

sjennings commented Dec 11, 2025

Uh oh!

bmadcode commented Dec 11, 2025

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Copilot AI Dec 12, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Dec 12, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Dec 12, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Copilot AI Dec 12, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Dec 12, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Dec 12, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Dec 12, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

alexeyv commented Dec 12, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

alexeyv commented Dec 12, 2025

Uh oh!

sjennings commented Dec 12, 2025

Uh oh!

alexeyv commented Dec 12, 2025

Uh oh!

alexeyv commented Dec 12, 2025

Uh oh!

sjennings commented Dec 20, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

sjennings commented Dec 8, 2025 •

edited

Loading

alexeyv commented Dec 12, 2025 •

edited

Loading