Conversation

@sjennings
Contributor

Summary

Adds support for delegating adversarial code reviews to external CLI agents (Codex, Gemini, or Claude) when available. This provides independent, unbiased code reviews from a different AI model.

Key Features

  • External Agent Detection: Auto-detects available CLI tools (codex, gemini, claude) and caches results in config.yaml
  • Mandatory Delegation: When an external agent is available, the workflow MUST delegate to it (with explicit prohibited rationalizations to prevent skipping)
  • Fallback Support: Falls back to built-in review only if all external agents fail detection or invocation
  • Config Caching: Detection results are cached to avoid repeated CLI checks on subsequent runs
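
The detection step described above can be sketched in shell. This is a hypothetical illustration, not the PR's actual implementation; the probe order, variable names, and the `external_review_agent` key are assumptions:

```shell
# Probe each supported CLI with `command -v` and keep the first hit.
detected="none"
for agent in codex gemini claude; do
  if command -v "$agent" >/dev/null 2>&1; then
    detected="$agent"
    break
  fi
done
# The PR caches a result like this in config.yaml so later runs
# can skip re-probing.
echo "external_review_agent: $detected"
```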

Changes

  • workflow.xml: Added invoke-bash and set-var execution tags to the workflow engine
  • install-config.yaml: Added external_review_agents configuration section with preferred agent selection
  • code-review/instructions.xml: Complete rewrite to support external agent detection, invocation, and fallback
  • code-review/workflow.yaml: Simplified to use runtime variables for agent detection
  • code-review/checklist.md: Updated to reflect new external agent workflow

External Agent Invocation Commands

| Agent  | Command |
| ------ | ------- |
| Codex  | `codex exec --full-auto "prompt"` |
| Gemini | `gemini -p "prompt" --yolo` |
| Claude | `claude -p "prompt" --dangerously-skip-permissions` |
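
A hedged sketch of how these invocations might be assembled, reading the prompt from a file to keep shell quoting manageable. The prompt file name and the `agent` variable are placeholders, and nothing is executed here; the command string is only composed:

```shell
# Build, but do not run, the headless invocation for the chosen agent.
# Flags mirror the table above.
PROMPT_FILE="review-prompt.md"
printf 'Adversarially review the changes for defects.\n' > "$PROMPT_FILE"

agent="codex"  # would come from the cached config in the real workflow
case "$agent" in
  codex)  cmd="codex exec --full-auto \"\$(cat $PROMPT_FILE)\"" ;;
  gemini) cmd="gemini -p \"\$(cat $PROMPT_FILE)\" --yolo" ;;
  claude) cmd="claude -p \"\$(cat $PROMPT_FILE)\" --dangerously-skip-permissions" ;;
esac
echo "would run: $cmd"
```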

Test plan

  • Install BMAD to a project with codex CLI available
  • Run code-review workflow and verify it detects and invokes codex
  • Verify config.yaml is updated with detection results
  • Test with gemini/claude CLI if available
  • Test fallback to built-in review when no external agents available

@coderabbitai

coderabbitai bot commented Dec 8, 2025

Important

Review skipped

Auto reviews are disabled on this repository.


@bmadcode
Collaborator

bmadcode commented Dec 8, 2025

Great idea - a few things!

I am about to switch all of the workflows from workflow.xml to the sharded steps. So for any command we want to give to an external agent, it sounds like we will just want a clear, self-contained text file prompt, in a single file, to give to the agent? This could be used for much more than just the code review.

When installing, the user already has a multi-select for tools they have installed. This could be just a single question in bmm asking whether they want to allow supported installed tools to do external command executions. Put this way, it could really be used for much more than code review; pretty much any review could potentially produce better results.

What combinations of tools and OS was this tested on?

Random idea: what if we had a folder of named prompts to pass to other agents, maybe called external workflows? They could all register as installed-tool external slash commands. So if the folder had external-code-review.md, for each tool installed it could create a slash command: /bmad-bmm-ext-copilot-code-review. While this could result in a lot for users to remember, all they would need to know is the tool, and autocomplete will show them the options. So if in Claude, and you started to type /copilot, the autocomplete would show all of the possible commands.
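
The folder-of-prompts idea could be sketched like this; the directory name, tool list, and naming scheme are assumptions based on the example in the comment above:

```shell
# For each external workflow prompt and each installed tool, derive a
# slash-command name of the form /bmad-bmm-ext-<tool>-<workflow>.
mkdir -p external-workflows
touch external-workflows/external-code-review.md
for tool in copilot codex gemini; do
  for f in external-workflows/external-*.md; do
    wf=$(basename "$f" .md)   # e.g. external-code-review
    wf=${wf#external-}        # e.g. code-review
    echo "/bmad-bmm-ext-$tool-$wf"
  done
done
```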

@alexeyv
Contributor

alexeyv commented Dec 8, 2025

The choice of alt agent should belong with the user.

I love this capability in principle. But I also wonder, how many people would frown upon the idea of BMAD deciding to run some agentic software in YOLO mode on the computer without asking?

What if we defined a location like .bmad/alt-agent.[md|sh]? If it exists, run the review through it. If it doesn't, stop and ask. We would then take all the code from this PR and put it into examples of alt agents. And we could also have a "there is no alt agent" version of alt-agent.md, for when you don't want to be asked and you don't want to be delegating to some other LLM.
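
One possible shape for this, assuming a `.bmad/alt-agent.sh` hook; the path and the prompt-passing convention (an env var naming the prompt file) are illustrative, not part of the PR:

```shell
# Delegate only if the user explicitly created the alt-agent hook;
# otherwise stop and ask rather than picking an agent silently.
ALT_AGENT=".bmad/alt-agent.sh"
if [ -x "$ALT_AGENT" ]; then
  REVIEW_PROMPT_FILE="review-prompt.md" "$ALT_AGENT"
else
  echo "No alt agent configured; ask the user before delegating."
fi
```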

@sjennings
Contributor Author

it sounds like we will just want a clear separate text file prompt that is self contained also in a single file to give to the agent?

It's already a plaintext prompt, I'll break it out into a text file.

When installing, the user already has a multi-select for Tools they have installed. This could be just a single question then in bmm if they want to allow supported installed tools do external command executions.

I'll see if I can get it integrated to the installer as a choice. I need to update that anyway to work with v15.

The choice of alt agent should belong with the user.

Absolutely. It's defined in a config file right now. Could use a selector in the installer, perhaps.

What if we defined a location like .bmad/alt-agent.[md|sh]?

Right now it's defined in config.yaml, this could be moved to its own file fairly easily.

how many people would frown upon the idea of BMAD deciding to run some agentic software in YOLO mode on the computer without asking?

This is an excellent point and one I agree with. By its nature, a code review should be non-destructive, and I'll add emphasis to the prompt that the external agent should not write/change/delete any file. However, to run as an external agent in a headless context, YOLO mode is necessary. Code reviews should be the safest application of this, but it also assumes the user is running in a fairly hardened/backed-up environment and the LLM is not going to go off the rails and delete everything it sees. Neither of these are safe assumptions! https://bsky.app/profile/brokentoys.social/post/3m7h4ul6dsc2u

this could really be used for much more than code review, pretty much review of anything could potentially produce better results.

The problem here is that external agents have to run in YOLO mode with this approach. For anything that writes, it would require a dozen warnings, they would not be enough, and someone WILL come screaming into the support Discord that we destroyed their computer.

What combinations of tools and OS was this tested on?

Just macOS, with Claude for the primary agent, and Codex, Gemini, and Claude for external agents. This is primarily crafted to deal with a frustration I had invoking repeated code reviews in Codex, which struggles with its often-broken slash command support.

I would not be opposed to having this functionality be in an optional module. I don't know if a core workflow change can be done via a modular approach currently.

I'll update shortly with discussed changes and updates to maintain functionality with current version.

Scott Jennings added 2 commits December 8, 2025 10:36
Adds support for delegating adversarial code reviews to external CLI agents
(Codex, Gemini, or Claude) when available. This provides independent, unbiased
code reviews from a different AI model.

Changes:
- Add invoke-bash and set-var tags to workflow.xml execution engine
- Add external_review_agents configuration to install-config.yaml
- Rewrite code-review workflow to detect and invoke external agents
- Cache agent detection in config.yaml to avoid repeated CLI checks
- Add fallback to built-in review if external agents unavailable/fail
- Update checklist to reflect new external agent workflow

External agent invocation:
- Codex: codex exec --full-auto "prompt"
- Gemini: gemini -p "prompt" --yolo
- Claude: claude -p "prompt" --dangerously-skip-permissions
- Add installer prompt for external review agent selection (Codex, Gemini, Claude, None)
- Extract adversarial review prompt to external-agent-prompt.md for easier maintenance
- Simplify detection logic: check for "none" first, then verify CLI availability
- Add read-only sandbox flag to Codex invocation for safety
- Update workflow.yaml to reference new config variable
@sjennings force-pushed the feat/external-agent-code-review branch from b4c185c to dcba8e5 on December 8, 2025 16:57
@sjennings
Contributor Author

Changes in most recent commit:

  • Now fully compatible with alpha 15
  • Installer support. User can select from Codex, Gemini, Claude Code or None. None is listed as default.
  • Prompt used is now in a markdown file in the workflow directory.
  • Added language to prompt to try to ensure read-only operations.
  • The Codex external agent invocation now uses a read-only sandbox. (Need to research if this is as easily supported in Gemini and Claude Code and test both.)

Allows users to customize the config location via bmad_folder setting.
@sjennings
Contributor Author

sjennings commented Dec 8, 2025

Also a note: I've started using a Codex skill for Claude Code. Claude will use that automatically instead of the specified invocation (which is fine; if a user installs a skill for this, they already have opinionated preferences). Wondering if it would be helpful to embed? (Probably not, since it's Claude Code specific.)


---
name: codex
description: Use when the user asks to run Codex CLI (codex exec, codex resume) or references OpenAI Codex for code analysis, refactoring, or automated editing
---

# Codex Skill Guide

## Running a Task

1. Ask the user (via `AskUserQuestion`) which model to run (`gpt-5.1-codex-max`, `gpt-5.1-codex-mini`, or `gpt-5.1`) AND which reasoning effort to use (`high`, `medium`, or `low`) in a **single prompt with two questions**.
2. Select the sandbox mode required for the task; default to `--sandbox read-only` unless edits or network access are necessary.
3. Assemble the command with the appropriate options:
   - `-m, --model <MODEL>`
   - `--config model_reasoning_effort="<high|medium|low>"`
   - `--sandbox <read-only|workspace-write|danger-full-access>`
   - `--full-auto`
   - `-C, --cd <DIR>`
   - `--skip-git-repo-check`
4. Always use `--skip-git-repo-check`.
5. When continuing a previous session, use `codex exec --skip-git-repo-check resume --last` via stdin. When resuming, don't use any configuration flags unless explicitly requested by the user, e.g. if they specify the model or the reasoning effort when requesting to resume a session. Resume syntax: `echo "your prompt here" | codex exec --skip-git-repo-check resume --last 2>/dev/null`. All flags have to be inserted between `exec` and `resume`.
6. **IMPORTANT**: By default, append `2>/dev/null` to all `codex exec` commands to suppress thinking tokens (stderr). Only show stderr if the user explicitly requests to see thinking tokens or if debugging is needed.
7. Run the command, capture stdout/stderr (filtered as appropriate), and summarize the outcome for the user.
8. **After Codex completes**, inform the user: "You can resume this Codex session at any time by saying 'codex resume' or asking me to continue with additional analysis or changes."

### Quick Reference

| Use case | Sandbox mode | Key flags |
| --- | --- | --- |
| Read-only review or analysis | `read-only` | `--sandbox read-only 2>/dev/null` |
| Apply local edits | `workspace-write` | `--sandbox workspace-write --full-auto 2>/dev/null` |
| Permit network or broad access | `danger-full-access` | `--sandbox danger-full-access --full-auto 2>/dev/null` |
| Resume recent session | Inherited from original | `echo "prompt" \| codex exec --skip-git-repo-check resume --last 2>/dev/null` (no flags allowed) |
| Run from another directory | Match task needs | `-C <DIR>` plus other flags `2>/dev/null` |

## Following Up

- After every `codex` command, immediately use `AskUserQuestion` to confirm next steps, collect clarifications, or decide whether to resume with `codex exec resume --last`.
- When resuming, pipe the new prompt via stdin: `echo "new prompt" | codex exec resume --last 2>/dev/null`. The resumed session automatically uses the same model, reasoning effort, and sandbox mode from the original session.
- Restate the chosen model, reasoning effort, and sandbox mode when proposing follow-up actions.

## Error Handling

- Stop and report failures whenever `codex --version` or a `codex exec` command exits non-zero; request direction before retrying.
- Before you use high-impact flags (`--full-auto`, `--sandbox danger-full-access`, `--skip-git-repo-check`) ask the user for permission using AskUserQuestion unless it was already given.
- When output includes warnings or partial results, summarize them and ask how to adjust using `AskUserQuestion`.

Add instructions to detect and terminate any stale external agent
review processes when starting a new code review session in Claude Code.
@sjennings
Contributor Author

Added an attempt at a fix for a recently introduced Claude Code annoyance: it will now sometimes decide to launch bash tasks (like these code reviews) as background processes, then reconnect to them for the next code review, assume they are a new code review instead of the one previously completed, and happily report that the code review is only reporting old issues. Well of course it is, Claude, you read the old review. Sigh. Anyway, this should hopefully fix it (it's difficult to test because it only happens occasionally).

@sjennings
Contributor Author

I also don't think this behavior will persist long term because it eats your context alive monitoring the results of a shell over... and over... and over... and over...

Change cleanup logic from "check and maybe kill" to "kill unconditionally".
The previous approach allowed agents to read stale output from lingering
external agent processes, causing context pollution and reuse of old data.

New behavior:
- Kill ALL background shells as the FIRST action
- Do NOT read output from stale shells (BashOutput forbidden)
- Do NOT check what processes are doing - just terminate them
- Only proceed to Step 1 after all shells are dead
@sjennings
Contributor Author

Sigh. Another fix enforcing this behaviour, because Claude Code thinks it knows best. "Wait! Let me use this data before I kill it!"

@bmadcode
Collaborator

@sjennings thanks for the updates, and sorry for the delay in further response; I have been traveling and availability has been limited. I like the idea of this being an installer question configuration, btw.

I will look at all of the PR updates soon and try this out also - might be another day or two just due to the travel and getting situated.

Contributor

Copilot AI left a comment

Pull request overview

This PR adds support for delegating code reviews to external AI CLI tools (Codex, Gemini, or Claude) to provide independent reviews from different AI models. The implementation includes agent detection, mandatory delegation when available, and fallback to built-in review capabilities.

Key Changes:

  • Added invoke-bash and set-var execution capabilities to the workflow engine
  • Implemented external agent detection and selection logic with runtime configuration
  • Created external agent invocation system with specific CLI commands for each agent

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 19 comments.

| File | Description |
| --- | --- |
| workflow.yaml | Adds external_review_agent configuration variable and external-agent-prompt.md file reference |
| instructions.xml | Major rewrite adding external agent detection, CLI invocation logic, and fallback mechanisms with mandatory delegation rules |
| external-agent-prompt.md | New template file defining the prompt sent to external agents, with placeholders for review context |
| checklist.md | Updates validation checklist to include external agent detection steps and verification points |
| module.yaml | Adds external_review_agent configuration option with single-select menu for agent selection |
| workflow.xml | Documents new invoke-bash and set-var XML tags for workflow execution |
Comments suppressed due to low confidence (1)

src/modules/bmm/workflows/4-implementation/code-review/instructions.xml:175

  • Step 2 is described as "Build review attack plan" but when an external agent is used, this step's actions are performed but then ignored since the external agent does its own planning. This creates wasted work. Consider making Step 2 conditional - only execute it when using built-in review ({{use_external_agent}} == false), or move the external agent detection and invocation to happen before Step 2.
  <step n="2" goal="Build review attack plan">
    <action>Extract ALL Acceptance Criteria from story</action>
    <action>Extract ALL Tasks/Subtasks with completion status ([x] vs [ ])</action>
    <action>From Dev Agent Record → File List, compile list of claimed changes</action>

    <action>Create review plan:
      1. **AC Validation**: Verify each AC is actually implemented
      2. **Task Audit**: Verify each [x] task is really done
      3. **Code Quality**: Security, performance, maintainability
      4. **Test Quality**: Real tests vs placeholder bullshit
    </action>
  </step>



Comment on lines 210 to 218
<invoke-bash cmd="codex exec --sandbox read-only --full-auto &quot;$(cat '{{external_prompt_file}}')&quot;" timeout="300000" />
</check>
<check if="{{external_agent_cmd}} == 'gemini'">
<critical>GEMINI: Use gemini -p with prompt from file and --yolo</critical>
<invoke-bash cmd="gemini -p &quot;$(cat '{{external_prompt_file}}')&quot; --yolo" timeout="300000" />
</check>
<check if="{{external_agent_cmd}} == 'claude'">
<critical>CLAUDE: Use claude -p with prompt from file</critical>
<invoke-bash cmd="claude -p &quot;$(cat '{{external_prompt_file}}')&quot; --dangerously-skip-permissions" timeout="300000" />

Copilot AI Dec 12, 2025

The timeout value is set to 300000 milliseconds (5 minutes). For complex codebases with comprehensive reviews, this may be insufficient and could cause premature timeout failures, triggering unnecessary fallback to built-in review. Consider increasing the timeout or making it configurable, especially since the PR mentions that "long prompts are expected and supported".


<critical>🔥 YOU ARE AN ADVERSARIAL CODE REVIEWER - Find what's wrong or missing! 🔥</critical>
<!-- ================================================================ -->
<!-- CLAUDE CODE CLEANUP - MANDATORY KILL OF ALL BACKGROUND SHELLS -->

Copilot AI Dec 12, 2025

The heading comment states "CLAUDE CODE CLEANUP" which is misleading since this is a general code review workflow that could be executed by any agent (Codex, Gemini, Claude, or others). The heading should be more generic, such as "SHELL CLEANUP" or "BACKGROUND PROCESS CLEANUP" to accurately reflect that this is an environmental cleanup step, not Claude-specific functionality.

Suggested change
<!-- CLAUDE CODE CLEANUP - MANDATORY KILL OF ALL BACKGROUND SHELLS -->
<!-- SHELL CLEANUP - MANDATORY KILL OF ALL BACKGROUND SHELLS -->


<!-- External agent prompt is loaded from external-agent-prompt.md -->
<set-var name="external_prompt_file" value="{installed_path}/external-agent-prompt.md" />
<action>Load {{external_prompt_file}} content into {{external_prompt}}</action>

Copilot AI Dec 12, 2025

The action loads content into a variable but this variable is never actually used. The invoke-bash commands on lines 210, 214, and 218 use $(cat '{{external_prompt_file}}') to read the file directly, bypassing the loaded variable. Either remove this unused action or modify the bash commands to use the pre-loaded variable content.

Suggested change
<action>Load {{external_prompt_file}} content into {{external_prompt}}</action>

Comment on lines +221 to +226
<check if="{{bash_exit_code}} != 0 OR {{bash_stdout}} is empty">
<output>⚠️ External agent CLI failed (exit code: {{bash_exit_code}}), falling back to built-in review</output>
<output>Error: {{bash_stderr}}</output>
<set-var name="use_external_agent" value="false" />
<set-var name="external_agent_failed" value="true" />
</check>

Copilot AI Dec 12, 2025

The error handling only checks for non-zero exit code or empty stdout, but doesn't handle cases where the CLI tool hangs, produces partial output, or outputs to stderr instead of stdout. Consider adding more robust error detection, such as checking stderr for error messages or validating that the output contains the expected format markers.

Comment on lines 210 to 218
<invoke-bash cmd="codex exec --sandbox read-only --full-auto &quot;$(cat '{{external_prompt_file}}')&quot;" timeout="300000" />
</check>
<check if="{{external_agent_cmd}} == 'gemini'">
<critical>GEMINI: Use gemini -p with prompt from file and --yolo</critical>
<invoke-bash cmd="gemini -p &quot;$(cat '{{external_prompt_file}}')&quot; --yolo" timeout="300000" />
</check>
<check if="{{external_agent_cmd}} == 'claude'">
<critical>CLAUDE: Use claude -p with prompt from file</critical>
<invoke-bash cmd="claude -p &quot;$(cat '{{external_prompt_file}}')&quot; --dangerously-skip-permissions" timeout="300000" />

Copilot AI Dec 12, 2025

The commands use $(cat '{{external_prompt_file}}') which reads the entire file content into a command-line argument. For large prompts (as mentioned in the prohibited rationalizations), this could hit shell command-line length limits (typically 128KB-2MB depending on the system). Consider using file redirection or piping the file directly to the CLI if the tools support stdin, e.g., codex exec --full-auto < "{{external_prompt_file}}" or passing the file path as an argument if the CLIs support it.

Suggested change
<invoke-bash cmd="codex exec --sandbox read-only --full-auto &quot;$(cat '{{external_prompt_file}}')&quot;" timeout="300000" />
</check>
<check if="{{external_agent_cmd}} == 'gemini'">
<critical>GEMINI: Use gemini -p with prompt from file and --yolo</critical>
<invoke-bash cmd="gemini -p &quot;$(cat '{{external_prompt_file}}')&quot; --yolo" timeout="300000" />
</check>
<check if="{{external_agent_cmd}} == 'claude'">
<critical>CLAUDE: Use claude -p with prompt from file</critical>
<invoke-bash cmd="claude -p &quot;$(cat '{{external_prompt_file}}')&quot; --dangerously-skip-permissions" timeout="300000" />
<invoke-bash cmd="codex exec --sandbox read-only --full-auto < '{{external_prompt_file}}'" timeout="300000" />
</check>
<check if="{{external_agent_cmd}} == 'gemini'">
<critical>GEMINI: Use gemini -p with prompt from file and --yolo</critical>
<invoke-bash cmd="gemini -p '{{external_prompt_file}}' --yolo" timeout="300000" />
</check>
<check if="{{external_agent_cmd}} == 'claude'">
<critical>CLAUDE: Use claude -p with prompt from file</critical>
<invoke-bash cmd="claude -p '{{external_prompt_file}}' --dangerously-skip-permissions" timeout="300000" />

Comment on lines +18 to +35
STORY CONTEXT: {{story_path}}
FILES TO REVIEW: {{comprehensive_file_list}}
ACCEPTANCE CRITERIA: {{acceptance_criteria_list}}
TASKS: {{task_list}}

OUTPUT FORMAT:

## HIGH SEVERITY

- [file:line] Issue description | Suggested fix

## MEDIUM SEVERITY

- [file:line] Issue description | Suggested fix

## LOW SEVERITY

- [file:line] Issue description | Suggested fix

Copilot AI Dec 12, 2025

The external prompt file contains template variables ({{story_path}}, {{comprehensive_file_list}}, {{acceptance_criteria_list}}, {{task_list}}) that need to be populated before being passed to the external agent. However, there's no action to interpolate these variables before the file is read by the cat command. The external agent will receive literal "{{variable}}" strings instead of actual values, making the prompt incomplete and ineffective.

Comment on lines +14 to +26
<critical>🚨 FIRST ACTION REQUIRED: Kill all background shells 🚨</critical>
<critical>Before ANY other workflow step, you MUST terminate all background processes.</critical>
<critical>DO NOT use BashOutput to read from stale shells - this causes context pollution.</critical>
<critical>DO NOT try to "check what they're doing" - just kill them.</critical>

<check if="BMAD is running as Claude Code (KillShell tool is available)">
<action>IMMEDIATELY use KillShell to terminate ALL background shells - no exceptions</action>
<action>Do NOT read output from these shells first - kill them blind</action>
<action>Do NOT rationalize keeping them alive - they contain stale data</action>
<action>If multiple shells exist, kill ALL of them one by one</action>
<output>🔪 Killed all background shells to prevent stale data contamination</output>
</check>
<critical>Only proceed to Step 1 AFTER all background shells are terminated.</critical>

Copilot AI Dec 12, 2025

Lines 14-26 describe killing background shells which is specific to Claude Code's KillShell tool. However, this code review workflow could be executed by any AI agent (including Codex or Gemini as mentioned in the PR). These instructions and the conditional check for "Claude Code" will be confusing or irrelevant when executed by other agents. Consider either removing this section or making it more generic/conditional based on the executing agent.

Suggested change
<critical>🚨 FIRST ACTION REQUIRED: Kill all background shells 🚨</critical>
<critical>Before ANY other workflow step, you MUST terminate all background processes.</critical>
<critical>DO NOT use BashOutput to read from stale shells - this causes context pollution.</critical>
<critical>DO NOT try to "check what they're doing" - just kill them.</critical>
<check if="BMAD is running as Claude Code (KillShell tool is available)">
<action>IMMEDIATELY use KillShell to terminate ALL background shells - no exceptions</action>
<action>Do NOT read output from these shells first - kill them blind</action>
<action>Do NOT rationalize keeping them alive - they contain stale data</action>
<action>If multiple shells exist, kill ALL of them one by one</action>
<output>🔪 Killed all background shells to prevent stale data contamination</output>
</check>
<critical>Only proceed to Step 1 AFTER all background shells are terminated.</critical>
<check if="BMAD is running as Claude Code (KillShell tool is available)">
<critical>🚨 FIRST ACTION REQUIRED: Kill all background shells 🚨</critical>
<critical>Before ANY other workflow step, you MUST terminate all background processes.</critical>
<critical>DO NOT use BashOutput to read from stale shells - this causes context pollution.</critical>
<critical>DO NOT try to "check what they're doing" - just kill them.</critical>
<action>IMMEDIATELY use KillShell to terminate ALL background shells - no exceptions</action>
<action>Do NOT read output from these shells first - kill them blind</action>
<action>Do NOT rationalize keeping them alive - they contain stale data</action>
<action>If multiple shells exist, kill ALL of them one by one</action>
<output>🔪 Killed all background shells to prevent stale data contamination</output>
<critical>Only proceed to Step 1 AFTER all background shells are terminated.</critical>
</check>

@alexeyv
Contributor

alexeyv commented Dec 12, 2025

@sjennings, @bmadcode here is a simple thought:

What do we call a piece of knowledge, user- or project-specific, about executing code reviews in ways that a user knows and approves of? Hint: 7 letters, starts with 'C', ends with 'ontext'.

What if this wants to be a workflow, a la 'setup-reviews', that guides the user through creating docs/how-to-review.yaml? That how-to-review may say things like "Use Bash to call CodeRabbit", "Use the Task tool to conduct an adversarial review", or "Use Bash to call codex and pass it this prompt: ...". We really don't care what it says. We just suggest to the user what it might say and ask questions, as long as the output is shaped as a list of review findings.

code-review then becomes an orchestrator:

  • assemble a diff of changes under review
  • If how-to-review exists. read and follow it.
  • If it doesn't, either
    • abort and tell the user to run setup-reviews, or
    • Fall back to what code-review is currently doing, plus maybe the adversarial bit
  • Put all findings together, deduplicate them, assess their severity and what's real and what's not, ordered by severity, and present to the user. Back to what it already knows how to do: dealing with the findings.
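
The orchestrator steps above might look like this in outline. `docs/how-to-review.yaml` and the two behaviors come from the comment; everything else (file names, messages) is illustrative:

```shell
# Assemble the diff under review, then either follow the user's
# review recipe or fall back to the built-in adversarial review.
HOW_TO="docs/how-to-review.yaml"
git diff > review.diff 2>/dev/null || : > review.diff
if [ -f "$HOW_TO" ]; then
  echo "Following user-approved review steps from $HOW_TO"
else
  echo "No $HOW_TO; falling back to built-in adversarial review"
fi
```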

@alexeyv
Contributor

alexeyv commented Dec 12, 2025

@sjennings I love what you've done with the place. Don't mind me stealing half of this diff to use for work? :)

@sjennings
Contributor Author

Feel free! It was your suggestion to dive in and then submit a PR that prompted my sharing it.

As for the changes in the Copilot review I'll look into them more closely later.

@alexeyv
Contributor

alexeyv commented Dec 12, 2025

Here is my version of a small step in the same general direction:
#1114

@alexeyv
Contributor

alexeyv commented Dec 12, 2025

@sjennings
If we end up going with how-to-review.yaml, better part of what you have here becomes the things setup-review suggests.
/creative raving mode

Scott Jennings added 2 commits December 13, 2025 07:17
- Rename misleading "CLAUDE CODE CLEANUP" comment to generic "SHELL CLEANUP"
- Remove unused external_prompt variable loading action
- Increase external agent timeout from 5 to 10 minutes for complex reviews
- Fix Review Method output to show actual method used instead of "OR"
- Restore web_bundle: false setting needed by web bundler
External agents find ALL possible issues, but not all are relevant.
This new step 4 evaluates findings against project context:
- Actually Important: AC violations, bugs, real security issues
- Context-Dependent: Performance, validation strictness (asks user)
- Theoretical/Nitpicking: Micro-optimizations, style preferences (skips)

This enables pragmatic code review that focuses on what matters.
@sjennings
Contributor Author

Closing for now since @alexeyv is working on an alternate implementation

@sjennings closed this Dec 20, 2025