Commit 78b1ef0

Merge pull request #421 from igerber/codex-backend
Add Codex CLI backend to /ai-review-local
2 parents 73c2391 + 0a9e059 commit 78b1ef0

3 files changed

Lines changed: 1004 additions & 98 deletions

.claude/commands/ai-review-local.md

Lines changed: 149 additions & 51 deletions
@@ -1,35 +1,87 @@
11
---
2-
description: Run AI code review locally using OpenAI API before opening a PR
3-
argument-hint: "[--context minimal|standard|deep] [--include-files <files>] [--token-budget <n>] [--force-fresh] [--full-registry] [--model <model>] [--timeout <seconds>] [--dry-run]"
2+
description: Run AI code review locally using Codex CLI or OpenAI API before opening a PR
3+
argument-hint: "[--backend auto|codex|api] [--context minimal|standard|deep] [--include-files <files>] [--token-budget <n>] [--force-fresh] [--full-registry] [--model <model>] [--timeout <seconds>] [--dry-run]"
44
---
55

66
# Local AI Code Review
77

8-
Run a structured code review using the OpenAI Responses API. Reviews changes
8+
Run a structured code review using either the **Codex CLI** (agentic, matches CI
9+
quality) or the **OpenAI Responses API** (single-shot, faster). Reviews changes
910
against the same methodology criteria used by the CI reviewer, but adapted for local
1011
pre-PR use. Designed for iterative review/revision cycles before submitting a PR.
1112

13+
## Backend selection
14+
15+
Two backends are supported:
16+
17+
| Backend | Latency | Cost | Quality |
18+
|---|---|---|---|
19+
| `api` (`gpt-5.4`) | 30-60s | $0.05-0.50/run, metered via `OPENAI_API_KEY` | Single-shot — won't grep, can't load files on its own initiative |
20+
| `codex` (any auth) | 3-15 min | depends on your `codex login` mode (subscription vs API key) — see codex docs | Agentic — matches CI Codex reviewer, can grep / load files / multi-turn |
21+
22+
Choose with `--backend {auto,codex,api}` (default `auto`):
23+
24+
- **`auto`**: pick `codex` if the `codex` CLI is installed AND `~/.codex/auth.json`
25+
exists (i.e. `codex login` has been completed); otherwise fall back to `api`.
26+
- **`codex`**: requires `codex` CLI installed (`brew install --cask codex` or
27+
`npm install -g @openai/codex`) and `codex login` completed.
28+
- **`api`**: requires `OPENAI_API_KEY` env var. Fast iteration mode.
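The `auto` rule above can be sketched in a few lines of Python. This is a minimal illustration, not the code in `openai_review.py`: the function name `resolve_backend` and its `home` and `which` parameters are hypothetical, added here so the check can be exercised without a real Codex install.

```python
import shutil
from pathlib import Path
from typing import Callable, Optional


def resolve_backend(
    requested: str = "auto",
    home: Optional[Path] = None,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> str:
    """Resolve a --backend value to a concrete backend ('codex' or 'api').

    Hypothetical helper: mirrors the documented rule, not the script's code.
    """
    if requested in ("codex", "api"):
        return requested  # explicit choices are honored as-is
    if requested != "auto":
        raise ValueError(f"unknown backend: {requested!r}")

    home = home if home is not None else Path.home()
    cli_installed = which("codex") is not None
    logged_in = (home / ".codex" / "auth.json").is_file()
    # auto: codex only when BOTH conditions hold, otherwise fall back to api
    return "codex" if cli_installed and logged_in else "api"
```

Calling `resolve_backend()` with no arguments applies the rule to the current machine; passing `"codex"` or `"api"` skips detection entirely.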
29+
30+
Notes:
31+
- `codex` uses `--sandbox read-only`, which permits shell command execution
32+
(`rg`, `grep`, `git diff`) inside Codex's agentic loop — the "read-only" name
33+
refers to filesystem writes and network access, not shell exec. This is what
34+
enables the agentic audits.
35+
- Long Codex runs (3-15 min) can be cancelled with CTRL-C; the partial output is
36+
cleaned up automatically.
37+
- `--context` and `--token-budget` are ignored under the codex backend (Codex
38+
chooses what to load on its own); the script warns if you pass them.
39+
- **Surface area (informational, non-blocking)**: under the codex backend,
40+
Codex reads any file under the repo root via `--cd`. This is intrinsic to
41+
using `codex` as an agentic reviewer — the same surface anyone running
42+
`codex` directly already accepts via `codex login`. Before invoking codex
43+
the script does a quick recursive filename scan for obvious secret-bearing
44+
patterns (`.env`, `.env.local`, `id_rsa`, `*.pem`, `*.key`,
45+
`secrets.{yml,yaml,json}`, `.netrc`, `.npmrc`, `.pypirc`, etc.; safe
46+
template variants like `.env.example` excluded; case-insensitive). If any
47+
matches exist, a stderr notice lists them and codex still runs. CTRL-C and
48+
switch to `--backend api` (or sanitize the worktree) if you don't want
49+
Codex to see those files. This is a notice, not a gate — real secret
50+
prevention belongs at gitignore + code review, not at codex invocation.
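A filename scan of the kind described in the "Surface area" note might look like the sketch below. It is an illustration under stated assumptions: the exact pattern list, the `.env.*` glob, the safe-suffix list, and skipping `.git` are guesses, and `find_sensitive_files` and `print_notice` are hypothetical names rather than functions from the script.

```python
import fnmatch
import os
import sys
from pathlib import Path

# Assumed pattern set, based on the examples named in the note above.
SENSITIVE_PATTERNS = (
    ".env", ".env.*", "id_rsa", "*.pem", "*.key",
    "secrets.yml", "secrets.yaml", "secrets.json",
    ".netrc", ".npmrc", ".pypirc",
)
# Assumed "safe template" suffixes; the note only names `.env.example`.
SAFE_SUFFIXES = (".example", ".sample", ".template")


def find_sensitive_files(repo_root):
    """Return repo-relative paths whose filename matches a sensitive pattern."""
    root = Path(repo_root)
    hits = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Assumption: VCS internals are not interesting for this notice.
        dirnames[:] = [d for d in dirnames if d != ".git"]
        for name in filenames:
            lowered = name.lower()  # case-insensitive matching
            if lowered.endswith(SAFE_SUFFIXES):
                continue  # template variants such as .env.example
            if any(fnmatch.fnmatchcase(lowered, p) for p in SENSITIVE_PATTERNS):
                hits.append((Path(dirpath) / name).relative_to(root).as_posix())
    return sorted(hits)


def print_notice(hits):
    """Emit an informational stderr notice; never blocks the run."""
    if hits:
        print("Notice: Codex can read these files:", file=sys.stderr)
        for path in hits:
            print(f"  {path}", file=sys.stderr)
```

Because the scan only looks at filenames, it is cheap but shallow, which fits the note's framing of it as a notice rather than a gate.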
51+
1252
## Arguments
1353

1454
`$ARGUMENTS` may contain optional flags:
15-
- `--context {minimal,standard,deep}`: Context depth (default: `standard`)
55+
- `--backend {auto,codex,api}`: Reviewer backend (default: `auto`). See above.
56+
- `--context {minimal,standard,deep}`: Context depth (default: `standard`).
57+
*Api backend only.*
1658
- `minimal`: Diff only (original behavior)
1759
- `standard`: Diff + full contents of changed `diff_diff/` source files
1860
- `deep`: Standard + import-graph expansion (files imported by changed files)
1961
- `--include-files <file1,file2,...>`: Extra files to include as read-only context
20-
(filenames resolve under `diff_diff/`, or use paths relative to repo root)
62+
(filenames resolve under `diff_diff/`, or use paths relative to repo root).
63+
*Api backend only.*
2164
- `--token-budget <n>`: Max estimated input tokens before dropping import-context
2265
files (default: 200000). Changed source files are always included regardless of budget.
66+
*Api backend only.*
2367
- `--force-fresh`: Skip delta-diff mode, run a full fresh review even if previous state exists
2468
- `--full-registry`: Include the entire REGISTRY.md instead of selective sections
25-
- `--model <name>`: Override the OpenAI model (default: `gpt-5.4`)
26-
- `--timeout <seconds>`: HTTP request timeout. If omitted, defaults to 900 for reasoning models (gpt-5.4, *-pro, o1/o3/o4) and 300 otherwise.
27-
- `--dry-run`: Print the compiled prompt without calling the API
69+
- `--model <name>`: Override the model (default: `gpt-5.4`). Applies to both backends.
70+
- `--timeout <seconds>`: HTTP request timeout. If omitted, defaults to 900 for reasoning models (gpt-5.4, *-pro, o1/o3/o4) and 300 otherwise. *Api backend only.*
71+
- `--dry-run`: Print the compiled prompt without invoking the chosen backend
72+
(no API call, no codex subprocess)
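The `--timeout` default described above can be expressed as a small function. This is a sketch only: how the script actually recognizes a reasoning model is not shown in this diff, so the prefix and suffix checks below are an assumption, and `default_timeout` and `effective_timeout` are hypothetical names.

```python
from typing import Optional


def default_timeout(model: str) -> int:
    """Return the default HTTP timeout in seconds for a model name."""
    name = model.lower()
    # Assumed matching rule for "gpt-5.4, *-pro, o1/o3/o4".
    is_reasoning = (
        name.startswith("gpt-5.4")
        or name.endswith("-pro")
        or name.startswith(("o1", "o3", "o4"))
    )
    return 900 if is_reasoning else 300


def effective_timeout(model: str, explicit: Optional[int] = None) -> int:
    """An explicit --timeout wins; otherwise use the model-based default."""
    return explicit if explicit is not None else default_timeout(model)
```

For example, `effective_timeout("gpt-4.1")` falls back to 300, while `effective_timeout("o4-mini", 120)` keeps the explicit 120.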
73+
74+
**Reasoning models** (`gpt-5.4-pro`, `o3`, `o4-mini`, etc.) on the api backend:
75+
Reviews may take 10-15 minutes. For deep reviews with reasoning models, combine
76+
`--token-budget` with `--model`:
77+
```
78+
/ai-review-local --backend api --model gpt-5.4-pro --token-budget 500000 --context deep
79+
```
2880

29-
**Reasoning models** (`gpt-5.4-pro`, `o3`, `o4-mini`, etc.): Reviews may take 10-15
30-
minutes. For deep reviews with reasoning models, combine `--token-budget` with `--model`:
81+
**Codex backend** for CI-quality review:
3182
```
32-
/ai-review-local --model gpt-5.4-pro --token-budget 500000 --context deep
83+
/ai-review-local --backend codex
84+
# or just `/ai-review-local` if codex is installed + logged in (auto-detects)
3385
```
3486

3587
## Constraints
@@ -39,37 +91,60 @@ This skill does not modify source code files. It may:
3991
- Create/update review artifacts in `.claude/reviews/` (gitignored)
4092
- Write temporary files to `/tmp/` (cleaned up in Step 8)
4193

42-
Step 5 makes a single external API call to OpenAI. Step 3b runs a secret scan
43-
before any data is sent externally.
94+
Step 5 invokes the chosen backend:
95+
- **api backend**: single external HTTP call to OpenAI Responses API. Step 3b/3c
96+
run the canonical pre-upload secret scan before any data is sent.
97+
- **codex backend**: spawns `codex exec` as a subprocess, which talks to
98+
OpenAI iteratively under a read-only sandbox. The script prints a stderr
99+
notice if obvious sensitive-filename patterns are present in the repo
100+
(informational; codex still runs). The api-backend's Step 3b/3c scans
101+
don't apply — Codex's read surface is the whole repo, intrinsic to using
102+
it as an agentic reviewer.
44103

45104
## Instructions
46105

47106
### Step 1: Parse Arguments
48107

49108
Parse `$ARGUMENTS` for the optional flags listed above. All flags are optional —
50-
the default behavior (standard context, selective registry, gpt-5.4, live API call)
109+
the default behavior (auto-detect backend, standard context for api or
110+
agentic loading for codex, selective registry, gpt-5.4)
51111
requires no arguments.
52112

53113
### Step 2: Validate Prerequisites
54114

55115
Run these checks in parallel:
56116

57117
```bash
58-
# Check API key is set (never echo/log the actual value)
118+
# Check api-backend key is set (only required if backend resolves to api)
59119
test -n "$OPENAI_API_KEY" && echo "API key: set" || echo "API key: MISSING"
60120

121+
# Check codex backend availability (auto-detect)
122+
which codex >/dev/null 2>&1 && test -f ~/.codex/auth.json \
123+
&& echo "Codex: installed + logged in" \
124+
|| echo "Codex: not available (install + run 'codex login' to enable)"
125+
61126
# Check script exists
62127
test -f .claude/scripts/openai_review.py && echo "Script: found" || echo "Script: MISSING"
63128
```
64129

65-
If the API key is missing (and not `--dry-run`):
130+
The script resolves the backend itself (`auto` picks codex if available, else
131+
api). The OpenAI API key is only required when the resolved backend is `api`.
132+
133+
If the resolved backend will be `api` (no codex available, or `--backend api`)
134+
and the key is missing (and not `--dry-run`):
66135
```
67-
Error: OPENAI_API_KEY is not set.
136+
Error: OPENAI_API_KEY is not set (required for api backend).
68137
69-
To set it up:
70-
1. Get a key from https://platform.openai.com/api-keys
71-
2. Add to your shell: echo 'export OPENAI_API_KEY=sk-...' >> ~/.zshrc
72-
3. Reload: source ~/.zshrc
138+
Options:
139+
1. Install + log in to codex (matches CI quality):
140+
brew install --cask codex
141+
codex login
142+
(then run /ai-review-local — auto-detect picks codex)
143+
144+
2. Set up an API key:
145+
Get a key from https://platform.openai.com/api-keys
146+
echo 'export OPENAI_API_KEY=sk-...' >> ~/.zshrc
147+
source ~/.zshrc
73148
```
74149

75150
If the script is missing:
@@ -302,10 +377,11 @@ review via `--previous-review`.
302377
### Step 5: Run the Review Script
303378

304379
Build and run the command. Include optional arguments only when their conditions are met:
380+
- `--backend`: pass through from parsed arguments (default `auto`); the script auto-detects
305381
- `--previous-review`: only if `.claude/reviews/local-review-previous.md` exists AND `--force-fresh` was NOT set
306382
- `--delta-diff` and `--delta-changed-files`: only if delta files were generated in Step 4
307383
- `--review-state`, `--commit-sha`, `--base-ref`: always include (even with `--force-fresh`, to seed a new baseline)
308-
- `--context`, `--include-files`, `--token-budget`: pass through from parsed arguments
384+
- `--context`, `--include-files`, `--token-budget`: pass through from parsed arguments (only meaningful for `--backend api`; ignored under codex)
309385

310386
```bash
311387
python3 .claude/scripts/openai_review.py \
@@ -316,6 +392,7 @@ python3 .claude/scripts/openai_review.py \
316392
--output .claude/reviews/local-review-latest.md \
317393
--branch-info "$branch_name" \
318394
--repo-root "$(pwd)" \
395+
--backend "$backend" \
319396
--context "$context_level" \
320397
--review-state .claude/reviews/review-state.json \
321398
--commit-sha "$(git rev-parse HEAD)" \
@@ -331,6 +408,12 @@ python3 .claude/scripts/openai_review.py \
331408
[--dry-run]
332409
```
333410

411+
Always pass `--backend "$backend"` (where `$backend` is the parsed value, defaulting
412+
to `auto`). The script handles auto-detection internally; forwarding the flag means
413+
explicit `/ai-review-local --backend codex` and `/ai-review-local --backend api`
414+
choices are honored end-to-end. Without forwarding, the user's `--backend` selection
415+
would be silently ignored.
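The conditional flags listed at the top of this step can be assembled as in the sketch below. It assumes that `--previous-review`, `--delta-diff`, and `--delta-changed-files` each take a file path, which the diff does not confirm; `build_optional_args` and its parameters are hypothetical and cover only the optional part of the command.

```python
from pathlib import Path
from typing import List, Optional


def build_optional_args(
    backend: str,
    previous_review: Path,
    force_fresh: bool,
    delta_diff: Optional[Path] = None,
    delta_changed_files: Optional[Path] = None,
    context: Optional[str] = None,
    include_files: Optional[str] = None,
    token_budget: Optional[int] = None,
) -> List[str]:
    """Build the optional argument list for the review script (sketch)."""
    # Always forward the backend so an explicit user choice is honored.
    args = ["--backend", backend]

    # Only when the previous review exists AND --force-fresh was not set.
    if Path(previous_review).is_file() and not force_fresh:
        args += ["--previous-review", str(previous_review)]

    # Only when delta files were generated in Step 4.
    if delta_diff and delta_changed_files:
        args += ["--delta-diff", str(delta_diff)]
        args += ["--delta-changed-files", str(delta_changed_files)]

    # Pass-through flags; only meaningful for the api backend.
    if context:
        args += ["--context", context]
    if include_files:
        args += ["--include-files", include_files]
    if token_budget is not None:
        args += ["--token-budget", str(token_budget)]
    return args
```

Note that `--force-fresh` itself never appears in the output, matching the rule that it is a skill-only flag.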
416+
334417
Note: `--force-fresh` is a skill-only flag — it controls whether delta diffs are
335418
generated in Step 4 and is NOT passed to the script.
336419

@@ -342,7 +425,8 @@ generated in Step 4 and is NOT passed to the script.
342425
- After the background command completes, continue to Step 6
343426

344427
If `--dry-run`: display the prompt output and stop. Report the estimated token count,
345-
cost estimate, and model that would be used.
428+
backend, and model that would be used. Cost estimate is shown only for the api
429+
backend (codex doesn't expose token counts up front).
346430

347431
If the script exits non-zero, display the error output and stop.
348432

@@ -445,64 +529,78 @@ runs `--force-fresh` or when a rebase invalidates the tracked commit.
445529
## Examples
446530

447531
```bash
448-
# Standard review of current branch vs main (default: full source file context)
532+
# Auto-detect backend (codex if installed + logged in, else api). Default flow.
449533
/ai-review-local
450534

451-
# Review with minimal context (diff only, original behavior)
452-
/ai-review-local --context minimal
535+
# Force the agentic codex backend (matches CI quality)
536+
/ai-review-local --backend codex
453537

454-
# Review with deep context (changed files + imported files)
455-
/ai-review-local --context deep
538+
# Force the fast api backend (single-shot, $0.05-0.50/run)
539+
/ai-review-local --backend api
456540

457-
# Include specific files as extra context
458-
/ai-review-local --include-files linalg.py,utils.py
541+
# Api backend, minimal context (diff only)
542+
/ai-review-local --backend api --context minimal
459543

460-
# Preview the compiled prompt without calling the API
544+
# Api backend, deep context (changed files + imported files)
545+
/ai-review-local --backend api --context deep
546+
547+
# Api backend, extra context files
548+
/ai-review-local --backend api --include-files linalg.py,utils.py
549+
550+
# Preview the compiled prompt without invoking the chosen backend
461551
/ai-review-local --dry-run
462552

463553
# Force a fresh review (ignore previous review state)
464554
/ai-review-local --force-fresh
465555

466-
# Use a different model with full registry
467-
/ai-review-local --model gpt-4.1 --full-registry
468-
469-
# Deep review with reasoning model (may take 10-15 minutes)
470-
/ai-review-local --model gpt-5.4-pro --token-budget 500000 --context deep
556+
# Different model with full registry
557+
/ai-review-local --backend api --model gpt-4.1 --full-registry
471558

472-
# Limit token budget for faster/cheaper reviews
473-
/ai-review-local --token-budget 100000
559+
# Deep api review with reasoning model (10-15 min)
560+
/ai-review-local --backend api --model gpt-5.4-pro --token-budget 500000 --context deep
474561
```
475562

476563
## Notes
477564

478565
- This skill does NOT modify source files — it only generates temp files and
479566
review artifacts in `.claude/reviews/` (which is gitignored). It may also
480567
create a commit if there are uncommitted changes (Step 3).
481-
- **Context levels**: By default (`standard`), the full contents of changed
482-
`diff_diff/` source files are sent alongside the diff. This catches "sins of
483-
omission" — code that should have changed but wasn't (e.g., a wrapper missing
484-
a new parameter). Use `--context deep` to also include files imported by
485-
changed files as read-only reference.
568+
- **Context levels** (api backend): By default (`standard`), the full contents
569+
of changed `diff_diff/` source files are sent alongside the diff. This catches
570+
"sins of omission" — code that should have changed but wasn't (e.g., a wrapper
571+
missing a new parameter). Use `--context deep` to also include files imported
572+
by changed files as read-only reference. Codex backend ignores `--context`
573+
(it loads files agentically as needed).
486574
- **Delta-diff re-review**: When `review-state.json` exists from a previous run,
487575
the script automatically generates a delta diff (changes since the last reviewed
488576
commit) and focuses the reviewer on those changes. The full branch diff is
489-
included as reference context. Use `--force-fresh` to bypass this.
577+
included as reference context. Use `--force-fresh` to bypass this. Applies to
578+
both backends.
490579
- **Finding tracking**: The script writes structured findings to `review-state.json`
491580
after each review. On re-review, previous findings are shown in a table with
492581
their status (open/addressed), enabling the reviewer to focus on what changed.
493-
- **Cost visibility**: The script shows estimated cost before the API call and
494-
actual cost (from the API response) after completion.
582+
- **Cost visibility** (api backend): The script shows estimated cost before the
583+
API call and actual cost (from the API response) after completion. Codex
584+
backend doesn't expose token counts; cost depends on your `codex login` mode
585+
(subscription unmetered within plan, API key metered).
495586
- Re-review mode activates automatically when a previous review exists in
496587
`.claude/reviews/local-review-latest.md`
497588
- The review criteria are adapted from `.github/codex/prompts/pr_review.md` (same
498589
methodology axes, severity levels, and anti-patterns) but framed for local
499590
code-change review rather than PR review
500591
- The CI review (Codex action with full repo access) remains the authoritative final
501592
check — local review is a fast first pass to catch most issues early
502-
- **Data transmission**: In non-dry-run mode, this skill transmits the unified diff,
503-
changed-file metadata, full source file contents (in standard/deep mode),
504-
import-context files (in deep mode), selected methodology registry text, and
505-
prior review context (if present) to OpenAI via the Responses API.
506-
Use `--dry-run` to preview exactly what would be sent.
593+
- **Data transmission**: In non-dry-run mode:
594+
- **api backend**: this skill transmits the unified diff, changed-file
595+
metadata, full source file contents (in standard/deep mode), import-context
596+
files (in deep mode), selected methodology registry text, and prior review
597+
context (if present) to OpenAI via the Responses API.
598+
- **codex backend**: the compiled prompt (criteria + diff + previous review)
599+
is piped to `codex exec`'s stdin, and Codex itself reads additional repo
600+
files agentically (read-only sandbox) and talks to OpenAI iteratively. A
601+
one-off stderr notice surfaces obvious sensitive-filename matches before
602+
invoking codex (see "Surface area" above) — informational only.
603+
604+
Use `--dry-run` to preview the compiled prompt without invoking either backend.
507605
- This skill pairs naturally with the iterative workflow:
508606
`/ai-review-local` -> address findings -> `/ai-review-local` -> `/submit-pr`
