Skip to content

Commit ec8de9e

Browse files
igerberclaude
andcommitted
Add Codex CLI backend to /ai-review-local
The local skill now has TWO backends, dispatched by `--backend {auto,codex,api}`: - `codex`: invokes `codex exec` CLI agentically (grep, multi-turn, file loading), matching the CI Codex reviewer's quality. Requires `codex` installed (`brew install --cask codex` / `npm install -g @openai/codex`) and `codex login` completed. - `api` (existing): single-shot OpenAI Responses API. Faster, cheaper, but won't surface cross-surface or pattern-wide findings. Default `--backend auto` picks codex when `codex` is on PATH AND `~/.codex/auth.json` exists; falls back to api otherwise. CI parity for the codex path (pinned to match the CI Codex action): - model: gpt-5.4 - effort: model_reasoning_effort=xhigh via `-c` (NOT `reasoning_effort` — Codex silently ignores unknown `-c` keys; `reasoning_effort` produces "reasoning effort: none" while `model_reasoning_effort` produces "reasoning effort: xhigh", verified against codex 0.130.0) - sandbox: read-only (permits shell exec — `rg`, `grep`, `git diff` — but blocks filesystem writes and most network) - prompt via stdin (avoids ARG_MAX edge cases for compiled prompts that can hit hundreds of KB); mirrors CI's `prompt-file:` flow Skip irrelevant flow under codex backend (warn-and-ignore): - `--context standard/deep` source-file preloading - `--token-budget` pruning - `--include-files` staging - `--repo-root` requirement (codex falls back to cwd) The script warns when these flags are passed under `--backend codex`. Error handling for codex path: - Non-zero exit → RuntimeError with codex's stderr bubbled verbatim - Empty `-o` output file → RuntimeError - KeyboardInterrupt → cleanup partial output, propagate - stderr streamed to user during run (Codex events: file reads, tool invocations) so 3-15 min runs aren't silent Tests added (25 new): - TestBackendDetection (6): auto/explicit, codex-missing error - TestBuildCodexCmd (7): pin literal `model_reasoning_effort=xhigh` token, sandbox=read-only, --cd, -o, no positional prompt - TestCallCodex (6): subprocess argv, stdin pass-through, output read, nonzero-exit + empty-output error paths - TestCodexBackendDocConsistency (2): skill doc must enumerate the backend choices and the codex install command 196 tests pass. Lint state matches origin/main (4 pre-existing warnings, none introduced). Validated end-to-end with a live `codex exec` smoke test against this PR's own diff. Codex correctly produced a structured review (Overall Assessment / Executive Summary / P0/P1/P2 severity bands) and surfaced two real P1 regressions in the initial impl (--repo-root validation running before backend resolution; --include-files not gated on backend == api). Both are addressed in this commit; the live-codex smoke test itself is the strongest validation that the backend works. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent ca63fb2 commit ec8de9e

3 files changed

Lines changed: 570 additions & 65 deletions

File tree

.claude/commands/ai-review-local.md

Lines changed: 88 additions & 31 deletions
Original file line numberDiff line numberDiff line change
@@ -1,35 +1,74 @@
11
---
2-
description: Run AI code review locally using OpenAI API before opening a PR
3-
argument-hint: "[--context minimal|standard|deep] [--include-files <files>] [--token-budget <n>] [--force-fresh] [--full-registry] [--model <model>] [--timeout <seconds>] [--dry-run]"
2+
description: Run AI code review locally using Codex CLI or OpenAI API before opening a PR
3+
argument-hint: "[--backend auto|codex|api] [--context minimal|standard|deep] [--include-files <files>] [--token-budget <n>] [--force-fresh] [--full-registry] [--model <model>] [--timeout <seconds>] [--dry-run]"
44
---
55

66
# Local AI Code Review
77

8-
Run a structured code review using the OpenAI Responses API. Reviews changes
8+
Run a structured code review using either the **Codex CLI** (agentic, matches CI
9+
quality) or the **OpenAI Responses API** (single-shot, faster). Reviews changes
910
against the same methodology criteria used by the CI reviewer, but adapted for local
1011
pre-PR use. Designed for iterative review/revision cycles before submitting a PR.
1112

13+
## Backend selection
14+
15+
Two backends are supported:
16+
17+
| Backend | Latency | Cost | Quality |
18+
|---|---|---|---|
19+
| `api` (`gpt-5.4`) | 30-60s | $0.05-0.50/run, metered via `OPENAI_API_KEY` | Single-shot — won't grep, can't load files on its own initiative |
20+
| `codex` (any auth) | 3-15 min | depends on your `codex login` mode (subscription vs API key) — see codex docs | Agentic — matches CI Codex reviewer, can grep / load files / multi-turn |
21+
22+
Choose with `--backend {auto,codex,api}` (default `auto`):
23+
24+
- **`auto`**: pick `codex` if the `codex` CLI is installed AND `~/.codex/auth.json`
25+
exists (i.e. `codex login` has been completed); otherwise fall back to `api`.
26+
- **`codex`**: requires `codex` CLI installed (`brew install --cask codex` or
27+
`npm install -g @openai/codex`) and `codex login` completed.
28+
- **`api`**: requires `OPENAI_API_KEY` env var. Fast iteration mode.
29+
30+
Notes:
31+
- `codex` uses `--sandbox read-only`, which permits shell command execution
32+
(`rg`, `grep`, `git diff`) inside Codex's agentic loop — the "read-only" name
33+
refers to filesystem writes and network access, not shell exec. This is what
34+
enables the agentic audits.
35+
- Long Codex runs (3-15 min) can be cancelled with CTRL-C; the partial output is
36+
cleaned up automatically.
37+
- `--context` and `--token-budget` are ignored under the codex backend (Codex
38+
chooses what to load on its own); the script warns if you pass them.
39+
1240
## Arguments
1341

1442
`$ARGUMENTS` may contain optional flags:
15-
- `--context {minimal,standard,deep}`: Context depth (default: `standard`)
43+
- `--backend {auto,codex,api}`: Reviewer backend (default: `auto`). See above.
44+
- `--context {minimal,standard,deep}`: Context depth (default: `standard`).
45+
*Api backend only.*
1646
- `minimal`: Diff only (original behavior)
1747
- `standard`: Diff + full contents of changed `diff_diff/` source files
1848
- `deep`: Standard + import-graph expansion (files imported by changed files)
1949
- `--include-files <file1,file2,...>`: Extra files to include as read-only context
20-
(filenames resolve under `diff_diff/`, or use paths relative to repo root)
50+
(filenames resolve under `diff_diff/`, or use paths relative to repo root).
51+
*Api backend only.*
2152
- `--token-budget <n>`: Max estimated input tokens before dropping import-context
2253
files (default: 200000). Changed source files are always included regardless of budget.
54+
*Api backend only.*
2355
- `--force-fresh`: Skip delta-diff mode, run a full fresh review even if previous state exists
2456
- `--full-registry`: Include the entire REGISTRY.md instead of selective sections
25-
- `--model <name>`: Override the OpenAI model (default: `gpt-5.4`)
26-
- `--timeout <seconds>`: HTTP request timeout. If omitted, defaults to 900 for reasoning models (gpt-5.4, *-pro, o1/o3/o4) and 300 otherwise.
57+
- `--model <name>`: Override the model (default: `gpt-5.4`). Applies to both backends.
58+
- `--timeout <seconds>`: HTTP request timeout. If omitted, defaults to 900 for reasoning models (gpt-5.4, *-pro, o1/o3/o4) and 300 otherwise. *Api backend only.*
2759
- `--dry-run`: Print the compiled prompt without calling the API
2860

29-
**Reasoning models** (`gpt-5.4-pro`, `o3`, `o4-mini`, etc.): Reviews may take 10-15
30-
minutes. For deep reviews with reasoning models, combine `--token-budget` with `--model`:
61+
**Reasoning models** (`gpt-5.4-pro`, `o3`, `o4-mini`, etc.) on the api backend:
62+
Reviews may take 10-15 minutes. For deep reviews with reasoning models, combine
63+
`--token-budget` with `--model`:
64+
```
65+
/ai-review-local --backend api --model gpt-5.4-pro --token-budget 500000 --context deep
66+
```
67+
68+
**Codex backend** for CI-quality review:
3169
```
32-
/ai-review-local --model gpt-5.4-pro --token-budget 500000 --context deep
70+
/ai-review-local --backend codex
71+
# or just `/ai-review-local` if codex is installed + logged in (auto-detects)
3372
```
3473

3574
## Constraints
@@ -55,21 +94,36 @@ requires no arguments.
5594
Run these checks in parallel:
5695

5796
```bash
58-
# Check API key is set (never echo/log the actual value)
97+
# Check api-backend key is set (only required if backend resolves to api)
5998
test -n "$OPENAI_API_KEY" && echo "API key: set" || echo "API key: MISSING"
6099

100+
# Check codex backend availability (auto-detect)
101+
which codex >/dev/null 2>&1 && test -f ~/.codex/auth.json \
102+
&& echo "Codex: installed + logged in" \
103+
|| echo "Codex: not available (install + run 'codex login' to enable)"
104+
61105
# Check script exists
62106
test -f .claude/scripts/openai_review.py && echo "Script: found" || echo "Script: MISSING"
63107
```
64108

65-
If the API key is missing (and not `--dry-run`):
109+
The script resolves the backend itself (`auto` picks codex if available, else
110+
api). The OpenAI API key is only required when the resolved backend is `api`.
111+
112+
If the resolved backend will be `api` (no codex available, or `--backend api`)
113+
and the key is missing (and not `--dry-run`):
66114
```
67-
Error: OPENAI_API_KEY is not set.
115+
Error: OPENAI_API_KEY is not set (required for api backend).
116+
117+
Options:
118+
1. Install + log in to codex (matches CI quality):
119+
brew install --cask codex
120+
codex login
121+
(then run /ai-review-local — auto-detect picks codex)
68122
69-
To set it up:
70-
1. Get a key from https://platform.openai.com/api-keys
71-
2. Add to your shell: echo 'export OPENAI_API_KEY=sk-...' >> ~/.zshrc
72-
3. Reload: source ~/.zshrc
123+
2. Set up an API key:
124+
Get a key from https://platform.openai.com/api-keys
125+
echo 'export OPENAI_API_KEY=sk-...' >> ~/.zshrc
126+
source ~/.zshrc
73127
```
74128

75129
If the script is missing:
@@ -445,32 +499,35 @@ runs `--force-fresh` or when a rebase invalidates the tracked commit.
445499
## Examples
446500

447501
```bash
448-
# Standard review of current branch vs main (default: full source file context)
502+
# Auto-detect backend (codex if installed + logged in, else api). Default flow.
449503
/ai-review-local
450504

451-
# Review with minimal context (diff only, original behavior)
452-
/ai-review-local --context minimal
505+
# Force the agentic codex backend (matches CI quality)
506+
/ai-review-local --backend codex
507+
508+
# Force the fast api backend (single-shot, $0.05-0.50/run)
509+
/ai-review-local --backend api
453510

454-
# Review with deep context (changed files + imported files)
455-
/ai-review-local --context deep
511+
# Api backend, minimal context (diff only)
512+
/ai-review-local --backend api --context minimal
456513

457-
# Include specific files as extra context
458-
/ai-review-local --include-files linalg.py,utils.py
514+
# Api backend, deep context (changed files + imported files)
515+
/ai-review-local --backend api --context deep
516+
517+
# Api backend, extra context files
518+
/ai-review-local --backend api --include-files linalg.py,utils.py
459519

460520
# Preview the compiled prompt without calling the API
461521
/ai-review-local --dry-run
462522

463523
# Force a fresh review (ignore previous review state)
464524
/ai-review-local --force-fresh
465525

466-
# Use a different model with full registry
467-
/ai-review-local --model gpt-4.1 --full-registry
468-
469-
# Deep review with reasoning model (may take 10-15 minutes)
470-
/ai-review-local --model gpt-5.4-pro --token-budget 500000 --context deep
526+
# Different model with full registry
527+
/ai-review-local --backend api --model gpt-4.1 --full-registry
471528

472-
# Limit token budget for faster/cheaper reviews
473-
/ai-review-local --token-budget 100000
529+
# Deep api review with reasoning model (10-15 min)
530+
/ai-review-local --backend api --model gpt-5.4-pro --token-budget 500000 --context deep
474531
```
475532

476533
## Notes

0 commit comments

Comments
 (0)