
Commit 3cb6b2a

igerber and claude committed
Revert AI review CI to Codex + gpt-5.4 (reverts #404, #415)
The CI AI reviewer's quality has notably degraded since #404 and #415 landed. This restores 5 files to the snapshot at fe80295 (parent of #404's merge, i.e. the last commit on main before either PR landed):

- .github/workflows/ai_pr_review.yml -- reinstates openai/codex-action@v1 with model: gpt-5.4 and effort: xhigh
- .github/codex/prompts/pr_review.md -- removes Single-Pass Completeness Mandate (#404) and Audit #6 "Claim-vs-shipped" + tightened verdict bar (#415); restores the original 179-line prompt
- .claude/scripts/openai_review.py -- drops --ci-mode (#415) and gpt-5.5 PRICING/reasoning entries (#404); DEFAULT_MODEL back to gpt-5.4
- .claude/commands/ai-review-local.md -- restores skill doc to match the restored script
- tests/test_openai_review.py -- restores test suite to match (152 tests pass at the snapshot)

#404's body documented the rollback criterion ("if >2 of next 5 PRs surface new P1+ findings on unchanged code in round 2, revert the model bump"). This invokes that rollback and extends it to also revert the Codex -> Python single-shot backend switch from #415.

Out of scope: open branch ci-workflow-ipynb-markdown-extraction (PR #414, unmerged) also touches openai_review.py and pr_review.md; that branch will need to rebase against the older base after this lands.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
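The rollback criterion quoted from #404 can be read as a threshold check over a window of PRs. The following is a minimal sketch, assuming each PR is reduced to a boolean (whether round 2 surfaced a new P1+ finding on unchanged code). The function name, its parameters, and the sample data are hypothetical and are not taken from the repository.

```python
def should_revert(new_p1_flags, threshold=2, window=5):
    """Return True when more than `threshold` of the first `window` PRs are flagged.

    `new_p1_flags` is a hypothetical list of booleans, one per PR, in the order
    the PRs were reviewed after the model bump.
    """
    recent = list(new_p1_flags)[:window]
    return sum(1 for flagged in recent if flagged) > threshold


# Hypothetical observations for five PRs (not real data from this repository).
observed = [True, False, True, True, False]
print(should_revert(observed))  # True: 3 of 5 exceeds the threshold of 2
```

With exactly two flagged PRs out of five the check returns False, since the criterion is a strict ">2".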
1 parent 2f77f55 commit 3cb6b2a

5 files changed

Lines changed: 125 additions & 1064 deletions

.claude/commands/ai-review-local.md

Lines changed: 16 additions & 23 deletions
@@ -22,14 +22,14 @@ pre-PR use. Designed for iterative review/revision cycles before submitting a PR
 files (default: 200000). Changed source files are always included regardless of budget.
 - `--force-fresh`: Skip delta-diff mode, run a full fresh review even if previous state exists
 - `--full-registry`: Include the entire REGISTRY.md instead of selective sections
-- `--model <name>`: Override the OpenAI model (default: `gpt-5.5`)
-- `--timeout <seconds>`: HTTP request timeout. If omitted, defaults to 900 for reasoning models (gpt-5.4, gpt-5.5, *-pro, o1/o3/o4) and 300 otherwise.
+- `--model <name>`: Override the OpenAI model (default: `gpt-5.4`)
+- `--timeout <seconds>`: HTTP request timeout (default: 300). Use 900 for reasoning models.
 - `--dry-run`: Print the compiled prompt without calling the API

-**Reasoning models** (`gpt-5.5`, `gpt-5.5-pro`, `o3`, `o4-mini`, etc.): Reviews may take 10-15
+**Reasoning models** (`gpt-5.4-pro`, `o3`, `o4-mini`, etc.): Reviews may take 10-15
 minutes. For deep reviews with reasoning models, combine `--token-budget` with `--model`:
 ```
-/ai-review-local --model gpt-5.5-pro --token-budget 500000 --context deep
+/ai-review-local --model gpt-5.4-pro --token-budget 500000 --context deep
 ```

 ## Constraints
@@ -47,7 +47,7 @@ before any data is sent externally.
 ### Step 1: Parse Arguments

 Parse `$ARGUMENTS` for the optional flags listed above. All flags are optional —
-the default behavior (standard context, selective registry, gpt-5.5, live API call)
+the default behavior (standard context, selective registry, gpt-5.4, live API call)
 requires no arguments.

 ### Step 2: Validate Prerequisites
@@ -334,15 +334,9 @@ python3 .claude/scripts/openai_review.py \
 Note: `--force-fresh` is a skill-only flag — it controls whether delta diffs are
 generated in Step 4 and is NOT passed to the script.

-**Reasoning model handling:** Resolve the effective model first — `effective_model` is
-the value of `--model` if the user provided one, otherwise the script default `gpt-5.5`.
-The `--model`, `--timeout`, and `--dry-run` flags pass through to the script when provided.
-
-If `effective_model` contains `-pro`, starts with `o1`/`o3`/`o4`, or starts with
-`gpt-5.4`/`gpt-5.5` (e.g., `gpt-5.5`, `gpt-5.5-pro`, `o3`, `o4-mini`):
-- The script's `_resolve_timeout()` already auto-selects 900s for these models when
-`--timeout` is omitted, so no wrapper timeout pass-through is required. (Passing
-`--timeout 900` explicitly remains harmless and is fine for backward compatibility.)
+**Reasoning model handling:** If the model contains `-pro` or starts with `o1`/`o3`/`o4`
+(e.g., `gpt-5.4-pro`, `o3`, `o4-mini`):
+- Pass `--timeout 900` to the script (unless the user explicitly specified `--timeout`)
 - Run the Bash command with `run_in_background: true` (bypasses the 600s Bash tool timeout cap)
 - After the background command completes, continue to Step 6
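The restored reasoning-model rule (the added lines in this hunk) can be sketched as a small helper that decides which timeout arguments the wrapper passes to the script. This is an illustrative sketch only: `is_reasoning_model` and `build_timeout_args` are hypothetical names, not functions from `openai_review.py`, and forwarding an explicit user value unchanged is an assumption based on the "unless the user explicitly specified `--timeout`" clause.

```python
# Prefixes named by the restored doc; "-pro" is matched anywhere in the model name.
REASONING_PREFIXES = ("o1", "o3", "o4")


def is_reasoning_model(model: str) -> bool:
    """True if the model contains '-pro' or starts with o1/o3/o4."""
    return "-pro" in model or model.startswith(REASONING_PREFIXES)


def build_timeout_args(model, user_timeout=None):
    """Return the --timeout arguments to append to the script invocation.

    Assumption: an explicit user value always wins and is forwarded as-is.
    An empty list means the script's own default (300 per the doc) applies.
    """
    if user_timeout is not None:
        return ["--timeout", str(user_timeout)]
    if is_reasoning_model(model):
        return ["--timeout", "900"]
    return []


print(build_timeout_args("gpt-5.4-pro"))    # ['--timeout', '900']
print(build_timeout_args("gpt-5.4"))        # []
print(build_timeout_args("o4-mini", 1200))  # ['--timeout', '1200']
```

Note that under the restored rule a plain `gpt-5.4` is not matched, whereas the removed text also treated `gpt-5.4`/`gpt-5.5` as reasoning models.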

@@ -397,15 +391,15 @@ Review passed with no findings. Suggested next steps:
 - /submit-pr — commit and open a pull request
 ```

-**For ⛔ or ⚠️ (P0/P1/P2 findings)**:
+**For ⛔ or ⚠️ (P0/P1 findings)**:
 ```
 Options:
 1. Enter plan mode to address findings (Recommended)
 2. Re-run with --full-registry for deeper methodology context
 3. Skip — I'll address these manually
 ```

-**For ✅ with P3 findings only**:
+**For ✅ with P2/P3 findings only**:
 ```
 Options:
 1. Address findings before submitting
@@ -414,8 +408,8 @@ Options:

 **If user chooses to address findings**: Parse the findings from the review output.
 The review context is already in the conversation. Start addressing the findings
-directly — for P0/P1/P2 issues use `EnterPlanMode` for a structured approach; for
-P3 issues, fix them directly since they are minor.
+directly — for P0/P1 issues use `EnterPlanMode` for a structured approach; for P2/P3
+issues, fix them directly since they are minor.

 After fixes are committed, the user re-runs `/ai-review-local` for a follow-up review.
 On re-review, the script automatically activates delta-diff mode (comparing only
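The restored severity split (P0/P1 go through `EnterPlanMode`, P2/P3 are fixed directly) can be sketched as a routing function. This is a hypothetical illustration of the documented behavior, not code from the skill or the script. It assumes findings are already reduced to severity labels `P0` through `P3`, and the returned strings are placeholders.

```python
BLOCKING = {"P0", "P1"}  # severities the restored doc routes to plan mode


def next_step(severities):
    """Map a review's finding severities to the follow-up the doc describes."""
    found = set(severities)
    if not found:
        return "no findings"
    if found & BLOCKING:
        # Any P0/P1 finding calls for a structured approach.
        return "plan mode"
    # Only P2/P3 remain: treated as minor and fixed directly.
    return "fix directly"


print(next_step(["P2", "P3"]))  # fix directly
print(next_step(["P1", "P3"]))  # plan mode
```

Before this revert the boundary sat one level lower: P2 findings were grouped with P0/P1 rather than with P3.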
@@ -472,7 +466,7 @@ runs `--force-fresh` or when a rebase invalidates the tracked commit.
 /ai-review-local --model gpt-4.1 --full-registry

 # Deep review with reasoning model (may take 10-15 minutes)
-/ai-review-local --model gpt-5.5-pro --token-budget 500000 --context deep
+/ai-review-local --model gpt-5.4-pro --token-budget 500000 --context deep

 # Limit token budget for faster/cheaper reviews
 /ai-review-local --token-budget 100000
@@ -502,13 +496,12 @@ runs `--force-fresh` or when a rebase invalidates the tracked commit.
 - The review criteria are adapted from `.github/codex/prompts/pr_review.md` (same
 methodology axes, severity levels, and anti-patterns) but framed for local
 code-change review rather than PR review
-- The CI review (single-shot Responses API, same architecture as local but with
-`--ci-mode` and `--full-registry`) remains the authoritative final check — local
-review is a fast first pass to catch most issues early
+- The CI review (Codex action with full repo access) remains the authoritative final
+check — local review is a fast first pass to catch most issues early
 - **Data transmission**: In non-dry-run mode, this skill transmits the unified diff,
 changed-file metadata, full source file contents (in standard/deep mode),
 import-context files (in deep mode), selected methodology registry text, and
-prior review context (if present) to OpenAI via the Responses API.
+prior review context (if present) to OpenAI via the Chat Completions API.
 Use `--dry-run` to preview exactly what would be sent.
 - This skill pairs naturally with the iterative workflow:
 `/ai-review-local` -> address findings -> `/ai-review-local` -> `/submit-pr`
