|
| 1 | +--- |
| 2 | +name: review-code |
| 3 | +description: Perform a thorough code review of the current branch or a GitHub PR by number. |
| 4 | +argument-hint: [pr-number] [special instructions] |
| 5 | +disable-model-invocation: true |
| 6 | +--- |
| 7 | + |
| 8 | +# Review Code Changes |
| 9 | + |
| 10 | +Perform a comprehensive code review of either the current branch or a specific GitHub pull request. |
| 11 | + |
| 12 | +## Arguments |
| 13 | + |
| 14 | +`$ARGUMENTS` determines the review mode: |
| 15 | + |
| 16 | +**PR mode** — first argument is a number: |
| 17 | +- `366` — review PR #366 |
| 18 | +- `366 focus on the API changes` — review PR #366 with a focus area |
| 19 | + |
| 20 | +**Branch mode** — no number, or only instructions: |
| 21 | +- *(empty)* — review current branch against `main` |
| 22 | +- `compare against develop` — review against a different base |
| 23 | +- `focus on the API changes` — review current branch with a focus area |
| 24 | + |
| 25 | +Additional instructions work in both modes: |
| 26 | +- `be strict about type annotations` |
| 27 | +- `skip style nits` |
| 28 | + |
| 29 | +## Step 1: Gather Changes |
| 30 | + |
| 31 | +### If PR mode (argument starts with a number) |
| 32 | + |
| 33 | +Run these commands in parallel using `gh`: |
| 34 | + |
| 35 | +1. **PR details**: `gh pr view <number> --json title,body,author,baseRefName,headRefName,state,additions,deletions,changedFiles,commits,url` |
| 36 | +2. **PR diff**: `gh pr diff <number>` |
| 37 | +3. **PR files**: `gh pr diff <number> --name-only` |
| 38 | +4. **PR commits**: `gh pr view <number> --json commits --jq '.commits[].messageHeadline'` |
| 39 | +5. **Existing inline review comments**: `gh api repos/{owner}/{repo}/pulls/<number>/comments --paginate --jq '.[].body'` |
| 40 | +5b. **Existing PR-level reviews** (top-level review bodies from "Review changes"): `gh api repos/{owner}/{repo}/pulls/<number>/reviews --paginate --jq '.[].body'` |
| 41 | +6. **Repo info**: `gh repo view --json nameWithOwner -q '.nameWithOwner'` |
| 42 | + |
| 43 | +Then get the PR branch locally for full file access. Prefer a **worktree** so your current branch and uncommitted work are untouched: |
| 44 | + |
| 45 | +```bash |
| 46 | +git fetch origin pull/<number>/head:pr-<number> --force |
| 47 | +git worktree add /tmp/review-<number> pr-<number> |
| 48 | +# Cleanup when done: git worktree remove /tmp/review-<number> && git branch -D pr-<number> |
| 49 | +``` |
| 50 | + |
| 51 | +If worktrees aren't suitable, you can use `gh pr checkout <number>` (this switches your current branch — only if you have no uncommitted work). Run the rest of the review from `/tmp/review-<number>`. |
| 52 | + |
| 53 | +If checkout isn't possible (e.g., external fork), use `gh api` to fetch file contents: |
| 54 | + |
| 55 | +```bash |
| 56 | +gh api repos/{owner}/{repo}/contents/{path}?ref={head-branch} --jq '.content' | base64 --decode |
| 57 | +``` |
| 58 | + |
| 59 | +**Important checks:** |
| 60 | +- If the PR number doesn't exist, inform the user |
| 61 | +- If the PR is merged or closed, note the state but proceed (useful for post-merge audits) |
| 62 | +- If the PR is a draft, note it — review may be on incomplete work |
| 63 | +- For very large diffs (>3000 lines), fetch and read changed files individually instead of relying solely on the diff |
| 64 | + |
| 65 | +### If Branch mode (no number) |
| 66 | + |
| 67 | +First, fetch the base branch to ensure the remote ref is current: |
| 68 | + |
| 69 | +0. **Fetch base**: `git fetch origin <base>` |
| 70 | + |
| 71 | +Then run these commands in parallel: |
| 72 | + |
| 73 | +1. **Current branch**: `git branch --show-current` |
| 74 | +2. **Commits on branch**: `git log origin/<base>..HEAD --oneline` |
| 75 | +3. **File changes summary**: `git diff --stat origin/<base>..HEAD` |
| 76 | +4. **Full diff**: `git diff origin/<base>..HEAD` |
| 77 | +5. **Uncommitted changes**: `git status --porcelain` |
| 78 | +6. **Merge base**: `git merge-base origin/<base> HEAD` |
| 79 | + |
| 80 | +Where `<base>` is `main` unless overridden in arguments. |
| 81 | + |
| 82 | +**Important checks:** |
| 83 | +- If no commits ahead of base, inform the user there's nothing to review |
| 84 | +- If uncommitted changes exist, note them but review committed changes only |
| 85 | +- For very large diffs (>3000 lines), read changed files individually instead of relying solely on the diff |
| 86 | + |
| 87 | +## Step 2: Load Project Guidelines |
| 88 | + |
| 89 | +Read `AGENTS.md` at the repository root to load the project's coding standards, design principles, and conventions. This is the authoritative source for: |
| 90 | + |
| 91 | +- Code style rules (formatting, naming, imports, type annotations) |
| 92 | +- Design principles (DRY, KISS, YAGNI, SOLID) |
| 93 | +- Testing patterns and expectations |
| 94 | +- Architecture and layering conventions |
| 95 | +- Common pitfalls to watch for |
| 96 | +- Lazy loading and `TYPE_CHECKING` patterns |
| 97 | + |
| 98 | +Use these guidelines as the baseline for the entire review. Any project-specific rules in `AGENTS.md` take precedence over general best practices. |
| 99 | + |
| 100 | +## Step 3: Understand the Scope |
| 101 | + |
| 102 | +Before diving into details, build a mental model: |
| 103 | + |
| 104 | +1. **Read the PR description** (PR mode) or commit messages to understand the stated intent |
| 105 | +2. **Read each commit message** to understand the progression of changes |
| 106 | +3. **Group changed files** by module/package to identify which areas are affected |
| 107 | +4. **Identify the primary goal** (feature, refactor, bugfix, etc.) |
| 108 | +5. **Note cross-cutting concerns** (e.g., a rename that touches many files vs. substantive logic changes) |
| 109 | +6. **Check existing feedback** (PR mode): inspect both inline comments (Step 1, item 5) and PR-level review bodies (Step 1, item 5b) so you don't duplicate feedback already given |
| 110 | + |
| 111 | +## Step 4: Review Each Changed File (Multi-Pass) |
| 112 | + |
| 113 | +Perform **at least 2-3 passes** over the changed files. Each pass has a different focus — this catches issues that a single read-through would miss. |
| 114 | + |
| 115 | +**Scope rule: Only flag issues introduced or modified by this changeset.** Read the full file for context, but do not report pre-existing patterns, style issues, or design choices that were already present before this branch/PR. If existing code was merely moved without modification, don't flag it. The goal is to review what the author changed, not audit the entire file. |
| 116 | + |
| 117 | +### Pass 1: Correctness & Logic |
| 118 | + |
| 119 | +Read each changed file in full (not just the diff), but evaluate only the **new or modified code**: |
| 120 | + |
| 121 | +- Logic errors, off-by-one, wrong operator, inverted condition |
| 122 | +- Missing edge case handling (None, empty collections, boundary values) |
| 123 | +- Truthy/falsy checks on values where 0, empty string, or None is valid (e.g. `if index:` when index can be 0) |
| 124 | +- Defensive `getattr(obj, attr, fallback)` or `.get()` on Pydantic models where the field always exists with a default |
| 125 | +- Silent behavior changes for existing users that aren't called out in the PR description |
| 126 | +- Race conditions or concurrency issues |
| 127 | +- Resource leaks (unclosed files, connections, missing cleanup) |
| 128 | +- Incorrect error handling (swallowed exceptions, wrong exception type) |
| 129 | +- Input validation at boundaries (user input, API responses, file I/O) |
| 130 | +- Graceful degradation on failure |
| 131 | + |
| 132 | +### Pass 2: Design, Architecture & API |
| 133 | + |
| 134 | +Re-read the changed files with a focus on **structure and design of the new/modified code**: |
| 135 | + |
| 136 | +- Does the change fit the existing architecture and patterns? |
| 137 | +- Are new abstractions at the right level? (too abstract / too concrete) |
| 138 | +- Single responsibility — does each new function/class do one thing? |
| 139 | +- Are new dependencies flowing in the right direction? |
| 140 | +- Could this introduce circular imports or unnecessary coupling? |
| 141 | +- Are new or modified public signatures clear and minimal? |
| 142 | +- Are return types precise (not overly broad like `Any`)? |
| 143 | +- Could the new API be misused easily? Is it hard to use incorrectly? |
| 144 | +- Are breaking changes to existing interfaces intentional and documented? |
| 145 | +- Unnecessary wrapper functions or dead code left behind after refactors |
| 146 | +- Scalability: in-memory operations that could OOM on large datasets |
| 147 | +- Raw exceptions leaking instead of being normalized to project error types (see AGENTS.md / interface errors) |
| 148 | +- Obvious inefficiencies introduced by this change (N+1 queries, repeated computation, unnecessary copies) |
| 149 | +- Appropriate data structures for the access pattern |
| 150 | + |
| 151 | +### Pass 3: Standards, Testing & Polish |
| 152 | + |
| 153 | +Final pass focused on **project conventions and test quality for new/modified code only**: |
| 154 | + |
| 155 | +**Testing:** |
| 156 | +- Are new code paths covered by tests? |
| 157 | +- Do new tests verify behavior, not implementation details? (Flag tests that only verify plumbing — e.g. "mock was called" — without exercising actual behavior.) |
| 158 | +- Duplicate test setup across tests that should use fixtures or `@pytest.mark.parametrize` |
| 159 | +- Prefer flat test functions over test classes unless grouping is meaningful |
| 160 | +- Are edge cases tested? |
| 161 | +- Are mocks/stubs used appropriately (at boundaries, not deep internals)? |
| 162 | +- Do new test names clearly describe what they verify? |
| 163 | + |
| 164 | +**Project Standards (from AGENTS.md) — apply to new/modified code only:** |
| 165 | + |
| 166 | +Verify the items below on lines introduced or changed by this branch. Refer to the full `AGENTS.md` loaded in Step 2 for details and examples. |
| 167 | + |
| 168 | +- License headers: if present, they should be correct (wrong year or format → suggest `make update-license-headers`; don't treat as critical if CI enforces this) |
| 169 | +- `from __future__ import annotations` in new files |
| 170 | +- Type annotations on new/modified functions, methods, and class attributes |
| 171 | +- Modern type syntax (`list[str]`, `str | None` — not `List[str]`, `Optional[str]`) |
| 172 | +- Absolute imports only (no relative imports) |
| 173 | +- Lazy loading for heavy third-party imports via `lazy_heavy_imports` + `TYPE_CHECKING` |
| 174 | +- Naming: snake_case functions starting with a verb, PascalCase classes, UPPER_SNAKE_CASE constants |
| 175 | +- No vacuous comments — comments only for non-obvious intent |
| 176 | +- Public before private ordering in new classes |
| 177 | +- Design principles: DRY (extract on third occurrence), KISS (flat over clever), YAGNI (no speculative abstractions) |
| 178 | +- Common pitfalls: no mutable default arguments, no unused imports, simplify where possible |
| 179 | + |
| 180 | +## Step 5: Run Linter |
| 181 | + |
| 182 | +Run the linter on all changed files (requires local checkout). Use the venv directly to avoid sandbox permission issues in some environments (e.g. Claude Code): |
| 183 | + |
| 184 | +```bash |
| 185 | +.venv/bin/ruff check <changed-files> |
| 186 | +.venv/bin/ruff format --check <changed-files> |
| 187 | +``` |
| 188 | + |
| 189 | +> **Note**: This runs ruff only on the changed files for speed. For a full project-wide check, use `make check-all` or `uv run ruff check` (and `ruff format --check`) without file arguments. |
| 190 | +
|
| 191 | +If the branch isn't checked out locally (e.g., external fork in PR mode), skip this step and note it in the review. |
| 192 | + |
| 193 | +## Step 6: Produce the Review |
| 194 | + |
| 195 | +Output a structured review using the format below. Use the **PR template** or **Branch template** for the overview depending on the mode. |
| 196 | + |
| 197 | +Write the review to a **temporary markdown file outside the repository** (e.g. `/tmp/review-<pr-or-branch>.md`) so other agents or tools can consume it without polluting `git status`. Do not commit this file; treat it as ephemeral. |
| 198 | + |
| 199 | +--- |
| 200 | + |
| 201 | +### Overview (PR mode) |
| 202 | + |
| 203 | +| | | |
| 204 | +|---|---| |
| 205 | +| **PR** | [#\<number\> \<title\>](\<url\>) | |
| 206 | +| **Author** | `<author>` | |
| 207 | +| **Base** | `<base-branch>` | |
| 208 | +| **Files changed** | `<count>` | |
| 209 | +| **Insertions/Deletions** | `+<ins> / -<del>` | |
| 210 | + |
| 211 | +### Overview (Branch mode) |
| 212 | + |
| 213 | +| | | |
| 214 | +|---|---| |
| 215 | +| **Branch** | `<branch-name>` | |
| 216 | +| **Commits** | `<count>` commits ahead of `<base>` | |
| 217 | +| **Files changed** | `<count>` | |
| 218 | +| **Insertions/Deletions** | `+<ins> / -<del>` | |
| 219 | + |
| 220 | +### (Both modes) |
| 221 | + |
| 222 | +**Summary**: 1-2 sentence description of what the changes accomplish. In PR mode, note whether the implementation matches the stated intent in the PR description. |
| 223 | + |
| 224 | +### Findings |
| 225 | + |
| 226 | +Group findings by severity. Format each finding as a heading + bullet list — do NOT use numbered lists: |
| 227 | + |
| 228 | +``` |
| 229 | +**`path/to/file.py:42` — Short title** |
| 230 | +- **What**: Concise description of the issue. |
| 231 | +- **Why**: Why it matters. |
| 232 | +- **Suggestion**: Concrete fix or improvement (with code snippet when helpful). |
| 233 | +``` |
| 234 | + |
| 235 | +Separate each finding with a blank line. Use bold file-and-title as a heading line, then bullet points for What/Why/Suggestion. Never use numbered lists (`1.`, `2.`) for findings or their sub-items — they render poorly in terminals. |
| 236 | + |
| 237 | +#### Critical — Must fix before merge |
| 238 | +> Issues that would cause bugs, data loss, security vulnerabilities, or broken functionality. |
| 239 | +
|
| 240 | +#### Warnings — Strongly recommend fixing |
| 241 | +> Design issues, missing error handling, test gaps, or violations of project standards that could cause problems later. |
| 242 | +
|
| 243 | +#### Suggestions — Consider improving |
| 244 | +> Style improvements, minor simplifications, or optional enhancements that would improve code quality. |
| 245 | +
|
| 246 | +### What Looks Good |
| 247 | + |
| 248 | +Call out 2-3 things done well (good abstractions, thorough tests, clean refactoring, etc.). Positive feedback is part of a good review. |
| 249 | + |
| 250 | +### Verdict |
| 251 | + |
| 252 | +One of: |
| 253 | +- **Ship it** — No critical issues, ready to merge |
| 254 | +- **Ship it (with nits)** — Minor suggestions but nothing blocking |
| 255 | +- **Needs changes** — Has issues that should be addressed before merge |
| 256 | +- **Needs discussion** — Architectural or design questions that need team input |
| 257 | + |
| 258 | +### Next steps (optional) |
| 259 | + |
| 260 | +After the summary, you may suggest follow-ups when useful: |
| 261 | + |
| 262 | +- Deep dive into a specific file or finding |
| 263 | +- Check a specific concern in more detail |
| 264 | +- Install dev deps and add smoke tests (e.g. `uv sync --all-extras`, then run tests or suggest minimal smoke tests) |
| 265 | + |
| 266 | +--- |
| 267 | + |
| 268 | +## Review Principles |
| 269 | + |
| 270 | +- **Only flag what's new**: Report issues introduced by this changeset — not pre-existing patterns or style in untouched code, unless explicitly asked by the user |
| 271 | +- **Be specific**: "This could return None on line 42 when `items` is empty" not "handle edge cases better" |
| 272 | +- **Suggest, don't just criticize**: Always pair a problem with a concrete suggestion |
| 273 | +- **Distinguish severity honestly**: Don't inflate nits to warnings; don't downplay real issues |
| 274 | +- **Consider intent**: Review what the author was trying to do, not what you would have done differently |
| 275 | +- **Batch related issues**: If the same pattern appears in multiple places, note it once and list all locations |
| 276 | +- **Read the full file**: Diff-only reviews miss context — always read the surrounding code, but only flag new issues |
| 277 | +- **Don't repeat existing feedback**: In PR mode, check both inline comments and PR-level review bodies and skip issues already raised |
| 278 | + |
| 279 | +**Do not flag (focus on what CI won't catch):** |
| 280 | + |
| 281 | +- Issues that are supposed to be caught by CI (linter, typechecker, formatter) — mention "run `make check-all`" if relevant, but don't list every style nit |
| 282 | +- Pre-existing issues on unmodified lines |
| 283 | +- Pedantic nits that don't affect correctness or maintainability |
| 284 | +- Intentional functionality or API changes that are clearly documented |
| 285 | + |
| 286 | +## Edge Cases |
| 287 | + |
| 288 | +- **No changes**: Inform user there's nothing to review |
| 289 | +- **PR not found**: Inform user the PR number doesn't exist |
| 290 | +- **Merged/closed PR**: Note the state, proceed with review anyway |
| 291 | +- **Draft PR**: Note it's a draft; review may be on incomplete work |
| 292 | +- **External fork**: Can't checkout locally — use `gh api` to fetch file contents and skip the linter step |
| 293 | +- **Huge changeset** (>50 files): Summarize by module first, then review the most critical files in detail; ask user if they want the full file-by-file review |
| 294 | +- **Only renames/moves**: Note that changes are structural and focus on verifying nothing broke |
| 295 | +- **Only test changes**: Focus review on test quality, coverage, and correctness of assertions |
| 296 | +- **Only config/docs changes**: Adjust review to focus on accuracy and completeness rather than code quality |
0 commit comments