feat: add agent reviewer (--review) and bump to v0.4.4 by rubenmarcus · Pull Request #292 · multivmlabs/ralph-starter

rubenmarcus · 2026-03-12T16:23:31Z

Summary

Adds an LLM-powered agent reviewer step to the executor loop and prepares the v0.4.4 release.

Agent Reviewer (`--review`)

A new validation step that runs after lint/build/test pass but before commit. It analyzes the git diff with an LLM to catch issues that automated checks miss:

Security: hardcoded secrets, insecure defaults, injection vulnerabilities
Logic errors: off-by-one, race conditions, null access
Pattern violations: deviations from codebase conventions
Missing error handling: unhandled promises, missing try/catch

Usage: ralph-starter run "my task" --review

Findings with severity: error block the commit and feed back into the next iteration via lastValidationFeedback. Warnings are logged but non-blocking. Gracefully skips if no diff exists or no LLM API key is available.

Issue Cleanup

Closed 14 delivered issues that were still marked as open:

Open issues reduced from 19 → 5 (all genuine future work).

Files Changed

src/loop/reviewer.ts — new module (reviewer logic, LLM prompt, JSON parsing)
src/loop/executor.ts — integrate reviewer step after validation, before commit
src/cli.ts — add --review flag to run command
src/commands/run.ts — wire review option through to LoopOptions
src/index.ts — export public API types (ReviewResult, runReview, etc.)
package.json — bump version to 0.4.4

chatgpt-codex-connector · 2026-03-12T16:23:38Z

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.

github-actions · 2026-03-12T16:26:00Z

✔️ Bundle Size Analysis

Metric	Value
Base	2729.53 KB
PR	2729.53 KB
Diff	0 KB (0%)

Bundle breakdown

156K	dist/auth
80K	dist/automation
4.0K	dist/cli.d.ts
4.0K	dist/cli.d.ts.map
24K	dist/cli.js
16K	dist/cli.js.map
664K	dist/commands
28K	dist/config
4.0K	dist/index.d.ts
4.0K	dist/index.d.ts.map
4.0K	dist/index.js
4.0K	dist/index.js.map
916K	dist/integrations
100K	dist/llm
1.2M	dist/loop
188K	dist/mcp
60K	dist/presets
92K	dist/setup
40K	dist/skills
392K	dist/sources
76K	dist/ui
144K	dist/utils
336K	dist/wizard

rubenmarcus · 2026-03-12T16:26:48Z

@greptileai

greptile-apps · 2026-03-12T16:30:24Z

Greptile Summary

This PR introduces an LLM-powered agent reviewer step (--review flag) that runs after lint/build/test validation passes but before auto-commit, and bumps the version to v0.4.4. It also adds white-label support (productName, dotDir) throughout the loop infrastructure and exports the new reviewer types/functions as public API.

Key changes:

src/loop/reviewer.ts — new module handling diff retrieval, LLM prompting, JSON parsing, and feedback formatting for structured code review findings
src/loop/executor.ts — integrates the reviewer between the validation block and the commit step; correctly respects pastWarmup and handles circuit-breaker tripping on reviewer errors
src/loop/memory.ts / src/loop/progress.ts — updated to accept a configurable dotDir for white-labeling the .ralph state directory
src/cli.ts / src/commands/run.ts — wire --review from CLI through to LoopOptions

Issues found:

git restore --staged . destroys pre-existing staged changes (src/loop/reviewer.ts:46): The cleanup call that was added to undo git add --intent-to-add --all is too broad — it unstages all staged files, not just the intent-to-add entries. A user who had staged specific changes before starting the loop will silently lose that staging state.
Reviewer LLM cost not counted by costTracker (src/loop/executor.ts:1518): runReview calls tryCallLLM internally with no hook into the executor's CostTracker, so reviewer token spend is invisible to maxCost enforcement. In a long loop, this can cause actual spend to silently exceed the configured budget.

Confidence Score: 3/5

Safe to merge after addressing the git index mutation bug and cost tracking gap; the core reviewer logic and loop integration are otherwise sound.
The reviewer integration is well-structured and the warmup/circuit-breaker handling is correct. However, the git restore --staged . cleanup is too broad and can silently destroy a user's staged changes — a concrete, reproducible data-loss scenario. The missing cost tracking is a budget-enforcement gap that becomes material in long loops with maxCost configured. These two issues lower confidence enough to warrant fixes before shipping.
src/loop/reviewer.ts (git index mutation) and src/loop/executor.ts (missing cost tracking) need attention before merge.

Important Files Changed

Filename	Overview
src/loop/reviewer.ts	New module implementing LLM-powered diff review. Contains a logic bug where `git restore --staged .` is too broad and destroys pre-existing user-staged changes. Core logic (JSON parsing, finding validation, feedback formatting) is well-structured.
src/loop/executor.ts	Integrates the reviewer step after validation, before commit. Reviewer LLM cost is not tracked by `costTracker`, which can cause actual spend to silently exceed `maxCost` limits. Also adds `productName` and `dotDir` white-label options.
src/loop/memory.ts	Minor update to support configurable `dotDir` parameter for white-labeling the `.ralph` directory. Changes are straightforward and correct.
src/loop/progress.ts	Adds `dotDir` parameter to `createProgressTracker` to support configurable state directory. Change is clean and backward-compatible with the default value.
src/cli.ts	Adds `--review` CLI flag wired to the new agent reviewer feature. Clean, minimal change.
src/commands/run.ts	Adds `review?: boolean` to `RunCommandOptions` and threads it through to `LoopOptions`. Straightforward wiring.
src/index.ts	Exports new public types (`ReviewFinding`, `ReviewResult`, `ReviewSeverity`) and `runReview` function from the reviewer module. Clean addition to the public API.
package.json	Version bump from 0.4.3 to 0.4.4. No issues.

Sequence Diagram

sequenceDiagram
    participant User
    participant Executor
    participant Validation
    participant Reviewer
    participant Git
    participant LLM

    User->>Executor: runLoop({ review: true, ... })
    loop Each Iteration (i > 1, pastWarmup)
        Executor->>Executor: Agent runs task, produces changes
        alt validationCommands configured
            Executor->>Validation: runAllValidations()
            Validation-->>Executor: ValidationResult[]
            alt Validation fails
                Executor->>Executor: lastValidationFeedback = feedback
                Executor->>Executor: continue (next iteration)
            else Validation passes
                Executor->>Executor: circuitBreaker.recordSuccess()
                Executor->>Executor: lastValidationFeedback = ''
            end
        end
        alt options.review && hasChanges
            Executor->>Reviewer: runReview(cwd)
            Reviewer->>Git: git add --intent-to-add --all
            Reviewer->>Git: git diff HEAD
            Git-->>Reviewer: diff output
            Reviewer->>Git: git restore --staged .
            Reviewer->>LLM: tryCallLLM(diff, REVIEW_SYSTEM_PROMPT)
            LLM-->>Reviewer: JSON findings[]
            Reviewer->>Reviewer: parseFindings()
            Reviewer-->>Executor: ReviewResult { passed, findings }
            alt reviewResult.passed == false (has errors)
                Executor->>Executor: circuitBreaker.recordFailure()
                Executor->>Executor: lastValidationFeedback = formatReviewFeedback()
                Executor->>Executor: continue (next iteration)
            else reviewResult.passed == true
                Executor->>Executor: circuitBreaker.recordSuccess()
                Executor->>Executor: lastValidationFeedback = ''
            end
        end
        alt options.commit && hasChanges
            Executor->>Git: gitCommit()
        end
    end
    Executor-->>User: LoopResult

Prompt To Fix All With AI

This is a comment left during a code review.
Path: src/loop/reviewer.ts
Line: 41-46

Comment:
**`git restore --staged .` unstages genuine pre-existing staged changes**

`git restore --staged .` is designed to clean up the intent-to-add entries, but it indiscriminately unstages **all** staged changes — not just the ones added by `git add --intent-to-add --all`. If a user had previously staged specific files with `git add` (e.g., as part of a partial commit workflow) before kicking off the agent loop, those staged changes are silently unstaged after the reviewer runs.

For example:
1. User runs `git add src/config.ts` to stage a specific file
2. User runs `ralph-starter run "my task" --review`
3. After `getDiff` completes, `src/config.ts` is no longer staged — the user's staging state is destroyed

A safer cleanup would only remove the entries that were actually registered by the intent-to-add step, rather than blanket-resetting the entire index. One approach is to capture the list of untracked files before calling `git add --intent-to-add --all`, and then restore only those files afterward:

```typescript
// Capture untracked files before registering them
const untracked = await execa('git', ['ls-files', '--others', '--exclude-standard'], { cwd, reject: false });
const untrackedFiles = untracked.stdout.split('\n').filter(Boolean);

await execa('git', ['add', '--intent-to-add', '--all'], { cwd, reject: false });
const head = await execa('git', ['diff', 'HEAD'], { cwd, reject: false });

// Clean up: only remove the intent-to-add entries we added, preserving other staged changes
if (untrackedFiles.length > 0) {
  await execa('git', ['restore', '--staged', '--', ...untrackedFiles], { cwd, reject: false });
}
```

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: src/loop/executor.ts
Line: 1514-1515

Comment:
**Reviewer LLM cost not counted by `costTracker`**

`runReview` calls `tryCallLLM` internally, but this additional LLM call is invisible to the `costTracker`. When a user sets `maxCost` to cap spending, reviewer calls are silently excluded from the budget. In a long-running loop with many iterations, the reviewer can make as many LLM calls as there are iterations, potentially pushing real spend significantly above the stated limit.

Consider passing a cost-tracking callback or the `CostTracker` instance into `runReview` so reviewer token usage is accounted for:

```typescript
const reviewResult = await runReview(options.cwd, costTracker ?? undefined);
```

And in `reviewer.ts`:
```typescript
export async function runReview(cwd: string, costTracker?: CostTracker): Promise<ReviewResult | null> {
  // ...
  const response = await tryCallLLM({ ... });
  if (response && costTracker) {
    costTracker.recordUsage(response); // track reviewer tokens
  }
  // ...
}
```

How can I resolve this? If you propose a fix, please make it concise.

_{Last reviewed commit: e1f9ef7}

src/loop/executor.ts

src/loop/reviewer.ts

rubenmarcus · 2026-03-12T16:41:37Z

@greptileai

src/loop/executor.ts

src/loop/reviewer.ts

rubenmarcus · 2026-03-12T16:51:14Z

@greptileai

src/loop/executor.ts

rubenmarcus · 2026-03-12T17:01:07Z

@greptileai

src/loop/executor.ts

rubenmarcus · 2026-03-12T17:08:01Z

@greptileai

src/loop/executor.ts

src/loop/reviewer.ts

rubenmarcus · 2026-03-12T17:24:52Z

@greptileai

src/loop/executor.ts

src/loop/reviewer.ts

rubenmarcus · 2026-03-12T17:31:56Z

@greptileai

src/loop/reviewer.ts

rubenmarcus · 2026-03-12T17:38:28Z

@greptileai

src/loop/reviewer.ts

- Add LLM-powered diff review step that runs after lint/build/test pass but before commit, catching security issues, logic errors, and pattern violations that automated checks miss - New src/loop/reviewer.ts module using existing tryCallLLM infrastructure (Anthropic/OpenAI/OpenRouter) with structured JSON findings output - Errors block commit and feed back via lastValidationFeedback; warnings are logged but non-blocking; gracefully skips if no diff or no API key - Wire --review CLI flag on run command and export public API types - Close 14 delivered issues (#212, #224, #225, #226, #227, #228, #229, #231, #232, #233, #237, #239, #240, #241) - Bump version to 0.4.4 Co-authored-by: Amp <amp@ampcode.com> Amp-Thread-ID: https://ampcode.com/threads/T-019ce2d4-0f5f-742e-9721-3181f57267df

rubenmarcus · 2026-03-12T17:46:31Z

@greptileai

src/loop/reviewer.ts

src/loop/executor.ts

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

rubenmarcus added enhancement core labels Mar 12, 2026

rubenmarcus force-pushed the feat/agent-reviewer branch from ed90fa8 to b549ad2 Compare March 12, 2026 16:25

github-actions bot assigned rubenmarcus Mar 12, 2026

github-actions bot added candidate-release PR is ready for release config labels Mar 12, 2026