stack3:feat: add execution state orchestration and retry flow #162

VX1D wants to merge 9 commits into michaelshimeles:main
Conversation
@VX1D is attempting to deploy a commit to the Goshen Labs Team on Vercel. A member of the Team first needs to authorize it.
Greptile Summary

This PR introduces a substantial stateful execution core for Ralphy's task lifecycle: atomic task claiming, state persistence in multiple formats (YAML/JSON/CSV/MD), file-based cross-process locking, a planning cache with repo fingerprinting, a test-model orchestrator, and an upgraded sequential runner. The work is architecturally sound and fixes several real problems (Windows key collisions via URL encoding, CSV quoting). Several key concerns remain after reviewing the current state of the code.
Confidence Score: 2/5
Important Files Changed
Sequence Diagram

```mermaid
sequenceDiagram
    participant CLI as runSequential
    participant TSM as TaskStateManager
    participant Lock as FileLock
    participant Disk as StateFile (.ralphy/)
    participant Eng as AIEngine
    participant Orch as Orchestrator
    CLI->>TSM: initialize(allTasks)
    TSM->>Disk: loadState() / persistState()
    loop For each pending task
        CLI->>TSM: getNextPendingTask()
        TSM-->>CLI: pendingTask
        CLI->>TSM: claimTaskForExecution(taskId)
        TSM->>Lock: acquireFileLock(stateFile.claim)
        TSM->>Disk: loadState()
        TSM->>Disk: persistState() [state=RUNNING]
        TSM->>Lock: releaseFileLock()
        TSM-->>CLI: claimed=true
        alt testModel configured & shouldUseOrchestrator
            CLI->>Orch: executeWithOrchestrator(prompt)
            Orch->>Eng: executeWithRetry(mainModel)
            Eng-->>Orch: mainResult
            Orch->>Eng: executeWithRetry(testModel, testPrompt)
            Eng-->>Orch: testResult
            alt hasFailures
                Orch->>Eng: executeWithRetry(mainModel, fixPrompt)
            end
            Orch-->>CLI: OrchestratorResult
        else standard execution
            CLI->>Eng: withRetry(execute)
            Eng-->>CLI: AIResult
        end
        alt success
            CLI->>TSM: transitionState(COMPLETED)
            Note over TSM,Disk: ⚠️ No lock acquired — parallel<br/>writers can overwrite each other
            TSM->>Disk: persistState()
        else retryable failure
            CLI->>TSM: transitionState(DEFERRED)
            TSM->>Disk: persistState()
        else fatal / non-retryable
            CLI->>TSM: transitionState(FAILED)
            TSM->>Disk: persistState()
        end
    end
```

Last reviewed commit: 4be1eff
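The warning note in the diagram flags unlocked writes on completion. A minimal sketch of the fix the diagram implies, re-reading and persisting state under the same lock used for claiming; the names mirror the diagram, but every body below is a stand-in, not the PR's real implementation:

```typescript
// Stand-in state and lock; the real code uses files in .ralphy/ and a lock dir.
type State = Record<string, string>;
let disk: State = { t1: "RUNNING" };
let locked = false;

function acquireFileLock(): boolean {
  if (locked) return false;
  locked = true;
  return true;
}
function releaseFileLock(): void {
  locked = false;
}
function loadState(): State {
  return { ...disk };
}
function persistState(state: State): void {
  disk = { ...state };
}

function transitionState(taskId: string, next: string): boolean {
  // Hold the same lock used for claiming, so parallel writers serialize.
  if (!acquireFileLock()) return false;
  try {
    const state = loadState(); // re-read under the lock, not a stale copy
    state[taskId] = next;
    persistState(state);
    return true;
  } finally {
    releaseFileLock();
  }
}

console.log(transitionState("t1", "COMPLETED")); // true
console.log(disk.t1); // COMPLETED
```

The key point is the read-modify-write happening entirely inside the lock; persisting a state object loaded before acquisition would reintroduce the race.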
```ts
const success = acquireFileLock(testFile, TEST_BASE);
expect(success).toBe(true);
// ...
releaseFileLock(testFile, TEST_BASE);
```
Prompt To Fix With AI
This is a comment left during a code review.
Path: cli/__tests__/locking-security.test.ts
Line: 210
Comment:
**Hardcoded lock-dir path diverges from the `LOCK_DIR` constant and the other test file**
This line hardcodes `".ralphy-locks"`:
```ts
const lockDir = join(TEST_BASE, ".ralphy-locks");
```
However, `locking.test.ts` (line 349) uses `join(workDir, ".ralphy", "locks")`, which implies `LOCK_DIR = ".ralphy/locks"`. Since `LOCK_DIR` is already imported at the top of this test file, this assertion may be checking the wrong directory and could give a false-positive result (the `lockFiles.length` check against 5050 would trivially pass if it looks in a non-existent directory).
```suggestion
const lockDir = join(TEST_BASE, LOCK_DIR);
```
How can I resolve this? If you propose a fix, please make it concise.
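For illustration, building the path from the shared constant keeps both test files in agreement; `LOCK_DIR = ".ralphy/locks"` is an assumption inferred from `locking.test.ts`, not confirmed from the source:

```typescript
import { join } from "node:path";

// Assumption inferred from locking.test.ts: LOCK_DIR resolves to ".ralphy/locks".
const LOCK_DIR = join(".ralphy", "locks");
const TEST_BASE = "/tmp/ralphy-test";

// Compose with the constant instead of hardcoding ".ralphy-locks", so the
// assertion checks the directory the lock implementation actually writes to.
console.log(join(TEST_BASE, LOCK_DIR)); // /tmp/ralphy-test/.ralphy/locks
```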
Additional Comments (5)
Prompt To Fix With AI
This is a comment left during a code review.
Path: cli/src/execution/prompt.ts
Line: 2388-2393
Comment:
**`statSync` follows symlinks — `isSymbolicLink()` always returns false**
`statSync` dereferences the symlink before returning stats, so `isSymbolicLink()` is always `false` on its result. You need `lstatSync` to get the stats of the symlink entry itself. As written, `detectSymlinks` will always return an empty array, so the "Symlink Analysis" section in the prompt can never be populated.
```suggestion
return require("node:fs").lstatSync(join(workDir, d.name as string)).isSymbolicLink();
```
Or, since `lstatSync` is already available in the import-block at the top of this file (if you add it), replace the import and usage accordingly:
```ts
import { existsSync, readFileSync, readdirSync, lstatSync, statSync } from "node:fs";
```
and then:
```typescript
return lstatSync(join(workDir, d.name as string)).isSymbolicLink();
```
How can I resolve this? If you propose a fix, please make it concise.
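The difference is easy to verify in isolation. A self-contained demo (the temporary directory and file names are arbitrary):

```typescript
import { lstatSync, mkdtempSync, statSync, symlinkSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Scratch directory with a regular file and a symlink pointing at it.
const dir = mkdtempSync(join(tmpdir(), "symlink-demo-"));
writeFileSync(join(dir, "target.txt"), "hello");
symlinkSync(join(dir, "target.txt"), join(dir, "link.txt"));

// statSync follows the link, so the stats describe target.txt (a regular file).
console.log(statSync(join(dir, "link.txt")).isSymbolicLink()); // false
// lstatSync describes the link entry itself.
console.log(lstatSync(join(dir, "link.txt")).isSymbolicLink()); // true
```

This is why the `detectSymlinks` helper described in the comment can never find anything when built on `statSync`.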
Prompt To Fix With AI
This is a comment left during a code review.
Path: cli/src/execution/locking.ts
Line: 1019-1026
Comment:
**Synchronous busy-wait blocks the Node.js event loop**
The Node.js fallback path spins in a tight busy-wait loop for up to 5 seconds per retry attempt:
```ts
while (Date.now() - start < delay) {
if (Date.now() % 10 === 0) { } // This does nothing
}
```
The empty `if` block yields nothing. This will fully block the Node.js event loop (and any other async work, timers, or I/O) for the entire duration of the delay. Since `maxRetries` can be 5 and `delay` can be up to 5 seconds each, a single lock contention could freeze the runtime for up to 25 seconds.
Either make `acquireFileLock` async and use `await sleep(delay)`, or use `Atomics.wait` with a `SharedArrayBuffer` for a proper synchronous yield:
```ts
// Option: truly async acquisition
export async function acquireFileLock(...): Promise<boolean> {
// ...
await sleep(delay);
// ...
}
```
If the function must remain synchronous, use `Atomics.wait`:
```ts
const sab = new SharedArrayBuffer(4);
const arr = new Int32Array(sab);
Atomics.wait(arr, 0, 0, delay); // blocks the thread without spinning
```
How can I resolve this? If you propose a fix, please make it concise.
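A runnable sketch of the synchronous option; `sleepSync` is a hypothetical helper name, and note that `Atomics.wait` is permitted on Node's main thread, unlike in browser windows:

```typescript
// Minimal sketch: a synchronous sleep that yields the CPU instead of spinning.
function sleepSync(ms: number): void {
  const arr = new Int32Array(new SharedArrayBuffer(4));
  // Blocks until the value at index 0 changes (it never does) or the timeout elapses.
  Atomics.wait(arr, 0, 0, ms);
}

const start = Date.now();
sleepSync(100);
console.log(Date.now() - start >= 90); // true
```

The thread still blocks for the duration, but the OS parks it rather than burning a core; making `acquireFileLock` async remains the better fix if callers can tolerate a `Promise<boolean>`.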
Prompt To Fix With AI
This is a comment left during a code review.
Path: cli/src/execution/sequential.ts
Line: 3531-3548
Comment:
**Fatal error fast-path (auth/config failures) silently removed**
The previous code had an `isFatalError` fast-path that immediately aborted all remaining tasks when an authentication or configuration error was detected (e.g. `401`, `403`, `not authenticated`, `command not found`). Both error branches in the old code had this guard.
This PR removes both `isFatalError` calls entirely. Now, if the API key is invalid or the CLI tool is not installed, every single task will attempt execution, fail, get marked as `FAILED`, and continue. This can waste significant time (and tokens) compared to the old behavior of aborting on the first fatal error.
If the intent is to defer fatal-error handling to the circuit breaker in `retry.ts`, that only covers connection errors — not auth/config errors. Consider re-introducing the fast-abort for fatal (non-retryable, non-connection) errors, or at minimum document the intentional behavior change in the PR description.
How can I resolve this? If you propose a fix, please make it concise.
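If the fast-path is re-introduced, it could look like the following sketch; the function name matches the removed `isFatalError`, but the pattern list is an assumption based on the examples in the comment, not the original implementation:

```typescript
// Hypothetical reconstruction of the removed fast-path check (patterns assumed
// from the review's examples: 401, 403, auth failures, missing CLI binaries).
function isFatalError(message: string): boolean {
  const fatalPatterns = [
    /\b401\b/,
    /\b403\b/,
    /not authenticated/i,
    /command not found/i,
  ];
  return fatalPatterns.some((pattern) => pattern.test(message));
}

console.log(isFatalError("HTTP 401: not authenticated")); // true
console.log(isFatalError("ETIMEDOUT: connection reset by peer")); // false
```

In the sequential loop, a `true` result would abort the remaining tasks instead of marking each one `FAILED` and continuing.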
Prompt To Fix With AI
This is a comment left during a code review.
Path: cli/src/execution/sequential.ts
Line: 3429-3431
Comment:
**`getAllTasks()` called on every loop iteration — potentially expensive**
`getAllTasks()` is now called inside the main `while(true)` loop on every task iteration to find the full task object by ID. Depending on the task source implementation, this may involve file I/O or network calls on each iteration. For a source with N tasks, this results in O(N²) calls.
Consider loading the full task list once before the loop (or maintaining a task-by-ID map) and only refreshing it when the task source signals a change:
```ts
// Before the while loop
const allTasksCache = await taskSource.getAllTasks();
const taskById = new Map(allTasksCache.map(t => [t.id, t]));
// Inside the loop
const task = taskById.get(pendingTask.id);
```
How can I resolve this? If you propose a fix, please make it concise.
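Expanding the suggestion into a self-contained sketch; the `Task` shape and the stub task source are assumptions for illustration:

```typescript
// Task shape assumed for illustration; the real type lives in the task source.
interface Task {
  id: string;
  title: string;
}

async function getAllTasks(): Promise<Task[]> {
  // Stand-in for the real task source, which may hit files or the network.
  return [
    { id: "t1", title: "Add locking" },
    { id: "t2", title: "Add retry flow" },
  ];
}

async function run(): Promise<void> {
  // Load once before the loop: one O(N) pass instead of one call per iteration.
  const taskById = new Map(
    (await getAllTasks()).map((t) => [t.id, t] as [string, Task]),
  );
  // Inside the loop, each lookup is O(1).
  console.log(taskById.get("t2")?.title); // Add retry flow
}

run();
```

The cache only needs invalidating when the task source signals a change, which keeps the hot loop free of file or network I/O.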
Prompt To Fix With AI
This is a comment left during a code review.
Path: cli/src/execution/retry.ts
Line: 3109-3122
Comment:
**`maxDelay` and `jitter` options silently dropped — potential behavior regression for callers**
The `RetryOptions` interface previously included `maxDelay` (default 60 s) and `jitter` (default `true`) to bound and randomize retry delays. Both have been removed in this PR.
Any existing callers that relied on a capped retry delay or on jitter to prevent thundering-herd effects will now get the new fixed-delay / exponential-backoff behaviour with no upper bound for non-connection errors (`baseDelaySeconds * 1000`). The type change also means callers that previously passed `maxDelay` will now get a TypeScript compile error when they upgrade, which may not be obvious from the PR description.
If these callers exist in the codebase (outside this PR), their option objects will need to be updated. Consider documenting the breaking change in `RetryOptions`, or re-adding the options as optional no-ops for backward compatibility.
How can I resolve this? If you propose a fix, please make it concise.
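One backward-compatible option is to restore both fields as optional bounds on the computed delay. A hedged sketch (the defaults and the helper name are assumed, not taken from `retry.ts`):

```typescript
// Optional maxDelay/jitter restored; assumed defaults: 60 s cap, jitter on.
interface RetryOptions {
  baseDelaySeconds: number;
  maxDelaySeconds?: number;
  jitter?: boolean;
}

function backoffDelayMs(attempt: number, opts: RetryOptions): number {
  // Exponential backoff from the base delay, then capped at maxDelaySeconds.
  const raw = opts.baseDelaySeconds * 1000 * 2 ** attempt;
  const capped = Math.min(raw, (opts.maxDelaySeconds ?? 60) * 1000);
  // Full jitter: pick uniformly in [0, capped] to spread out competing retriers.
  return opts.jitter === false ? capped : Math.random() * capped;
}

console.log(backoffDelayMs(3, { baseDelaySeconds: 10, jitter: false })); // 60000
```

Callers that never passed `maxDelay` or `jitter` keep their current behaviour, while existing option objects compile again without edits.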
Related Documentation

3 document(s) may need updating based on files changed in this PR:

**Model Override and Selection** (Goshen Labs's Space)

View Suggested Changes:

@@ -38,9 +38,14 @@
- If a task has a `model:` property in YAML, it overrides any global model specified via CLI flags for that task.
- If no per-task model is set, the global model override (if any) is used.
- If neither is set, the engine's default model is used.
+- `--planning-model` (if specified) is used for the planning phase; otherwise, the main model is used.
+- `--test-model` (if specified) enables test orchestration for test-related tasks.
**Planning Model Override:**
You can use the `--planning-model` flag to specify a separate model for the planning phase (file prediction) while keeping a different model for code generation/execution. This allows you to use a cheaper or faster model for planning and a more capable model for implementation, optimizing both cost and performance.
+
+**Test Model Override:**
+You can use the `--test-model` flag to specify a separate model for test execution and validation. When specified, test-related tasks automatically trigger orchestrator-based execution, where the main model implements the feature, then the test model verifies the implementation and provides feedback. This allows you to use specialized models for testing without affecting main execution.
For example:
```bash
@@ -52,6 +57,7 @@
ralphy --gemini --model gemini-1.0-pro "summarize" # Use a custom Gemini model
ralphy --ollama --model glm-4.7 "add feature" # Use a custom Ollama model
ralphy --model opus --planning-model haiku "task" # Use 'opus' for execution, 'haiku' for planning
+ralphy --model sonnet --test-model opus "implement X" # Use 'sonnet' for main work, 'opus' for testing
```
[Reference: README.md](https://github.com/michaelshimeles/ralphy/blob/main/README.md)
@@ -59,16 +65,26 @@
For more details on model override flow and engine selection, see the sections below.
### Model Override Flow in the Execution Pipeline
-When you invoke `ralphy` with `--model`, `--planning-model`, or a shortcut flag, the CLI argument parser determines the engine and both model overrides. For example, `--sonnet` is equivalent to `--claude --model sonnet`. The selected engine, model override, and planning model override are stored in the runtime options and passed through all phases of execution, including sequential runs, parallel execution, and conflict resolution.
-
-- The `--model` flag sets the model for code generation and execution.
-- The `--planning-model` flag (optional) sets a separate model for the planning phase (file prediction). If not specified, the main model is used for both phases.
-
-The execution pipeline ensures that both model overrides are consistently propagated. During the planning phase (especially in parallel and no-git modes), the planning model is used to generate file lists for each task. During code generation and execution, the main model is used. Each engine implementation checks for these overrides and appends the appropriate `--model <name>` argument as needed.
-
-This separation allows you to optimize for speed and cost by using a lightweight model for planning and a more capable model for implementation.
-
-[Reference: cli/src/cli/args.ts, cli/src/execution/planning.ts, cli/src/execution/parallel.ts, cli/src/execution/sequential.ts]
+When you invoke `ralphy` with `--model`, `--planning-model`, `--test-model`, or a shortcut flag, the CLI argument parser determines the engine and model overrides. For example, `--sonnet` is equivalent to `--claude --model sonnet`. The selected engine and model overrides are stored in the runtime options and passed through all phases of execution, including sequential runs, parallel execution, and conflict resolution.
+
+The execution pipeline supports three distinct execution phases, each with its own model:
+
+1. **Planning Phase** (uses `--planning-model` if specified, otherwise `--model`): Generates structured analysis and file lists for each task. The planning model is used to predict which files need modification before implementation begins.
+
+2. **Main Execution Phase** (uses `--model`): Implements the actual code changes, performs code generation, and handles primary execution work.
+
+3. **Test Phase** (uses `--test-model` if specified): Automatically triggered for test-related tasks. The test model verifies implementation, runs tests, analyzes results, and provides feedback. If tests fail, the system returns to the main execution phase for fixes, then re-tests.
+
+**Model Selection Decision Tree:**
+- **Planning**: Use `--planning-model` if provided, otherwise use `--model`, otherwise use engine default
+- **Main Execution**: Use `--model` if provided, otherwise use engine default
+- **Testing**: Use `--test-model` if provided, otherwise skip test orchestration
+
+The execution pipeline ensures that all model overrides are consistently propagated. Each engine implementation checks for these overrides and appends the appropriate `--model <name>` argument as needed.
+
+This separation allows you to optimize for different phases: use a lightweight model for planning to reduce cost, a capable model for implementation, and a specialized model for testing—all without affecting the quality of each phase.
+
+[Reference: cli/src/execution/planning.ts, cli/src/execution/orchestrator.ts, cli/src/execution/parallel.ts, cli/src/execution/sequential.ts]
### Specifying Models for Different Engines
You can specify a model for any supported engine by combining the engine flag with `--model`. For example, to use a specific model with OpenCode, use `--opencode --model <model-name>`. The same pattern applies to other engines, including Trae, Gemini, Ollama, and Kimi:
@@ -82,7 +98,9 @@
ralphy --kimi --model kimi-k2.5 "your task"
```
-You can also specify a separate planning model for any engine using `--planning-model <model-name>`. This model will be used only for the planning phase (file prediction), while the main model (from `--model`) is used for code generation and execution:
+You can also specify a separate planning model for any engine using `--planning-model <model-name>`. This model will be used only for the planning phase (file prediction), while the main model (from `--model`) is used for code generation and execution.
+
+Additionally, you can specify a test model using `--test-model <model-name>` to enable orchestrated test execution. When a test model is specified, test-related tasks automatically trigger a flow where the main model implements, the test model validates, and fixes are applied if needed:
```bash
ralphy --model opus --planning-model haiku "task"
@@ -91,11 +109,51 @@
ralphy --gemini --model gemini-1.0-pro --planning-model gemini-1.0 "summarize"
ralphy --ollama --model glm-4.7 --planning-model glm-3.5 "add feature"
ralphy --kimi --model kimi-k2.5 --planning-model kimi-k2.5 "your task"
+
+# With test model orchestration
+ralphy --model sonnet --test-model opus "implement feature with tests"
+ralphy --opencode --model opencode/glm-4.7 --test-model opencode/gpt-5-nano "build API"
```
Only one engine/model combination can be specified per command invocation. There is no built-in support for specifying different models for multiple engines in a single command; you must run separate commands for each engine/model pair.
-[Reference: cli/src/cli/args.ts, cli/src/execution/planning.ts]
+### Test Model Orchestration
+When you specify a `--test-model`, the execution pipeline can automatically delegate test execution and validation to a separate specialized model. This orchestration is triggered for test-related tasks (tasks containing keywords like "test", "implement", "create", "build", "fix", or "debug").
+
+**How Test Model Orchestration Works:**
+
+1. **Automatic Delegation Flow**: The main model implements the feature or fix, then the test model is automatically invoked to verify the implementation by running tests.
+
+2. **shouldUseOrchestrator() Decision**: The system determines whether to use orchestrator mode by checking:
+ - Is a `--test-model` specified?
+ - Does the task title or description contain test-related or implementation keywords?
+ - If both conditions are met, orchestration is enabled.
+
+3. **executeWithOrchestrator() Flow**:
+ - **Step 1 (Main Model)**: Implement the task/feature
+ - **Step 2 (Test Model)**: Run tests and analyze results
+ - **Step 3 (Optional Fix)**: If tests fail, return to main model with test results for fixes, then re-test
+
+**Benefits:**
+- Use a capable model (e.g., Claude Opus) for complex implementation work
+- Use a specialized or faster model (e.g., GPT-4) for test execution and validation
+- Automatic retry flow when tests fail, without manual intervention
+- Separation of concerns: implementation vs. validation
+
+**Example Workflow:**
+```
+User: ralphy --model anthropic/claude-3.5-sonnet --test-model anthropic/claude-3-opus "implement login feature"
+
+System Flow:
+1. Main model (sonnet) implements login feature
+2. Test model (opus) automatically runs tests, finds issues
+3. Main model (sonnet) receives test feedback, applies fixes
+4. Test model (opus) re-validates, confirms tests pass
+```
+
+This orchestration provides a robust way to ensure code quality without requiring the same model to handle both implementation and testing phases.
+
+[Reference: cli/src/execution/orchestrator.ts]
### Examples
- Use the default model for Claude:
@@ -159,11 +217,18 @@
ralphy --ollama --model glm-4.7 --planning-model glm-3.5 "add feature"
ralphy --kimi --model kimi-k2.5 --planning-model kimi-k2.5 "your task"
```
+ - Use a test model for orchestrated testing:
+ ```bash
+ ralphy --model anthropic/claude-3.5-sonnet --test-model anthropic/claude-3-opus "implement feature X"
+ ralphy --model openai/gpt-4 --test-model anthropic/claude-3.5-sonnet --planning-model openai/gpt-4o "complex task"
+ ralphy --opencode --model opencode/glm-4.7 --test-model opencode/gpt-5-nano "build API with tests"
+ ```
### Best Practices
- Use shortcut flags (like `--sonnet`) for common engine/model combinations to reduce typing and avoid mistakes.
- Use `--model` with the appropriate engine flag for custom or less common model selections, including `--kimi` for Kimi Code CLI.
- Use `--planning-model` to select a cheaper or faster model for the planning phase, especially in parallel or no-git modes where planning is a distinct step. This can reduce cost and speed up planning without affecting code quality.
+- Use `--test-model` to enable automatic test orchestration for implementation tasks. This provides a feedback loop where the test model validates implementation and suggests fixes if tests fail, improving code quality without manual test execution.
- Ensure you specify only one engine/model pair per command.
- Model overrides are consistently applied throughout the execution pipeline, including parallel runs and conflict resolution, so you can rely on the selected model being used for all phases of the operation.
- Refer to the engine documentation or `ralphy --help` for the list of supported models for each engine, including Kimi Code CLI.

**Progress Spinner Enhancements and Settings Display**

View Suggested Changes:

@@ -1,5 +1,16 @@
+## Overview
+
+The progress tracking system now operates on multiple layers, providing comprehensive visibility into execution:
+
+1. **Basic Spinner** - Live timer and step updates for sequential execution
+2. **Planning Progress Events** - Structured events during the planning phase (analyzing, planning, files, optimization)
+3. **Orchestrator Progress Callbacks** - Execution phase reporting for test feedback workflows
+4. **Static Agent Display** - Real-time agent execution tracking with task state transitions
+
+Each layer complements the others to provide complete transparency from planning through execution to testing.
+
### Timer with Live Updates
-The spinner now features a timer that updates every second, showing the elapsed time since the task began. This is implemented by starting a `setInterval` when the spinner is created, which calls an update method every 1000 milliseconds. The elapsed time is formatted as minutes and seconds (e.g., `2m 15s`) or just seconds if under a minute, and is displayed in dim text next to the spinner's current step and task description.
+The spinner features a timer that updates every second, showing the elapsed time since the task began. This is implemented by starting a `setInterval` when the spinner is created, which calls an update method every 1000 milliseconds. The elapsed time is formatted as minutes and seconds (e.g., `2m 15s`) or just seconds if under a minute, and is displayed in dim text next to the spinner's current step and task description.
**Operation Timing Breakdown:**
When a task completes, the spinner now displays a breakdown of how much time was spent in each major step (such as "Working", "Retry", etc.), surfacing slow paths and making performance bottlenecks visible. This breakdown appears in dim text after the main spinner output, and only includes steps that took at least 1 second. This helps users and developers identify which parts of the process are slow.
@@ -46,3 +57,167 @@
Progress logging is now non-blocking and batched, ensuring that the CLI remains responsive even during heavy parallel operations.
This transparency helps users understand the current state of the CLI, reduces confusion, and aids in troubleshooting or optimizing task runs. The spinner's integration with the task run commands ensures that feedback is always up-to-date and contextually relevant, from task start through retries to completion.
+
+## Planning Progress Events
+
+The planning phase reports structured progress events through the `PlanningProgressEvent` and `PlanningProgressCallback` types defined in `cli/src/execution/progress-types.ts`.
+
+### Event Structure
+
+Each planning event includes:
+- `taskId`: Identifier for the task being planned
+- `status`: Current planning status - "started", "analyzing", "thinking", "planning", "completed", "error", or custom status strings
+- `timestamp`: Event occurrence time
+- `message`: Optional human-readable description of the current activity
+- `metadata`: Optional additional data (file counts, file lists, planning flags)
+- `reward`: Optional numeric reward/confidence score from the AI model
+
+### Planning Phases
+
+The planning system emits events for distinct phases:
+
+1. **Analyzing** - Reading project structure and files, extracting task keywords, performing semantic chunking to identify relevant files
+2. **Thinking** - Processing the planning request and generating analysis
+3. **Planning** - Creating the structured plan with steps and file lists
+4. **Completed** - Planning finished successfully with file count and step count
+5. **Error/Failed** - Planning encountered issues requiring retry or manual intervention
+
+### Integration with UI
+
+UI components can subscribe to planning progress by providing a callback:
+
+```typescript
+const onProgress: PlanningProgressCallback = (event) => {
+ // Display event.status and event.message
+ // Show event.metadata.fileCount and event.metadata.files if available
+ // Use event.reward for confidence indicators
+};
+
+await planTaskFiles(engine, task, workDir, {
+ onProgress,
+ // ... other options
+});
+```
+
+The planning system automatically detects streaming vs. non-streaming execution and emits appropriate events. For streaming execution, events are emitted in real-time as the AI model processes each phase. For non-streaming execution, events are emitted at key transition points.
+
+[See implementation](https://github.com/michaelshimeles/ralphy/blob/main/cli/src/execution/progress-types.ts)
+
+## Orchestrator Progress Callbacks
+
+The orchestrator provides execution phase tracking for test-feedback workflows through a progress callback mechanism defined in `cli/src/execution/orchestrator.ts`.
+
+### Callback Interface
+
+The orchestrator accepts an optional `onProgress?: (step: string) => void` callback that receives string descriptions of execution phases:
+
+- **"Starting execution with test feedback"** - Orchestrator workflow beginning
+- **"Running main model..."** - Initial implementation phase starting
+- **"Main model complete, running tests..."** - Implementation finished, transitioning to test phase
+- **"Sending to test model (model-name)..."** - Test execution starting with specific model
+- **"Test prompt ready, executing test model..."** - Test prompt constructed and ready
+- **"Test model complete. Response length: N chars"** - Test phase finished
+- **"Test output preview: ..."** - Sample of test results
+- **"Issues found, requesting fixes..."** - Tests failed, requesting fixes from main model
+- **"All tests passed"** - Successful completion
+
+### Execution Flow
+
+The orchestrator coordinates three stages:
+
+1. **Main Model Execution** - Implements the task based on the original prompt
+2. **Test Model Execution** - Runs tests against the implementation and reports results
+3. **Fix Iteration** (if needed) - Requests fixes from main model if tests reveal issues
+
+Each stage transition triggers a progress callback, allowing the UI to update the displayed phase and activity.
+
+### Integration with Spinner
+
+The orchestrator progress callbacks integrate seamlessly with the existing spinner system. When an orchestrator callback fires, it can update the spinner's step label to reflect the current phase:
+
+```typescript
+await executeWithOrchestrator(prompt, options, (step) => {
+ spinner.updateStep(step);
+});
+```
+
+The orchestrator also updates the `StaticAgentDisplay` to show which model is currently running (main vs. test) and which execution phase is active (execution vs. testing).
+
+[See implementation](https://github.com/michaelshimeles/ralphy/blob/main/cli/src/execution/orchestrator.ts)
+
+## Static Agent Display
+
+The `StaticAgentDisplay` component (`cli/src/ui/static-agent-display.ts`) provides a dedicated UI for tracking agent execution across all phases, with support for parallel agent workflows.
+
+### Task State Transitions
+
+The display tracks tasks through these states:
+
+- **pending** - Task is queued but not yet started
+- **planning** - Task is in the planning phase (analyzing codebase, identifying files)
+- **working** - Task is actively being executed by an agent
+- **completed** - Task finished successfully (shown with ✓)
+- **failed** - Task encountered an error (shown with ✗)
+- **deferred** - Task was deferred for later execution
+- **skipped** - Task was skipped due to dry-run or other conditions
+
+### Execution Phases
+
+The display shows high-level execution phases:
+
+- **PLANNING** - Analyzing the codebase and planning file changes (shown in cyan)
+- **EXECUTION** - Implementing the task by modifying files (shown in magenta)
+- **TESTING** - Running tests and verifying implementation (shown in yellow)
+
+### Display Layout
+
+The static agent display renders:
+
+1. **Workflow Bar** - Visual progress indicator showing current phase (PLANNING → EXECUTION → TESTING)
+2. **Agent Status Lines** - One line per agent showing:
+ - Status indicator (●/✓/✗)
+ - Agent number
+ - Current phase tag
+ - Task title
+ - Active model name
+ - Elapsed time
+3. **Recent Steps** - Up to 5 recent actions per agent, numbered and color-coded by action type:
+ - Read/Search actions (blue)
+ - Write/Edit actions (magenta)
+ - Tool/Execute actions (yellow)
+ - Test/Build actions (green)
+ - Fix/Debug actions (red)
+
+### Integration with Orchestrator
+
+The static display integrates with the orchestrator and planning systems:
+
+- Planning phase events update the display to show "planning" status
+- Orchestrator callbacks update the display to show which model is running (main/planning/test)
+- Phase transitions are reflected in the workflow bar
+- Recent steps are populated from OpenCode streaming output or explicit step updates
+
+### Example Display Output
+
+```
+ PLANNING → ▓▓▓ EXECUTION ▓▓▓ → TESTING
+─────────────────── AGENTS ────────────────────
+
+● Agent 1 [EXECUTION] Implement auth flow [main] 0m 42s
+ 1. Write: src/auth/login.ts
+ 2. Edit: src/routes.ts
+ 3. Tool: bash: npm test
+ 4. Read: package.json
+ 5.
+
+✓ Agent 2 [TESTING] Fix validation bug [test] 1m 15s
+ 1. Test: npm test
+ 2. Read: src/validation.ts
+ 3. Edit: src/validation.ts
+ 4. Test: npm test
+ 5. Tool: git commit
+
+Press Ctrl+C to stop
+```
+
+[See implementation](https://github.com/michaelshimeles/ralphy/blob/main/cli/src/ui/static-agent-display.ts)

**Real-Time Streaming Output Handling**

View Suggested Changes:

@@ -9,6 +9,15 @@
- **Linting**: For commands invoking lint tools (e.g., "eslint", "prettier").
- **Committing**: For commands or descriptions containing "git commit".
- **Staging**: For commands or descriptions containing "git add".
+
+#### Planning Phase Steps
+In addition to execution steps, the system recognizes structured planning phase markers from the planning prompt:
+- **ANALYSIS**: Analyzing the problem, complexity, risks, and goal
+- **PLAN**: Creating implementation steps and strategy
+- **FILES**: Identifying files that need modification
+- **OPTIMIZATION**: Determining efficient approaches and shortcuts
+
+These planning steps are detected when the AI output contains the corresponding XML-style tags (`<ANALYSIS>`, `<PLAN>`, `<FILES>`, `<OPTIMIZATION>`) defined in the planning prompt. During streaming execution, planning progress events can report these as status values to provide visibility into the planning process before implementation begins.
Example logic (a sketch of the tag-based detection described above; the helper name is illustrative, not the PR's actual API):
```ts
// Map a streamed AI output chunk to a planning-phase status value (illustrative sketch).
const PLANNING_TAGS = ["ANALYSIS", "PLAN", "FILES", "OPTIMIZATION"];

function detectPlanningPhase(chunk: string): string | undefined {
  for (const tag of PLANNING_TAGS) {
    if (chunk.includes(`<${tag}>`)) {
      return tag.toLowerCase(); // reported as a planning status value
    }
  }
  return undefined;
}
```
This architecture ensures that users receive immediate, meaningful feedback about the current phase of each task as it executes, improving transparency and responsiveness in the development workflow.
+
+## Task State Transitions
+The execution system maintains explicit task states through a centralized state machine implemented in `task-state.ts`. This provides a single source of truth for task lifecycle across all execution modes (sequential, parallel).
+
+### State Machine
+Tasks transition through six states:
+- **pending**: Task is ready to be executed
+- **running**: Task is currently being executed
+- **completed**: Task finished successfully
+- **failed**: Task failed and cannot be retried
+- **deferred**: Task failed with a retryable error and will be retried
+- **skipped**: Task exceeded maximum retry attempts or was explicitly skipped
+
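The legal transitions implied by these states can be sketched as a validity map. This is a hedged illustration only; the exact rules live in `task-state.ts` and may differ:

```ts
// Illustrative transition-validity map for the six task states described above.
enum TaskState {
  PENDING = "pending",
  RUNNING = "running",
  COMPLETED = "completed",
  FAILED = "failed",
  DEFERRED = "deferred",
  SKIPPED = "skipped",
}

const ALLOWED_TRANSITIONS: Record<TaskState, TaskState[]> = {
  [TaskState.PENDING]: [TaskState.RUNNING, TaskState.SKIPPED],
  // PENDING here covers the restart reset of interrupted RUNNING tasks.
  [TaskState.RUNNING]: [TaskState.COMPLETED, TaskState.FAILED, TaskState.DEFERRED, TaskState.PENDING],
  [TaskState.DEFERRED]: [TaskState.PENDING, TaskState.SKIPPED],
  [TaskState.COMPLETED]: [],
  [TaskState.FAILED]: [],
  [TaskState.SKIPPED]: [],
};

function canTransition(from: TaskState, to: TaskState): boolean {
  return ALLOWED_TRANSITIONS[from].includes(to);
}
```

Terminal states (`completed`, `failed`, `skipped`) have no outgoing transitions in this sketch.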
+### State Transitions
+State transitions are atomic and persisted to disk in the same format as the task source (YAML, JSON, CSV, or Markdown):
+
+```ts
+// Claim a task for execution (atomic transition from pending → running)
+const claimed = await taskStateManager.claimTaskForExecution(taskId);
+
+// Transition to completed/failed/deferred/skipped
+await taskStateManager.transitionState(taskId, TaskState.COMPLETED);
+await taskStateManager.transitionState(taskId, TaskState.FAILED, errorMessage);
+await taskStateManager.transitionState(taskId, TaskState.DEFERRED, errorMessage);
+```
+
+### Impact on Progress Tracking
+State transitions affect real-time output in several ways:
+- **Interrupted tasks**: Tasks in `RUNNING` state are automatically reset to `PENDING` on program restart
+- **Retry attempts**: The state manager tracks `attemptCount` per task, preventing infinite retry loops
+- **Parallel execution**: The atomic claim mechanism prevents race conditions where multiple workers try to execute the same task
+- **Progress reporting**: State changes trigger updates to the progress UI, spinner, and agent display
+
+The state manager ensures consistent task lifecycle behavior across both sequential and parallel execution modes. [Source](https://github.com/michaelshimeles/ralphy/pull/162/files#diff-task-state.ts)
+
+## Orchestrator Execution Flow
+The orchestrator pattern in `orchestrator.ts` provides a higher-level execution flow that coordinates between main models and test models. This affects the sequence of steps visible in the streaming output.
+
+### Orchestrated Execution Phases
+When orchestrator mode is enabled (via the `--test-model` flag), tasks follow this execution pattern:
+
+1. **"Starting execution with test feedback"**: Initial setup phase
+2. **"Running main model..."**: Main model implements the task
+3. **"Main model complete, running tests..."**: Transition to test phase
+4. **"Sending to test model (MODEL_NAME)..."**: Test model invoked
+5. **"Test model complete."**: Test results received
+6. **"Issues found, requesting fixes..."**: (If tests fail) Re-invoking main model with test feedback
+7. **Completion or error state**
+
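The main → test → fix loop above can be sketched as follows. This is illustrative only; `runModel` and `hasTestFailures` are stand-ins, not the orchestrator's real API:

```ts
// Stand-in for a real engine call.
async function runModel(model: string, prompt: string): Promise<string> {
  return `${model} output for: ${prompt}`;
}

// Simplified failure check (the real heuristic is discussed in the review below).
function hasTestFailures(testOutput: string): boolean {
  return /failed|✗|assertion/i.test(testOutput);
}

async function orchestrate(task: string, mainModel: string, testModel: string): Promise<string> {
  let result = await runModel(mainModel, task);                    // "Running main model..."
  const report = await runModel(testModel, `test: ${result}`);     // "Sending to test model..."
  if (hasTestFailures(report)) {                                   // "Issues found, requesting fixes..."
    result = await runModel(mainModel, `fix based on: ${report}`);
  }
  return result;
}
```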
+### Test Model Integration
+The orchestrator uses separate progress reporting for test phases:
+```ts
+// Update agent display to show test model is running
+const display = StaticAgentDisplay.getInstance();
+if (display && agentNum) {
+ display.setAgentStatus(agentNum, "", "working", "testing", testModel);
+}
+```
+
+### Impact on Streaming Output
+During orchestrated execution:
+- The streaming output includes orchestrator status messages in addition to model-generated steps
+- Progress spinners update to show "testing" phase when test model is active
+- Test results are appended to the main output response for visibility
+- Multiple model calls (main → test → main) produce a longer sequence of progress events
+
+This orchestration is transparent to the user but provides more comprehensive test coverage. The orchestrator decides whether to use this pattern based on the `testModel` setting and task characteristics. [Source](https://github.com/michaelshimeles/ralphy/pull/162/files#diff-orchestrator.ts)
+
+## Progress Types and Events
+The `progress-types.ts` module defines structured progress event contracts that extend the original step detection system. These types enable more granular progress reporting beyond the heuristic JSON step detection.
+
+### Planning Progress Events
+The `PlanningProgressEvent` type tracks planning phase progress:
+
+```ts
+export interface PlanningProgressEvent {
+ taskId: string;
+ status: "started" | "thinking" | "completed" | "error" | string;
+ timestamp: number;
+ message?: string;
+ metadata?: Record<string, unknown>;
+ reward?: number;
+}
+```
+
+Status values include:
+- **started**: Planning phase initiated
+- **thinking**: AI is processing the planning request
+- **analyzing**: Reading and exploring the codebase
+- **planning**: Generating the implementation plan
+- **completed**: Planning phase finished successfully
+- **failed**: Planning encountered an error
+
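A minimal sketch of emitting these events through a progress callback. The callback wiring is an assumption about how `planTaskFiles` reports progress, and the type is re-declared locally so the snippet stands alone:

```ts
// Local re-declaration of the event shape for a self-contained example.
type PlanningProgressEvent = {
  taskId: string;
  status: string;
  timestamp: number;
  message?: string;
};

function emitPlanning(
  onProgress: (e: PlanningProgressEvent) => void,
  taskId: string,
  status: string,
  message?: string,
): void {
  onProgress({ taskId, status, timestamp: Date.now(), message });
}

// Usage: collect the event stream for one task.
const events: PlanningProgressEvent[] = [];
emitPlanning((e) => events.push(e), "task-1", "started");
emitPlanning((e) => events.push(e), "task-1", "completed", "plan ready");
```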
+### Agent Progress Tracking
+The `AgentProgress` interface tracks parallel agent execution state:
+
+```ts
+export interface AgentProgress {
+ agentNum: number;
+ taskTitle: string;
+ status: "planning" | "working" | "completed" | "failed";
+ phase?: ExecutionPhase; // "planning" | "execution" | "testing"
+ modelName?: string; // "main", "planning", "test"
+ currentStep?: string;
+ recentSteps?: string[];
+ startTime: number;
+}
+```
+
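A self-contained usage sketch, re-declaring the interface so the snippet compiles on its own; `recordStep` is a hypothetical helper that mirrors the display's five-step history:

```ts
type ExecutionPhase = "planning" | "execution" | "testing";

interface AgentProgress {
  agentNum: number;
  taskTitle: string;
  status: "planning" | "working" | "completed" | "failed";
  phase?: ExecutionPhase;
  modelName?: string;
  currentStep?: string;
  recentSteps?: string[];
  startTime: number;
}

const progress: AgentProgress = {
  agentNum: 1,
  taskTitle: "Implement auth flow",
  status: "working",
  phase: "execution",
  modelName: "main",
  recentSteps: [],
  startTime: Date.now(),
};

// Record a step, keeping only the five most recent (matching the display's step history).
function recordStep(p: AgentProgress, step: string): void {
  p.currentStep = step;
  p.recentSteps = [...(p.recentSteps ?? []), step].slice(-5);
}
```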
+### Integration with Streaming Output
+During streaming execution, these progress events complement the original step detection:
+
+1. **Planning phase**: The `planTaskFiles` function emits `PlanningProgressEvent` callbacks during plan generation
+2. **Execution phase**: The `AgentProgress` updates track which model is active and what step it's on
+3. **Test phase**: When orchestrator is enabled, events include `phase: "testing"` and `modelName: "test"`
+
+These structured events provide the data for the parallel progress UI and static agent display, enabling real-time dashboard-like visibility into all running tasks. [Source](https://github.com/michaelshimeles/ralphy/pull/162/files#diff-progress-types.ts)
+
+## Static Agent Display Integration
+The `static-agent-display.ts` component provides a real-time, in-terminal UI for tracking parallel agent execution. This complements the spinner-based progress during sequential execution with a dashboard-like view during parallel execution.
+
+### Visual Representation
+The static agent display renders:
+- **Workflow bar**: Shows current execution phase (planning → execution → testing) with progress indicators
+- **Agent cards**: One card per parallel agent, showing:
+ - Agent number and task title
+ - Current phase tag (`[PLANNING]`, `[EXECUTION]`, `[TESTING]`)
+ - Model tag (e.g., `[main]`, `[planning]`, `[test]`)
+ - Status indicator (● for working, ✓ for completed, ✗ for failed)
+ - Elapsed time
+- **Step history**: Last 5 steps for each agent, numbered 1-5, with color-coded actions
+
+### Integration with Task States
+The static agent display uses task state transitions to update agent status:
+
+```ts
+// When a task is claimed for execution
+display.setAgentStatus(agentNum, taskTitle, "working", "execution", "main");
+
+// When test model runs
+display.setAgentStatus(agentNum, taskTitle, "working", "testing", testModel);
+
+// When task completes
+display.setAgentStatus(agentNum, taskTitle, "completed", "execution", "main");
+```
+
+### Real-Time Step Updates
+The display updates agent steps from two sources:
+1. **Heuristic step detection**: Via `updateAgent(agentNum, step)` with detected steps from JSON output
+2. **OpenCode streaming**: Via `updateAgentFromOpenCode(agentNum, jsonLine)` which parses OpenCode's structured JSON output (tool_use, text, step_finish events)
+
+### Display Rendering
+The display:
+- Refreshes every second to show elapsed time updates
+- Updates in-place using ANSI escape codes to minimize terminal flicker
+- Shows empty numbered steps (1-5) when agents have no recent activity
+- Clears and re-renders the entire terminal on each update for a clean dashboard view
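A minimal sketch of such a refresh loop. The ANSI codes and one-second interval follow the description above; names are illustrative and not the component's real API:

```ts
// Clear the terminal and move the cursor home before each re-render.
const CLEAR_SCREEN = "\x1b[2J\x1b[H";

function renderDashboard(
  agents: { agentNum: number; taskTitle: string; startTime: number }[],
): string {
  const lines = agents.map((a) => {
    const elapsed = Math.floor((Date.now() - a.startTime) / 1000);
    return `● Agent ${a.agentNum} ${a.taskTitle} ${Math.floor(elapsed / 60)}m ${elapsed % 60}s`;
  });
  return CLEAR_SCREEN + lines.join("\n");
}

// A one-second refresh timer would call this repeatedly:
// setInterval(() => process.stdout.write(renderDashboard(agents)), 1000);
```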
+
+This display significantly improves observability during parallel execution by showing all agents' progress simultaneously, making it easy to identify slow agents, stuck tasks, or errors. [Source](https://github.com/michaelshimeles/ralphy/pull/162/files#diff-static-agent-display.ts)
```ts
if (locks.size > LOCK_MAX_LOCKS) {
  logWarn(`Lock registry size (${locks.size}) exceeded ${LOCK_MAX_LOCKS}. Evicting oldest non-own locks.`);

  const sorted = Array.from(locks.entries()).sort((a, b) => a[1].timestamp - b[1].timestamp);

  // Keep all locks owned by this process, evict oldest of others first
  const others = sorted.filter(([_path, info]) => info.owner !== lockOwner);
  const overflow = locks.size - LOCK_MAX_LOCKS;
  const toEvictOthers = others.slice(0, Math.max(overflow, 0));
```
🟡 Medium execution/locking.ts:348
cleanupStaleLocks logs a warning about exceeding LOCK_MAX_LOCKS but never evicts any locks. The filter info.owner !== lockOwner (line 354) always produces an empty array because locks only contains entries where owner === lockOwner (set at lines 152 and 211). When the registry is oversized, the warning fires repeatedly but memory usage is never reduced.
- const sorted = Array.from(locks.entries()).sort((a, b) => a[1].timestamp - b[1].timestamp);
-
- // Keep all locks owned by this process, evict oldest of others first
- const others = sorted.filter(([_path, info]) => info.owner !== lockOwner);
- const overflow = locks.size - LOCK_MAX_LOCKS;
- const toEvictOthers = others.slice(0, Math.max(overflow, 0));
+ const sorted = Array.from(locks.entries()).sort((a, b) => a[1].timestamp - b[1].timestamp);
+
+ // Evict oldest locks from this process when over limit
+ const overflow = locks.size - LOCK_MAX_LOCKS;
+ const toEvict = sorted.slice(0, Math.max(overflow, 0));
Evidence trail:
cli/src/execution/locking.ts lines 17-18 (locks Map and lockOwner definition), lines 149-162 (first locks.set with owner: lockOwner), lines 211-219 (second locks.set only when typedLockInfo.owner === lockOwner), lines 348-359 (cleanup logic with filter info.owner !== lockOwner). git_grep confirmed only two calls to locks.set exist.
```ts
function buildSkillsSection(workDir: string): string {
  const skillsCsv = getSkillsAsCsv(workDir);
  if (skillsCsv) {
    return `## Agent Skills
This repo includes compressed skill/playbook documentation for token efficiency:
${skillsCsv}

Before you start coding:
- Read and follow any relevant skill docs from compressed list above.
- If your engine supports a \`skill\` tool (e.g. OpenCode), use it to load relevant skills before implementing.
- If none apply, continue normally.`;
  }

  const skillRoots = SKILL_DIRECTORIES.map((dir) => join(workDir, dir)).filter(existsSync);
  if (skillRoots.length > 0) {
    return `## Agent Skills
This repo includes skill/playbook docs that describe preferred patterns, workflows, or tooling:
${skillRoots.map((p) => `- ${p}`).join("\n")}

Before you start coding:
- Read and follow any relevant skill docs from paths above.
- If your engine supports a \`skill\` tool (e.g. OpenCode), use it to load relevant skills before implementing.
- If none apply, continue normally.`;
  }
```
🟢 Low execution/prompt.ts:288
When skill directories exist but contain no .md files, getSkillsAsCsv returns empty and buildSkillsSection falls back to listing those same directories as "preferred patterns, workflows, or tooling" paths to read. This prompts the agent to read docs from empty directories, even though getSkillsAsCsv already determined they contain no valid skill files.
- const skillRoots = SKILL_DIRECTORIES.map((dir) => join(workDir, dir)).filter(existsSync);
- if (skillRoots.length > 0) {
- return `## Agent Skills
-This repo includes skill/playbook docs that describe preferred patterns, workflows, or tooling:
-${skillRoots.map((p) => `- ${p}`).join("\n")}
-
-Before you start coding:
-- Read and follow any relevant skill docs from paths above.
-- If your engine supports a \`skill\` tool (e.g. OpenCode), use it to load relevant skills before implementing.
-- If none apply, continue normally.`;
- }
Evidence trail:
cli/src/execution/prompt.ts:21 - `SKILL_DIRECTORIES = [".opencode/skills", ".claude/skills", ".skills"]`
cli/src/execution/prompt.ts:288-316 - `buildSkillsSection` function showing the fallback logic
cli/src/execution/skill-compress.ts:108-137 - `getSkillsAsCsv` function showing it only processes `.md` files and returns empty string if none found
cli/src/execution/locking.ts (outdated)

```ts
const start = Date.now();
while (Date.now() - start < delay) {
  // Minimal work to allow event loop processing
  if (Date.now() % 10 === 0) {
  }
}
}
```
🟢 Low execution/locking.ts:248
The Node.js branch of the retry backoff uses a while loop that spins the CPU for the entire delay duration, blocking the event loop and preventing any I/O, timers, or signal handlers from running. This makes the CLI unresponsive for up to 3+ seconds during lock retries, contradicting the comment claiming it "yields to the event loop". Consider using an actual yielding sleep like Atomics.wait if available, or at minimum setTimeout with a Promise to avoid blocking.
- } else {
- // For Node.js, use a lighter busy-wait with occasional yielding
- const start = Date.now();
- while (Date.now() - start < delay) {
- // Minimal work to allow event loop processing
- if (Date.now() % 10 === 0) {
- }
- }
- }
Evidence trail:
cli/src/execution/locking.ts lines 240-253 at REVIEWED_COMMIT. Line 240 shows `Math.min(baseDelay + jitter, 5000)` confirming max 5s delay. Lines 248-253 show the Node.js branch with a synchronous while loop: `while (Date.now() - start < delay) { if (Date.now() % 10 === 0) {} }`. The comments on lines 247-248 claim 'lighter busy-wait with occasional yielding' and 'Minimal work to allow event loop processing' but synchronous while loops in JavaScript cannot yield to the event loop.
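A non-blocking alternative along the lines the comment suggests could be an awaited `setTimeout`. This is a sketch, assuming the surrounding retry loop can be made async; it is not the repository's code:

```ts
// Non-blocking sleep: yields to the event loop instead of spinning the CPU.
function sleep(ms: number): Promise<void> {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Hypothetical backoff helper using the same base-delay/jitter/5s-cap shape
// described in the evidence trail above.
async function backoff(attempt: number): Promise<void> {
  const baseDelay = 100 * 2 ** attempt;
  const jitter = Math.random() * 100;
  await sleep(Math.min(baseDelay + jitter, 5000));
}
```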
```ts
  return false;
}

export function releaseFileLock(filePath: string, workDir: string): void {
```
🟠 High execution/locking.ts:262
releaseFileLock unconditionally deletes the lock file without checking ownership. If the lock expired and another process acquired it, this will delete that other process's active lock file, breaking mutual exclusion and allowing a third process to enter the critical section.
Evidence trail:
cli/src/execution/locking.ts lines 262-274 (releaseFileLock function showing no ownership check before unlinkSync), lines 9-14 (LockInfo interface with owner field), line 17 (lockOwner definition), lines 150-153 (lock acquisition stores owner in file)
```ts
const hasFailures =
  /test (fail|error|broken)|failed|failing|✗|❌|assertion|exception/i.test(testOutput) &&
  !/0 fail|no fail|all pass|✓|✔|passed/i.test(testOutput);
```
🟡 Medium execution/orchestrator.ts:240
Suggestion: Make hasFailures robust: "5 passed, 1 failed" should trigger fixes, while infrastructure errors (e.g., Test execution failed: ...) should not. Parse counts/status and handle infra failures separately before deciding to fix.
- // Check if tests indicate failures that need fixing
- const hasFailures =
- /test (fail|error|broken)|failed|failing|✗|❌|assertion|exception/i.test(testOutput) &&
- !/0 fail|no fail|all pass|✓|✔|passed/i.test(testOutput);
Evidence trail:
cli/src/execution/orchestrator.ts lines 232-243 at REVIEWED_COMMIT - shows testOutput construction (lines 232-234) and hasFailures regex logic (lines 240-242). Line 234 shows infrastructure errors become 'Test execution failed: ${testResult.error}' which matches the first regex ('failed') but not the exclusion patterns. The exclusion regex on line 242 includes 'passed' which incorrectly excludes mixed results like '5 passed, 1 failed'.
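One way to implement the count-based check the reviewer suggests. This is a sketch with a hypothetical helper, not the PR's code; the `"Test execution failed:"` prefix follows the evidence trail above:

```ts
// Classify test output: infrastructure errors are separated from real failures,
// and counts like "5 passed, 1 failed" are parsed before falling back to keywords.
function detectTestFailures(output: string): "failed" | "passed" | "infra-error" {
  if (/^Test execution failed:/m.test(output)) return "infra-error";
  const failed = /(\d+)\s+fail(?:ed|ing)?/i.exec(output);
  if (failed && Number(failed[1]) > 0) return "failed";
  const passed = /(\d+)\s+pass(?:ed)?/i.exec(output);
  if (passed) return "passed";
  // Fall back to keyword heuristics when no counts are present.
  return /✗|❌|assertion/i.test(output) ? "failed" : "passed";
}
```

With this shape, mixed results trigger fixes while transient infra errors do not.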
```ts
export interface PlanningProgressEvent {
  taskId: string;
  status: "started" | "thinking" | "completed" | "error" | string;
```
🟢 Low execution/progress-types.ts:35
The status property includes "error" in its type definition, but the implementation in planTaskFiles only ever emits "failed" for error states. Consumers checking for "error" will miss all failure events.
```diff
-  status: "started" | "thinking" | "completed" | "error" | string;
+  status: "started" | "thinking" | "completed" | "failed" | string;
```
Evidence trail:
cli/src/execution/progress-types.ts lines 33-40: PlanningProgressEvent interface with `status: "started" | "thinking" | "completed" | "error" | string;`
cli/src/execution/planning.ts lines ~377-379: streaming callback sets `status = "failed"` for error conditions
cli/src/execution/planning.ts line ~435: emits `status: "failed"` when raw tool use detected
cli/src/execution/planning.ts line ~478: emits `status: "failed"` for regular failures
git_grep for `status === "error"` returned no matches (no consumers checking for "error")
git_grep for `status === "failed"` shows usage in static-agent-display.ts:230
```ts
  "yarn.lock",
];
const fileStates = new Map<string, { mtime: number; size: number; hash: string }>();
let changed = !cached;
```
🟡 Medium execution/planning.ts:49
When a monitored file like package.json is deleted, generateRepoFingerprint returns the stale cached hash instead of detecting the change. The loop skips deleted files without removing them from the cached fileStates, so the changed flag stays false and the stale dirHash is returned. This causes the planning cache to believe the deleted file still exists.
Evidence trail:
cli/src/execution/planning.ts lines 29-107 (REVIEWED_COMMIT): The `generateRepoFingerprint` function at line 29. The loop at lines 48-66 only processes files where `existsSync(filePath)` returns true (line 49). The `changed` flag is only set to `true` when an existing file has different mtime/size (line 60). At lines 85-88, if `!changed && cached`, the stale `cached.dirHash` is returned. There is no code to detect when a file that was previously in `cached.fileStates` no longer exists.
```ts
  return dirHash;
}

export function loadPlanningCache(
```
🟡 Medium execution/planning.ts:112
loadPlanningCache always loads the compressed file (.gz) when present, even if the uncompressed file is newer. Since savePlanningCache can leave a stale compressed file next to a fresh uncompressed file after a gzip error, this causes stale cache data to be loaded. Consider checking mtime of both files and loading whichever is newer.
Evidence trail:
cli/src/execution/planning.ts lines 112-141 (`loadPlanningCache`): checks `existsSync(compressedCacheFile)` at line 117, loads it immediately without mtime check. cli/src/execution/planning.ts lines 143-162 (`savePlanningCache`): catch block at line 159-160 writes uncompressed file but does not delete existing `.gz` file. Commit: REVIEWED_COMMIT
```ts
  return `You are working on a specific task. Focus ONLY on this task:

TASK: ${task}

${sections.join("\n\n")}

${buildProtectedPathsWarning(prdFile, boundaries)}
Do NOT Read, Glob, or Search inside .ralphy-sandboxes or .ralphy-worktrees.
Do NOT mark tasks complete - that will be handled separately.
Focus only on implementing: ${task}`;
}
```
🟡 Medium execution/prompt.ts:385
The buildPrompt function outputs a raw list of protected paths at the end of the prompt without a section header or prohibitory instruction. The LLM receives filenames like the PRD file and .ralphy paths with no explicit "Do NOT modify" directive, unlike buildExecutionPrompt which uses a Boundaries - Do NOT modify: header. This omission leaves system files vulnerable to modification because the agent has no clear signal these are immutable boundaries.
return `You are working on a specific task. Focus ONLY on this task:
TASK: ${task}
${sections.join("\n\n")}
-${buildProtectedPathsWarning(prdFile, boundaries)}
-Do NOT Read, Glob, or Search inside .ralphy-sandboxes or .ralphy-worktrees.
-Do NOT mark tasks complete - that will be handled separately.
+## Boundaries (Do NOT modify these files/directories)
+${buildProtectedPathsWarning(prdFile, boundaries)}
+
+Do NOT Read, Glob, or Search inside .ralphy-sandboxes or .ralphy-worktrees.
+Do NOT mark tasks complete - that will be handled separately.
 Focus only on implementing: ${task}`;
Evidence trail:
cli/src/execution/prompt.ts:348-355 - `buildProtectedPathsWarning` returns raw list with no header: `return [...systemPaths, ...userPaths].join('\n')`; cli/src/execution/prompt.ts:391 - `buildPrompt` uses this raw output; cli/src/execution/prompt.ts:463 - `buildExecutionPrompt` uses proper header `Boundaries - Do NOT modify:\n${allBoundaries.join('\n')}`; cli/src/execution/prompt.ts:15-19 - RALPHY_PROTECTED_PATHS constant definition
cli/src/execution/orchestrator.ts (outdated)

```ts
  "debug",
];

return testKeywords.some((kw) => combined.includes(kw));
```
🟢 Low execution/orchestrator.ts:304
The keyword matching in shouldUseOrchestrator uses String.prototype.includes, so the keyword "test" matches substrings inside words like "latest", "fastest", and "contest". This causes the orchestrator to be enabled for tasks that don't actually involve testing. Consider matching whole words with word boundaries (e.g., regex with \b) or splitting on whitespace and checking for exact matches.
- return testKeywords.some((kw) => combined.includes(kw));
+ const words = combined.split(/\s+/);
+ return testKeywords.some((kw) => words.includes(kw));

Also found in 1 other location(s):
- cli/src/execution/locking.ts:106: The `isInRalphyDir` function uses case-sensitive substring matching (`String.prototype.includes`), which leads to two issues:
  1. **Case sensitivity on Windows**: Windows file systems are case-insensitive, so a path containing `.RALPHY` (e.g. `C:\Project\.RALPHY\data`) will return `false`, failing to identify the directory correctly. This contradicts the logic in `normalizePathForLocking`, which explicitly handles Windows case normalization.
  2. **False positives**: The simple substring check will match unrelated files such as `my.ralphy.backup.txt` or directories like `projects/not.ralphy/`, causing the application to incorrectly classify user data as internal system files.

**Fix**: Normalize the path first (or handle case-insensitivity) and check for specific path segments (e.g. surrounding the search term with path separators) rather than a partial string match.
Evidence trail:
cli/src/execution/orchestrator.ts lines 280-306 (REVIEWED_COMMIT): Function `shouldUseOrchestrator` at line 304 uses `combined.includes(kw)` where `combined` is the concatenation of taskTitle and taskDescription. The `testKeywords` array at line 290 includes 'test'. JavaScript's `String.prototype.includes()` performs substring matching, so 'latest'.includes('test') === true, 'fastest'.includes('test') === true, 'contest'.includes('test') === true.
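A segment-based check along the lines of that fix might look like this (hedged sketch; the real `isInRalphyDir` signature and matching rules may differ):

```ts
// Hypothetical sketch: normalize separators and case, then match whole path
// segments instead of doing a raw substring check.
function isInRalphyDir(p: string): boolean {
  const normalized = p.replace(/\\/g, "/").toLowerCase();
  return normalized
    .split("/")
    .some((seg) => seg === ".ralphy" || seg.startsWith(".ralphy-"));
}
```

This treats `.ralphy` and `.ralphy-*` as path segments, so `my.ralphy.backup.txt` no longer matches while `.RALPHY` on Windows does.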
```ts
const retryableMessages = [
  "timeout",
  "connection refused",
  "network",
  "rate limit",
  "too many requests",
  "temporary failure",
  "try again",
  "locked",
  "conflict",
  "connection error",
  "unable to connect",
  "internet connection",
  "econnrefused",
  "econnreset",
  "socket hang up",
  "fetch failed",
];
```
isRetryableError is missing several patterns from the previous implementation
The retryableMessages list drops these critical patterns present in the old retry.ts version:
| Missing pattern | Example error | Impact |
|---|---|---|
| `hit your limit` | `"rate_limit_error: You have hit your limit"` | API rate limit not retried |
| `quota` | `"You exceeded your current quota"` | Quota errors treated as permanent |
| `429` (bare code) | `"Error: 429"` or just `"429"` | HTTP 429 not retried if message is bare |
| `ENOTFOUND` | `"getaddrinfo ENOTFOUND api.example.com"` | DNS failures permanently fail (not caught by "timeout") |
| `overloaded` | `"Service Overloaded"` | Service overload errors not retried |

These errors will now be treated as non-retryable, causing tasks to fail permanently on transient DNS, quota, or rate-limit errors.
Add these patterns to the `retryableMessages` list:

```ts
const retryableMessages = [
  // ... existing entries ...
  "hit your limit",
  "quota",
  "429",
  "enotfound",
  "overloaded",
];
```
Path: cli/src/utils/errors.ts, lines 87-104.
```ts
if (dryRun && !options.debugOpenCode) {
  spinner.success("(dry run) Skipped");
} else {
```
Unreachable dead-code branch
This if (dryRun && !options.debugOpenCode) condition can never evaluate to true inside the main loop. The early-exit block at lines 107–115 already handles the dryRun && !debugOpenCode case and returns before the TaskStateManager is even initialised:
```ts
// lines 107-115 — returns unconditionally when dryRun && !debugOpenCode
if (dryRun && !options.debugOpenCode) {
  ...
  return result;
}
```

By the time execution enters the `while` loop, the only surviving combination is:
- `dryRun = false` → condition at 224 is trivially `false`, or
- `dryRun = true && debugOpenCode = true` → `!options.debugOpenCode` is `false`, so the condition is still `false`.
The spinner call spinner.success("(dry run) Skipped") is therefore dead code that will never execute. The branch and spinner should be removed to avoid misleading future maintainers.
Suggested change:

```suggestion
} else {
```
```ts
/connection|network|timeout|unable to connect|internet connection|econnrefused|econnreset/i.test(
  error,
);
const attemptNumber = DEFAULT_MAX_REPLANS - maxReplans + 1;
```
attemptNumber uses a global constant instead of actual call depth
```ts
const attemptNumber = DEFAULT_MAX_REPLANS - maxReplans + 1;
```

This formula only produces the correct attempt number when the initial `maxReplans` argument equals `DEFAULT_MAX_REPLANS`. If `planTaskFiles` is ever called directly with a smaller value — e.g. `planTaskFiles(engine, task, workDir, model, 1)` — then on the retry call `maxReplans` becomes `0`, and `attemptNumber` computes as `DEFAULT_MAX_REPLANS - 0 + 1 = DEFAULT_MAX_REPLANS + 1`. The exponential backoff `Math.min(2000 * 2 ** (attemptNumber - 1), 10000)` then yields the maximum delay on the very first retry, introducing an unexpectedly long wait.
Track the attempt independently by passing it down through the recursion, or derive it from a local counter:
```ts
// Replace with a local counter that doesn't depend on a global constant:
const attempt = maxReplans === 0 ? 1 : DEFAULT_MAX_REPLANS - maxReplans + 1;
```

Or, cleaner: add an `attempt = 1` parameter that increments on each recursive call.
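A sketch of the attempt-parameter approach. The name `planWithRetry` and the reduced signature are illustrative; the real `planTaskFiles` takes more arguments, and only the retry bookkeeping is shown here.

```ts
// Illustrative only: backoff is derived from an explicit attempt counter
// passed through the recursion, independent of any global constant.
async function planWithRetry(
  plan: () => Promise<boolean>,
  maxReplans: number,
  attempt = 1,
): Promise<boolean> {
  if (await plan()) return true;
  if (maxReplans <= 0) return false;
  // Backoff grows with actual call depth, capped at 10 seconds:
  const delayMs = Math.min(2000 * 2 ** (attempt - 1), 10000);
  await new Promise((resolve) => setTimeout(resolve, delayMs));
  return planWithRetry(plan, maxReplans - 1, attempt + 1);
}
```

Calling it with any initial `maxReplans` now yields the same backoff schedule, since the delay no longer depends on `DEFAULT_MAX_REPLANS`.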
```ts
async transitionState(
  taskId: string,
  newState: TaskState,
  error?: string,
  executionContext?: TaskStateEntry["executionContext"],
): Promise<void> {
  const key = this.buildTaskKey(taskId);
  const task = this.tasks.get(key);

  if (!task) {
    logError(`Task ${taskId} not found in state manager`);
    return;
  }

  const oldState = task.state;
  task.state = newState;

  if (error) {
    task.errorHistory.push(error);
  }

  if (executionContext) {
    task.executionContext = { ...task.executionContext, ...executionContext };
  }

  await this.persistState();
  logDebug(`Task ${taskId} transitioned from ${oldState} to ${newState}`);
}
```
transitionState writes state to disk without acquiring a file lock
claimTaskForExecution correctly acquires a file lock before loading, mutating, and persisting the state file — preventing parallel workers from double-claiming a task. transitionState, however, operates only on the in-memory tasks Map and writes directly to disk with no lock and no fresh reload:
```ts
async transitionState(...): Promise<void> {
  const task = this.tasks.get(key); // potentially stale in-memory snapshot
  task.state = newState;
  await this.persistState(); // blind overwrite — no lock, no reload
}
```

In parallel execution where multiple workers share a single state file, two agents completing their tasks concurrently will each hold their own (now-diverged) snapshots of the full state file. When both call `persistState`, the second writer overwrites the first writer's completed-task record, silently discarding a `COMPLETED` or `FAILED` transition.
The fix is to apply the same lock / reload / mutate / persist / release pattern already used by claimTaskForExecution to transitionState.
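A minimal runnable sketch of that lock / reload / mutate / persist / release shape, using an in-process FIFO mutex and a plain object as stand-ins for the PR's file lock and on-disk state file. All names here are illustrative, not the real `TaskStateManager` API.

```ts
// Stand-in state: a plain object plays the role of the .ralphy state file.
type TaskState = "PENDING" | "RUNNING" | "COMPLETED" | "FAILED";

let stateFile: Record<string, TaskState> = { "task-a": "RUNNING", "task-b": "RUNNING" };
let lockChain: Promise<void> = Promise.resolve(); // FIFO mutex standing in for acquireFileLock

function withLock<T>(fn: () => Promise<T>): Promise<T> {
  const run = lockChain.then(fn);
  lockChain = run.then(() => undefined, () => undefined); // release regardless of outcome
  return run;
}

// lock → reload → mutate → persist → release, as claimTaskForExecution does:
async function transitionState(taskId: string, newState: TaskState): Promise<void> {
  await withLock(async () => {
    const fresh = { ...stateFile }; // reload a fresh snapshot under the lock
    fresh[taskId] = newState;       // mutate only this task's entry
    stateFile = fresh;              // persist while still holding the lock
  });
}
```

Because each writer reloads under the lock, two concurrent completions each see the other's write before persisting, so neither transition is lost.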
PR3: execution state
Summary
This PR introduces the stateful execution core used by later agent/parallel work: task-state transitions, planning scaffolding, locking primitives, and prompt/test-orchestration contracts.
Why this PR exists
What it adds
- `cli/src/execution/`:
  - `task-state.ts` (state machine + claim/transition APIs)
  - `locking.ts` (file/task lock semantics)
  - `planning.ts` (planning cache, repo fingerprinting, planned-file parsing)
  - `orchestrator.ts`, `skill-compress.ts`, `progress-types.ts`
  - `sequential.ts`, `retry.ts`, `prompt.ts`
- `cli/src/ui/static-agent-display.ts`
- `cli/src/tasks/types.ts` (description/dependencies/csv-aware source typing)

PR preview
(`ANALYSIS`, `PLAN`, `FILES`, `OPTIMIZATION`).

Concrete scenarios this fixes
Planning and test-agent details
`<ANALYSIS>`, `<PLAN>`, `<FILES>`, `<OPTIMIZATION>`

Reliability hardening
Tests added/updated
- `cli/__tests__/locking.test.ts`
- `cli/__tests__/locking-security.test.ts`
- `cli/__tests__/orchestrator.test.ts`
- `cli/src/execution/prompt.test.ts`

Validation

`bun run check`, `bun tsc --noEmit`, and `bun test` in order.

Note
Add execution-state orchestration and retry flow by moving sequential runs to `TaskStateManager` with atomic claims and integrating circuit breaker retries across execution and testing.

Introduce centralized task state with file locks, atomic claims, and transitions in sequential execution; add an orchestrated main→test→fix flow with circuit breaker–backed retries; refactor prompt building to include environment/skills and protected paths; add locking utilities and tests; standardize errors and telemetry redaction; and switch CLI commands to throw instead of `process.exit`.

📍 Where to Start
Start with `runSequential` in sequential.ts, then review state operations in `TaskStateManager` in task-state.ts and the retry/circuit logic in `withRetry` in retry.ts.

Macroscope summarized 4be1eff.
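The retry shape referenced throughout this PR can be sketched as follows. This is a simplified illustration, not the real `withRetry` from retry.ts, which additionally tracks circuit-breaker state; the predicate parameter and defaults here are assumptions.

```ts
// Illustrative retry helper: retries only errors the predicate deems
// transient, with capped exponential backoff between attempts.
async function withRetry<T>(
  fn: () => Promise<T>,
  isRetryable: (err: unknown) => boolean,
  maxAttempts = 3,
): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      // Give up on permanent errors or when attempts are exhausted:
      if (attempt >= maxAttempts || !isRetryable(err)) throw err;
      const delayMs = Math.min(2000 * 2 ** (attempt - 1), 10000);
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
}
```

Pairing this with a pattern-based `isRetryableError` predicate (as discussed in the errors.ts review comment above) is what makes the missing-pattern bug consequential: any message the predicate misses fails the task on its first error.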