Commit d0a1005

feat(ralph-external): add comprehensive output capture for long-running sessions
Implemented a complete capture system for 6-8 hour Claude sessions.

**New Components:**
- SnapshotManager (640 lines, 23 tests): pre/post session state capture
  - Git status snapshots (branch, commits, staged/unstaged/untracked)
  - .aiwg directory state and file hashes
  - Key file tracking (package.json, CLAUDE.md, etc.)
  - Snapshot diff calculation with categorized changes
- CheckpointManager (495 lines, 22 tests): periodic state checkpoints
  - Default 30-minute intervals during active sessions
  - Git status, .aiwg state, memory usage tracking
  - Cumulative file change tracking across checkpoints
  - Crash recovery support
- StateAssessor (775 lines): two-phase intelligent assessment
  - Phase 1 (Orient): understand what happened in the session
  - Phase 2 (Generate): create context-aware continuation prompts
  - File change categorization, test status analysis
  - Progress estimation, blocker detection
  - Optional Claude-powered enhancement

**Orchestrator Integration:**
- Added snapshot capture before/after each session
- Integrated checkpoint manager with configurable intervals
- State assessor for intelligent prompt generation
- Enhanced iteration records with all capture artifacts

**Session Launcher Enhancements:**
- Session transcript capture from ~/.claude/projects/
- Stream-JSON event parsing (tool calls, errors, completions)
- Verbose output mode support
- Enhanced artifact management

**Configuration Options:**
- --verbose: enable verbose Claude output for debugging
- --checkpoint-interval N: checkpoint interval in minutes (default: 30)
- --no-snapshots: disable pre/post session snapshots
- --no-checkpoints: disable periodic checkpoints
- --use-claude-assessment: use Claude for state assessment
- --key-files: comma-separated key files to track

**Documentation:**
- Comprehensive README for the ralph-external module
- Detailed docs for snapshot-manager, checkpoint-manager, state-assessor
- Updated /ralph-external command with new options
- API references and usage examples

**Tests:**
- 23 tests for snapshot-manager (100% coverage)
- 22 tests for checkpoint-manager
- Enhanced session-launcher tests
- All 166 tests passing

This enables crash recovery, progress tracking, and intelligent continuation for long-running Claude sessions that exhaust their context.
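The "snapshot diff calculation with categorized changes" listed above reduces, at its core, to comparing two path-to-hash maps. The sketch below assumes a snapshot stores a `Map` from relative file path to content hash; `FileHashes`, `SnapshotDiff`, and `diffSnapshots` are illustrative names, not SnapshotManager's actual API (which, per the message, also records git status and key files).

```typescript
// Hypothetical snapshot shape: relative file path -> content hash.
type FileHashes = Map<string, string>;

interface SnapshotDiff {
  added: string[];
  modified: string[];
  deleted: string[];
}

// Bucket every path whose presence or hash differs between two snapshots.
function diffSnapshots(before: FileHashes, after: FileHashes): SnapshotDiff {
  const diff: SnapshotDiff = { added: [], modified: [], deleted: [] };

  after.forEach((hash, path) => {
    if (!before.has(path)) {
      diff.added.push(path);
    } else if (before.get(path) !== hash) {
      diff.modified.push(path);
    }
  });
  before.forEach((_hash, path) => {
    if (!after.has(path)) {
      diff.deleted.push(path);
    }
  });

  // Sorted output keeps diff artifacts stable and easy to review.
  diff.added.sort();
  diff.modified.sort();
  diff.deleted.sort();
  return diff;
}

const pre: FileHashes = new Map([
  ["package.json", "a1"],
  ["src/app.ts", "b2"],
  ["old.txt", "c3"],
]);
const post: FileHashes = new Map([
  ["package.json", "a1"],
  ["src/app.ts", "b9"],
  ["src/auth.ts", "d4"],
]);
console.log(JSON.stringify(diffSnapshots(pre, post)));
// → {"added":["src/auth.ts"],"modified":["src/app.ts"],"deleted":["old.txt"]}
```

Hashing contents rather than comparing timestamps means a file that was touched but left byte-identical does not show up as modified.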
1 parent d2f0b47 commit d0a1005

15 files changed

Lines changed: 5410 additions & 27 deletions

.claude/commands/ralph-external.md

Lines changed: 141 additions & 0 deletions
@@ -0,0 +1,141 @@
---
description: Start an external Ralph loop for crash-resilient iterative task execution
category: sdlc-orchestration
argument-hint: "<objective>" --completion "<criteria>" [--max-iterations N] [--verbose] [--checkpoint-interval M]
allowed-tools: Bash, Read, Write
model: opus
---

# /ralph-external

Start an **External Ralph Loop** - a supervisor that wraps Claude Code sessions to provide crash recovery, cross-session persistence, and comprehensive state capture for long-running sessions (6-8 hours).

## Arguments

| Argument | Type | Required | Description |
|----------|------|----------|-------------|
| `<objective>` | string | Yes | Task objective |
| `--completion` | string | Yes | Verifiable completion criteria |
| `--max-iterations` | number | No | Max external iterations (default: 5) |
| `--model` | string | No | Claude model (default: opus) |
| `--budget` | number | No | Budget per iteration USD (default: 2.0) |
| `--timeout` | number | No | Timeout per iteration minutes (default: 60) |
| `--verbose` | flag | No | Enable verbose Claude output for debugging |
| `--checkpoint-interval` | number | No | Checkpoint interval minutes (default: 30) |
| `--no-snapshots` | flag | No | Disable pre/post session snapshots |
| `--no-checkpoints` | flag | No | Disable periodic checkpoints |
| `--use-claude-assessment` | flag | No | Use Claude for state assessment |
| `--key-files` | string | No | Comma-separated key files to track |
| `--gitea-issue` | flag | No | Create Gitea issue for tracking |

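The defaults in the Arguments table above can be captured in one typed options object. This is a minimal sketch of turning an argv array into that object; `RalphOptions` and `parseRalphArgs` are hypothetical names, and the real CLI entry point (`index.mjs`) may parse arguments differently.

```typescript
// Hypothetical options object mirroring the Arguments table and its defaults.
interface RalphOptions {
  objective: string;
  completion: string;
  maxIterations: number;
  model: string;
  budget: number; // USD per iteration
  timeout: number; // minutes per iteration
  verbose: boolean;
  checkpointInterval: number; // minutes
  snapshots: boolean;
  checkpoints: boolean;
  useClaudeAssessment: boolean;
  keyFiles: string[];
  giteaIssue: boolean;
}

function parseRalphArgs(argv: string[]): RalphOptions {
  const opts: RalphOptions = {
    objective: "", completion: "", maxIterations: 5, model: "opus",
    budget: 2.0, timeout: 60, verbose: false, checkpointInterval: 30,
    snapshots: true, checkpoints: true, useClaudeAssessment: false,
    keyFiles: [], giteaIssue: false,
  };

  let i = 0;
  // Consume the token that follows a value-taking flag such as --budget 5.0.
  const value = (flag: string): string => {
    const v = argv[++i];
    if (v === undefined) throw new Error(`${flag} requires a value`);
    return v;
  };
  const num = (flag: string): number => {
    const n = Number(value(flag));
    if (!Number.isFinite(n)) throw new Error(`${flag} expects a number`);
    return n;
  };

  for (; i < argv.length; i++) {
    const arg = argv[i];
    switch (arg) {
      case "--completion": opts.completion = value(arg); break;
      case "--model": opts.model = value(arg); break;
      case "--max-iterations": opts.maxIterations = num(arg); break;
      case "--budget": opts.budget = num(arg); break;
      case "--timeout": opts.timeout = num(arg); break;
      case "--checkpoint-interval": opts.checkpointInterval = num(arg); break;
      case "--key-files":
        opts.keyFiles = value(arg).split(",").map((f) => f.trim()).filter((f) => f.length > 0);
        break;
      case "--verbose": opts.verbose = true; break;
      case "--no-snapshots": opts.snapshots = false; break;
      case "--no-checkpoints": opts.checkpoints = false; break;
      case "--use-claude-assessment": opts.useClaudeAssessment = true; break;
      case "--gitea-issue": opts.giteaIssue = true; break;
      default:
        if (arg.startsWith("--")) throw new Error(`Unknown flag: ${arg}`);
        opts.objective = arg; // positional <objective>
    }
  }

  if (!opts.objective || !opts.completion) {
    throw new Error("<objective> and --completion are required");
  }
  return opts;
}

const migration = parseRalphArgs([
  "Migrate codebase to TypeScript",
  "--completion", "npx tsc --noEmit exits 0",
  "--max-iterations", "20",
  "--key-files", "package.json,tsconfig.json,CLAUDE.md",
]);
console.log(migration.maxIterations, migration.budget, migration.keyFiles.length); // 20 2 3
```

Failing fast on unknown flags and missing required values is a deliberate choice here: a typo in a flag would otherwise only surface hours into an unattended run.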
## When to Use

Use External Ralph when:
- Task may take longer than a single session
- Context corruption is a risk
- You need crash recovery
- Progress tracking across sessions is important

Use Internal Ralph (`/ralph`) for:
- Tasks that fit within a single session
- Fast iteration cycles
- Simple verification criteria

## Workflow

Each iteration follows a comprehensive capture flow:

1. **Pre-Session Snapshot** - Captures git status, .aiwg state, file hashes
2. **Prompt Generation** - Context-aware prompt with learnings and progress
3. **Checkpoint Manager Start** - Begins periodic state snapshots
4. **Session Launch** - Spawns Claude with stdout/stderr/transcript capture
5. **Checkpoint Manager Stop** - Final checkpoint summary
6. **Post-Session Snapshot** - Captures changes, calculates diff
7. **Output Analysis** - Determines completion/continuation
8. **State Update** - Records all capture artifacts

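The eight steps above carry an ordering constraint worth making explicit: step 5 must run even when step 4 crashes, otherwise the checkpoint timer keeps firing and the post-session snapshot never happens. A minimal sketch, assuming each step is an injectable hook; `IterationSteps`, `Outcome`, and `runIteration` are hypothetical names, not the orchestrator's real API.

```typescript
type Outcome = "complete" | "continue" | "crashed";

// Hypothetical hooks, one per numbered workflow step.
interface IterationSteps {
  preSnapshot(): Promise<void>; // 1
  generatePrompt(): Promise<string>; // 2
  startCheckpoints(): void; // 3
  launchSession(prompt: string): Promise<string>; // 4, resolves to captured output
  stopCheckpoints(): void; // 5
  postSnapshot(): Promise<void>; // 6
  analyzeOutput(output: string): Promise<"complete" | "continue">; // 7
  updateState(outcome: Outcome): Promise<void>; // 8
}

async function runIteration(steps: IterationSteps): Promise<Outcome> {
  await steps.preSnapshot();
  const prompt = await steps.generatePrompt();

  steps.startCheckpoints();
  let output: string | undefined;
  try {
    output = await steps.launchSession(prompt);
  } catch {
    output = undefined; // a crashed session still goes through steps 5-8
  } finally {
    steps.stopCheckpoints(); // never leave the checkpoint timer running
  }

  await steps.postSnapshot();
  const outcome: Outcome =
    output === undefined ? "crashed" : await steps.analyzeOutput(output);
  await steps.updateState(outcome);
  return outcome;
}

// Usage: record the call order when the session crashes.
const log: string[] = [];
const steps: IterationSteps = {
  preSnapshot: async () => { log.push("pre"); },
  generatePrompt: async () => { log.push("prompt"); return "Fix the bug"; },
  startCheckpoints: () => { log.push("cp-start"); },
  launchSession: async () => { log.push("session"); throw new Error("boom"); },
  stopCheckpoints: () => { log.push("cp-stop"); },
  postSnapshot: async () => { log.push("post"); },
  analyzeOutput: async () => { log.push("analyze"); return "continue"; },
  updateState: async (o) => { log.push(`state:${o}`); },
};
runIteration(steps).then((o) => console.log(o, log.join(" > ")));
// crashed pre > prompt > cp-start > session > cp-stop > post > state:crashed
```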
## Capture Features

| Feature | Default | Description |
|---------|---------|-------------|
| Pre/Post Snapshots | Enabled | Git and .aiwg state before/after session |
| Periodic Checkpoints | Enabled | State snapshots every 30 min during session |
| Session Transcript | Always | Claude transcript from ~/.claude/projects/ |
| Stream-JSON Parsing | Always | Tool calls, errors, completions extracted |
| Verbose Output | Disabled | Enable with --verbose for debugging |

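The Session Transcript row depends on locating Claude's per-project transcript under `~/.claude/projects/`. The `copySessionTranscript` tests check only that a working directory such as `/foo/bar/baz` appears as `-foo-bar-baz` in the source path and that the session ID appears too. A sketch of that lookup, assuming every `/` becomes `-` and the file is named `<sessionId>.jsonl` (consistent with the `session-transcript.jsonl` artifact); `encodeProjectDir` and `transcriptPath` are hypothetical helpers, and Claude Code may encode other characters in ways the tests do not cover.

```typescript
// Hypothetical: "/foo/bar/baz" -> "-foo-bar-baz", the rule the launcher tests check.
function encodeProjectDir(workingDir: string): string {
  return workingDir.replace(/\//g, "-");
}

// Hypothetical: <home>/.claude/projects/<encoded dir>/<sessionId>.jsonl (POSIX paths).
function transcriptPath(homeDir: string, workingDir: string, sessionId: string): string {
  const home = homeDir.replace(/\/+$/, ""); // tolerate a trailing slash
  return `${home}/.claude/projects/${encodeProjectDir(workingDir)}/${sessionId}.jsonl`;
}

console.log(transcriptPath("/home/dev", "/foo/bar/baz", "test-session-123"));
// → /home/dev/.claude/projects/-foo-bar-baz/test-session-123.jsonl
```

In Node the home directory would come from `os.homedir()`; it is passed in here so the helper stays pure and easy to test.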
## Examples

```bash
# Simple task
/ralph-external "Fix all failing tests" --completion "npm test passes"

# With enhanced capture
/ralph-external "Implement user authentication" \
  --completion "npm test -- --testPathPattern=auth passes" \
  --max-iterations 10 \
  --verbose \
  --checkpoint-interval 15

# Long-running migration (6-8 hours)
/ralph-external "Migrate codebase to TypeScript" \
  --completion "npx tsc --noEmit exits 0" \
  --max-iterations 20 \
  --budget 5.0 \
  --checkpoint-interval 20 \
  --key-files "package.json,tsconfig.json,CLAUDE.md"

# With Claude-powered assessment
/ralph-external "Complex refactoring task" \
  --completion "npm test && npm run lint" \
  --max-iterations 15 \
  --use-claude-assessment \
  --gitea-issue

# Minimal capture (faster)
/ralph-external "Quick fix" \
  --completion "npm test passes" \
  --no-checkpoints
```

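When choosing `--checkpoint-interval` for the examples above, it helps to estimate how many checkpoints one iteration produces. A sketch, assuming checkpoints fire at fixed multiples of the interval from session start and that a session runs for at most the per-iteration `--timeout` (60 minutes by default); the actual CheckpointManager may also write a final checkpoint on stop. `checkpointOffsets` is a hypothetical helper.

```typescript
// Minutes after session start at which a fixed-interval checkpoint timer fires.
function checkpointOffsets(sessionMinutes: number, intervalMinutes: number): number[] {
  if (!(intervalMinutes > 0)) {
    throw new RangeError("checkpoint interval must be a positive number of minutes");
  }
  const offsets: number[] = [];
  for (let t = intervalMinutes; t <= sessionMinutes; t += intervalMinutes) {
    offsets.push(t);
  }
  return offsets;
}

// A real timer would use the same interval in milliseconds, e.g.
// setInterval(captureCheckpoint, intervalMinutes * 60 * 1000)
// where captureCheckpoint is a hypothetical callback.
const iterationTimeout = 60; // default --timeout, minutes
for (const interval of [30, 20, 15]) {
  console.log(`--checkpoint-interval ${interval}:`, checkpointOffsets(iterationTimeout, interval).join(", "));
}
// --checkpoint-interval 30: 30, 60
// --checkpoint-interval 20: 20, 40, 60
// --checkpoint-interval 15: 15, 30, 45, 60
```

Shorter intervals narrow the window of lost state after a crash at the cost of more git and filesystem scans during the session.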
## State Directory

```
.aiwg/ralph-external/
├── session-state.json                # Active loop state
├── iterations/
│   └── 001/
│       ├── prompt.md                 # Prompt used
│       ├── stdout.log                # Captured stdout
│       ├── stderr.log                # Captured stderr
│       ├── pre-snapshot.json         # State before session
│       ├── post-snapshot.json        # State after session
│       ├── snapshot-diff.json        # Changes detected
│       ├── analysis.json             # Output analysis
│       ├── state-assessment.json     # Two-phase assessment
│       ├── session-transcript.jsonl  # Claude transcript
│       ├── parsed-events.json        # Stream-JSON events
│       └── checkpoints/
│           ├── 001-checkpoint.json
│           ├── 002-checkpoint.json
│           └── ...
├── prompts/                          # All generated prompts
├── analysis/                         # All analysis results
└── completion-report.md              # Final summary
```

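The layout above is regular enough to derive every artifact path from the state root and an iteration number. A sketch, assuming three-digit zero padding as in `001/` and `001-checkpoint.json`; `pad3` and `iterationArtifacts` are hypothetical helpers, not part of the module.

```typescript
// Zero-pad to three digits so lexical order matches numeric order up to 999.
const pad3 = (n: number): string => String(n).padStart(3, "0");

// Hypothetical helper: artifact paths for one iteration, following the tree above.
function iterationArtifacts(stateRoot: string, iteration: number) {
  if (!Number.isInteger(iteration) || iteration < 1) {
    throw new RangeError("iterations are numbered from 1");
  }
  const dir = `${stateRoot}/iterations/${pad3(iteration)}`;
  return {
    dir,
    prompt: `${dir}/prompt.md`,
    stdout: `${dir}/stdout.log`,
    stderr: `${dir}/stderr.log`,
    preSnapshot: `${dir}/pre-snapshot.json`,
    postSnapshot: `${dir}/post-snapshot.json`,
    snapshotDiff: `${dir}/snapshot-diff.json`,
    analysis: `${dir}/analysis.json`,
    stateAssessment: `${dir}/state-assessment.json`,
    transcript: `${dir}/session-transcript.jsonl`,
    parsedEvents: `${dir}/parsed-events.json`,
    checkpoint: (n: number): string => `${dir}/checkpoints/${pad3(n)}-checkpoint.json`,
  };
}

const third = iterationArtifacts(".aiwg/ralph-external", 3);
console.log(third.snapshotDiff); // .aiwg/ralph-external/iterations/003/snapshot-diff.json
console.log(third.checkpoint(12)); // .aiwg/ralph-external/iterations/003/checkpoints/012-checkpoint.json
```

Zero padding matters for crash recovery: a plain `ls` or directory sort returns iterations and checkpoints in chronological order, so the latest one is simply the last entry.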
## Natural Language Triggers

- "Start external ralph loop for..."
- "Run crash-resilient loop to..."
- "Execute long-running task..."

## References

- @tools/ralph-external/orchestrator.mjs - Main loop logic
- @tools/ralph-external/index.mjs - CLI entry point
- @tools/ralph-external/snapshot-manager.mjs - Pre/post session snapshots
- @tools/ralph-external/checkpoint-manager.mjs - Periodic checkpoints
- @tools/ralph-external/state-assessor.mjs - Two-phase assessment
- @tools/ralph-external/session-launcher.mjs - Claude CLI wrapper
- @.claude/agents/ralph-output-analyzer.md - Output analyzer

test/unit/ralph-external/session-launcher.test.ts

Lines changed: 207 additions & 1 deletion
@@ -4,17 +4,28 @@
  * @source @tools/ralph-external/session-launcher.mjs
  */

-import { describe, it, expect, beforeEach, vi } from 'vitest';
+import { describe, it, expect, beforeEach, vi, afterEach } from 'vitest';
+import { mkdirSync, writeFileSync, rmSync, existsSync } from 'fs';
+import { join } from 'path';

 // Import the module under test
 // @ts-ignore - ESM import
 import { SessionLauncher } from '../../../tools/ralph-external/session-launcher.mjs';

 describe('SessionLauncher', () => {
   let launcher: InstanceType<typeof SessionLauncher>;
+  let testDir: string;

   beforeEach(() => {
     launcher = new SessionLauncher();
+    testDir = join('/tmp', `ralph-test-${Date.now()}`);
+    mkdirSync(testDir, { recursive: true });
+  });
+
+  afterEach(() => {
+    if (existsSync(testDir)) {
+      rmSync(testDir, { recursive: true, force: true });
+    }
   });

   describe('constructor', () => {
@@ -36,6 +47,7 @@ describe('SessionLauncher', () => {
       workingDir: '/project',
       stdoutPath: '/tmp/stdout.log',
       stderrPath: '/tmp/stderr.log',
+      outputDir: '/tmp/output',
     };

     it('should include required flags', () => {
@@ -54,6 +66,35 @@ describe('SessionLauncher', () => {
       expect(args[args.length - 1]).toBe('Fix the bug');
     });

+    it('should include verbose flag when specified', () => {
+      const args = launcher.buildArgs({
+        ...baseOptions,
+        verbose: true,
+      });
+
+      expect(args).toContain('--verbose');
+    });
+
+    it('should not include verbose flag when false', () => {
+      const args = launcher.buildArgs({
+        ...baseOptions,
+        verbose: false,
+      });
+
+      expect(args).not.toContain('--verbose');
+    });
+
+    it('should include max-turns when specified', () => {
+      const args = launcher.buildArgs({
+        ...baseOptions,
+        maxTurns: 10,
+      });
+
+      const turnsIndex = args.indexOf('--max-turns');
+      expect(turnsIndex).toBeGreaterThan(-1);
+      expect(args[turnsIndex + 1]).toBe('10');
+    });
+
     it('should include model when specified', () => {
       const args = launcher.buildArgs({
         ...baseOptions,
@@ -117,6 +158,171 @@ describe('SessionLauncher', () => {
       expect(args).not.toContain('--max-budget-usd');
       expect(args).not.toContain('--mcp-config');
       expect(args).not.toContain('--append-system-prompt');
+      expect(args).not.toContain('--max-turns');
+      expect(args).not.toContain('--verbose');
+    });
+  });
+
+  describe('parseStreamEvents', () => {
+    it('should parse valid stream-json events', async () => {
+      const stdoutPath = join(testDir, 'stdout.log');
+
+      // Create mock stream-json output
+      const mockEvents = [
+        { type: 'message_start', message: { id: 'msg_1' } },
+        { type: 'content_block_start', index: 0 },
+        { type: 'content_block_delta', delta: { type: 'text_delta', text: 'Hello' } },
+        { name: 'tool_read_file', type: 'tool_use', id: 'tool_1' },
+        { error: 'File not found', code: 'not_found' },
+        { type: 'message_stop' },
+      ];
+
+      writeFileSync(stdoutPath, mockEvents.map(e => JSON.stringify(e)).join('\n'));
+
+      const { path, stats } = await launcher.parseStreamEvents(stdoutPath, testDir);
+
+      expect(path).toBe(join(testDir, 'parsed-events.json'));
+      expect(stats.totalEvents).toBe(6);
+      expect(stats.toolCallCount).toBe(1);
+      expect(stats.errorCount).toBe(1);
+    });
+
+    it('should handle empty stdout file', async () => {
+      const stdoutPath = join(testDir, 'stdout-empty.log');
+      writeFileSync(stdoutPath, '');
+
+      const { path, stats } = await launcher.parseStreamEvents(stdoutPath, testDir);
+
+      expect(path).toBe(join(testDir, 'parsed-events.json'));
+      expect(stats.totalEvents).toBe(0);
+      expect(stats.toolCallCount).toBe(0);
+      expect(stats.errorCount).toBe(0);
+    });
+
+    it('should skip malformed JSON lines', async () => {
+      const stdoutPath = join(testDir, 'stdout-malformed.log');
+
+      const content = [
+        '{"type": "valid"}',
+        'not json',
+        '{"type": "also_valid"}',
+        '{incomplete',
+      ].join('\n');
+
+      writeFileSync(stdoutPath, content);
+
+      const { path, stats } = await launcher.parseStreamEvents(stdoutPath, testDir);
+
+      expect(path).toBe(join(testDir, 'parsed-events.json'));
+      expect(stats.totalEvents).toBe(2); // Only valid lines counted
+    });
+
+    it('should categorize different event types', async () => {
+      const stdoutPath = join(testDir, 'stdout-types.log');
+
+      const mockEvents = [
+        { type: 'message_start' },
+        { type: 'content_block_start' },
+        { type: 'content_block_delta', delta: {} },
+        { type: 'content_block_stop' },
+        { type: 'message_stop' },
+        { tool_use: true, name: 'test_tool' },
+        { error: 'Test error' },
+      ];
+
+      writeFileSync(stdoutPath, mockEvents.map(e => JSON.stringify(e)).join('\n'));
+
+      const { path, stats } = await launcher.parseStreamEvents(stdoutPath, testDir);
+
+      expect(stats.totalEvents).toBe(7);
+      expect(stats.toolCallCount).toBeGreaterThan(0);
+      expect(stats.errorCount).toBeGreaterThan(0);
+    });
+  });
+
+  describe('_categorizeStreamEvent', () => {
+    it('should categorize by type field', () => {
+      expect(launcher._categorizeStreamEvent({ type: 'message_start' })).toBe('message_start');
+      expect(launcher._categorizeStreamEvent({ type: 'error' })).toBe('error');
+    });
+
+    it('should detect tool calls', () => {
+      expect(launcher._categorizeStreamEvent({ tool: 'read' })).toBe('tool_call');
+      expect(launcher._categorizeStreamEvent({ tool_use: true })).toBe('tool_call');
+      expect(launcher._categorizeStreamEvent({ name: 'tool_something' })).toBe('tool_call');
+    });
+
+    it('should detect errors', () => {
+      expect(launcher._categorizeStreamEvent({ error: 'failure' })).toBe('error');
+      expect(launcher._categorizeStreamEvent({ message: 'error occurred' })).toBe('error');
+    });
+
+    it('should detect completions', () => {
+      expect(launcher._categorizeStreamEvent({ stop_reason: 'end_turn' })).toBe('completion');
+      expect(launcher._categorizeStreamEvent({ content: [{ type: 'text' }] })).toBe('completion');
+    });
+
+    it('should detect deltas', () => {
+      expect(launcher._categorizeStreamEvent({ delta: {} })).toBe('content_delta');
+      expect(launcher._categorizeStreamEvent({ content_block_delta: {} })).toBe('content_delta');
+    });
+
+    it('should detect start events', () => {
+      expect(launcher._categorizeStreamEvent({ message_start: {} })).toBe('start');
+      expect(launcher._categorizeStreamEvent({ content_block_start: {} })).toBe('start');
+    });
+
+    it('should detect stop events', () => {
+      expect(launcher._categorizeStreamEvent({ message_stop: {} })).toBe('stop');
+      expect(launcher._categorizeStreamEvent({ content_block_stop: {} })).toBe('stop');
+    });
+
+    it('should return unknown for unrecognized events', () => {
+      expect(launcher._categorizeStreamEvent({ random: 'data' })).toBe('unknown');
+      expect(launcher._categorizeStreamEvent({})).toBe('unknown');
+    });
+  });
+
+  describe('copySessionTranscript', () => {
+    it('should encode working directory path correctly', async () => {
+      const sessionId = 'test-session-123';
+      const workingDir = '/foo/bar/baz';
+
+      // Mock the home directory to our test dir for this test
+      const expectedEncodedPath = '-foo-bar-baz';
+
+      // This will fail to find the file, but we can check the emitted event
+      let emittedPath = '';
+      launcher.on('transcript-not-found', ({ sourcePath }) => {
+        emittedPath = sourcePath;
+      });
+
+      await launcher.copySessionTranscript(sessionId, workingDir, testDir);
+
+      expect(emittedPath).toContain(expectedEncodedPath);
+      expect(emittedPath).toContain(sessionId);
+    });
+
+    it('should return null when transcript does not exist', async () => {
+      const result = await launcher.copySessionTranscript(
+        'nonexistent-session',
+        '/some/path',
+        testDir
+      );
+
+      expect(result).toBeNull();
+    });
+
+    it('should emit transcript-not-found event', async () => {
+      const emittedEvents: any[] = [];
+      launcher.on('transcript-not-found', (data) => {
+        emittedEvents.push(data);
+      });
+
+      await launcher.copySessionTranscript('test', '/path', testDir);
+
+      expect(emittedEvents.length).toBeGreaterThan(0);
+      expect(emittedEvents[0].sourcePath).toBeDefined();
     });
   });
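The `parseStreamEvents` tests describe line-oriented JSON parsing: an empty file yields zero events, malformed lines are skipped rather than aborting the parse, and tool calls and errors are tallied into `stats`. The sketch below shows that counting logic over an in-memory string, assuming the caller supplies a categorizer. Because the first test's tool event carries `type: 'tool_use'`, and the `_categorizeStreamEvent` tests make an explicit `type` take precedence, the sketch counts both a `tool_use` type and a `tool_call` category toward `toolCallCount`. `summarizeStream` and `StreamStats` are hypothetical; the real method reads `stdout.log` from disk and writes `parsed-events.json`.

```typescript
interface StreamStats {
  totalEvents: number;
  toolCallCount: number;
  errorCount: number;
  malformedLines: number;
}

type Event = Record<string, unknown>;

function summarizeStream(
  stdout: string,
  categorize: (event: Event) => string,
): { events: Event[]; stats: StreamStats } {
  const events: Event[] = [];
  const stats: StreamStats = { totalEvents: 0, toolCallCount: 0, errorCount: 0, malformedLines: 0 };

  for (const line of stdout.split("\n")) {
    const trimmed = line.trim();
    if (trimmed === "") continue; // blank lines are not events
    let parsed: unknown;
    try {
      parsed = JSON.parse(trimmed);
    } catch {
      stats.malformedLines++; // skip, but keep a count for diagnostics
      continue;
    }
    if (parsed === null || typeof parsed !== "object" || Array.isArray(parsed)) {
      stats.malformedLines++; // valid JSON, but not an event object
      continue;
    }
    const event = parsed as Event;
    events.push(event);
    stats.totalEvents++;
    const category = categorize(event);
    if (category === "tool_call" || category === "tool_use") stats.toolCallCount++;
    if (category === "error") stats.errorCount++;
  }
  return { events, stats };
}

// Usage with a deliberately minimal categorizer.
const simpleCategorize = (e: Event): string => {
  const { type } = e;
  if (typeof type === "string") return type;
  return "error" in e ? "error" : "unknown";
};
const sample = ['{"type":"message_start"}', "not json", '{"type":"tool_use","name":"Read"}', '{"error":"boom"}'].join("\n");
console.log(JSON.stringify(summarizeStream(sample, simpleCategorize).stats));
// → {"totalEvents":3,"toolCallCount":1,"errorCount":1,"malformedLines":1}
```

Counting malformed lines instead of silently dropping them gives a cheap signal that stdout was truncated, for example when a session is killed mid-write.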

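The `_categorizeStreamEvent` tests in the diff above pin down a precedence order: an explicit `type` wins, then tool-call markers, errors, completions, deltas, start markers, stop markers, and finally `unknown`. The sketch below satisfies every assertion in that `describe` block; it is reconstructed from the tests rather than copied from `session-launcher.mjs`, so `categorizeStreamEvent` is a hypothetical stand-in and the real method may check fields differently.

```typescript
// Hypothetical reconstruction from the _categorizeStreamEvent tests.
type StreamEvent = Record<string, unknown>;

function categorizeStreamEvent(event: StreamEvent): string {
  const { type, name, message } = event;

  // 1. An explicit string `type` always wins.
  if (typeof type === "string") return type;

  // 2. Tool-call markers.
  if ("tool" in event || "tool_use" in event ||
      (typeof name === "string" && name.startsWith("tool_"))) {
    return "tool_call";
  }

  // 3. Errors: an `error` field, or a message that mentions "error".
  if ("error" in event || (typeof message === "string" && /error/i.test(message))) {
    return "error";
  }

  // 4. Completions: a stop reason or a content array.
  if ("stop_reason" in event || Array.isArray(event.content)) return "completion";

  // 5-7. Streaming deltas and start/stop markers keyed by field name.
  if ("delta" in event || "content_block_delta" in event) return "content_delta";
  if ("message_start" in event || "content_block_start" in event) return "start";
  if ("message_stop" in event || "content_block_stop" in event) return "stop";

  return "unknown";
}

console.log(categorizeStreamEvent({ type: "tool_use", name: "tool_read_file" })); // tool_use
console.log(categorizeStreamEvent({ tool_use: true, name: "test_tool" })); // tool_call
```

The first usage line shows why type-first precedence matters downstream: an event that declares `type: "tool_use"` never reaches the `tool_call` branch, so any tally of tool calls has to account for both labels.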