diff --git a/PRESS_RELEASE.md b/PRESS_RELEASE.md new file mode 100644 index 00000000..07e04945 --- /dev/null +++ b/PRESS_RELEASE.md @@ -0,0 +1,149 @@ +# multiclaude: A Brownian Ratchet for AI-Assisted Development + +**Lightweight orchestration for multiple Claude Code agents** + +## The Problem + +Modern AI coding assistants are powerful, but they work one task at a time. +When you have a backlog of issues, tests to write, and documentation to +update, you wait. Each task queues behind the last. + +What if you could parallelize? + +## The Solution + +**multiclaude** is a lightweight orchestrator that runs multiple Claude Code +agents simultaneously on your GitHub repository. Each agent works in its own +isolated environment, and you can spawn as many as you need. + +```bash +multiclaude init https://github.com/your/repo +multiclaude work "Add unit tests for auth" +multiclaude work "Fix issue #42" +multiclaude work "Update API documentation" +``` + +Three tasks. Three agents. Running in parallel. + +## The Philosophy: Brownian Ratchet + +In physics, a Brownian ratchet converts random molecular motion into +directed movement through a mechanism that allows motion in only one +direction. + +multiclaude applies this principle to software development: + +**The Chaos**: Multiple agents work simultaneously. They may duplicate +effort, create conflicts, or produce suboptimal solutions. This is fine. +More attempts mean more chances for progress. + +**The Ratchet**: CI is the arbiter. If tests pass, the code merges. Every +merged PR clicks forward one notch. Progress is permanent. + +This approach optimizes for throughput of successful changes, not efficiency +of individual agents. Redundant work is cheaper than blocked work. + +## Key Features + +**Observable**: All agents run in tmux windows. Attach anytime to watch them +work or intervene when needed. + +**Isolated**: Each agent gets its own git worktree. No interference between +parallel tasks. 
+ +**Self-Healing**: The daemon monitors agent health, restarts crashed +processes, and cleans up finished work. + +**Simple**: Filesystem for state. Tmux for visibility. Git for isolation. No +databases, no cloud dependencies, no complex setup. + +## How It Works + +multiclaude spawns three types of agents: + +1. **Supervisor** - Coordinates agents, helps stuck workers, tracks progress +2. **Workers** - Execute tasks, create pull requests +3. **Merge Queue** - Monitors PRs, merges when CI passes + +Workers communicate via filesystem-based messages. The supervisor nudges +stuck agents. The merge queue ensures only passing code lands. + +## Remote-First Design + +Unlike tools designed for solo development, multiclaude treats software +engineering as an MMORPG: + +- Your **workspace** is your persistent home base +- **Workers** are party members you spawn for quests +- The **supervisor** coordinates the guild +- The **merge queue** is the raid boss guarding main + +The system keeps running when you're away. Spawn workers before lunch. +Review their PRs when you return. + +## Comparison with Gastown + +multiclaude was developed independently but shares goals with Steve Yegge's +Gastown project. Both orchestrate multiple Claude Code instances using tmux +and git worktrees. + +**Where they differ:** + +| Aspect | multiclaude | Gastown | +|--------|-------------|---------| +| Agent model | 3 roles | 7 roles | +| Philosophy | Minimal, Unix-style | Comprehensive orchestration | +| State | JSON + filesystem | Git-backed hooks | +| Target | Remote-first teams | Solo development | + +Choose multiclaude for simplicity. Choose Gastown for sophisticated features +like work swarming and structured work units. 
+ +## Getting Started + +```bash +# Install +go install github.com/dlorenc/multiclaude/cmd/multiclaude@latest + +# Start the daemon +multiclaude start + +# Initialize a repository +multiclaude init https://github.com/your/repo + +# Spawn a worker +multiclaude work "Your task here" + +# Watch it work +tmux attach -t mc-repo +``` + +## Requirements + +- Go 1.21+ +- tmux +- git +- GitHub CLI (authenticated) + +## Open Source + +multiclaude is MIT licensed and available on GitHub: + +https://github.com/dlorenc/multiclaude + +Contributions welcome. Issues and PRs are the ratchet - if CI passes, it +ships. + +## About + +multiclaude embraces a counterintuitive truth: in AI-assisted development, +chaos is fine as long as you ratchet forward. Perfect coordination is +expensive and fragile. Multiple imperfect attempts that occasionally succeed +beat waiting for one perfect solution. + +Let the agents work. Let CI judge. Click the ratchet forward. + +--- + +*For more information, see the project documentation or open an issue on +GitHub.* diff --git a/README.md b/README.md index 12877561..f55ecbb1 100644 --- a/README.md +++ b/README.md @@ -1,60 +1,105 @@ # multiclaude -A lightweight orchestrator for running multiple Claude Code agents on GitHub repositories. +A lightweight orchestrator for running multiple Claude Code agents on +GitHub repositories. -multiclaude spawns and coordinates autonomous Claude Code instances that work together on your codebase. Each agent runs in its own tmux window with an isolated git worktree, making all work observable and interruptible at any time. +multiclaude spawns and coordinates autonomous Claude Code instances that +work together on your codebase. Each agent runs in its own tmux window +with an isolated git worktree, making all work observable and +interruptible at any time. ## Philosophy: The Brownian Ratchet -multiclaude embraces a counterintuitive design principle: **chaos is fine, as long as we ratchet forward**. 
+multiclaude embraces a counterintuitive design principle: **chaos is +fine, as long as we ratchet forward**. -In physics, a Brownian ratchet is a thought experiment where random molecular motion is converted into directed movement through a mechanism that allows motion in only one direction. multiclaude applies this principle to software development. +In physics, a Brownian ratchet is a thought experiment where random +molecular motion is converted into directed movement through a mechanism +that allows motion in only one direction. multiclaude applies this +principle to software development. -**The Chaos**: Multiple autonomous agents work simultaneously on overlapping concerns. They may duplicate effort, create conflicting changes, or produce suboptimal solutions. This apparent disorder is not a bug—it's a feature. More attempts mean more chances for progress. +**The Chaos**: Multiple autonomous agents work simultaneously on +overlapping concerns. They may duplicate effort, create conflicting +changes, or produce suboptimal solutions. This apparent disorder is not +a bug—it's a feature. More attempts mean more chances for progress. -**The Ratchet**: CI is the arbiter. If it passes, the code goes in. Every merged PR clicks the ratchet forward one notch. Progress is permanent—we never go backward. The merge queue agent serves as this ratchet mechanism, ensuring that any work meeting the CI bar gets incorporated. +**The Ratchet**: CI is the arbiter. If it passes, the code goes in. +Every merged PR clicks the ratchet forward one notch. Progress is +permanent—we never go backward. The merge queue agent serves as this +ratchet mechanism, ensuring that any work meeting the CI bar gets +incorporated. **Why This Works**: -- Agents don't need perfect coordination. Redundant work is cheaper than blocked work. +- Agents don't need perfect coordination. Redundant work is cheaper than + blocked work. - Failed attempts cost nothing. Only successful attempts matter. 
-- Incremental progress compounds. Many small PRs beat waiting for one perfect PR. -- The system is antifragile. More agents mean more chaos but also more forward motion. +- Incremental progress compounds. Many small PRs beat waiting for one + perfect PR. +- The system is antifragile. More agents mean more chaos but also more + forward motion. -This philosophy means we optimize for throughput of successful changes, not efficiency of individual agents. An agent that produces a mergeable PR has succeeded, even if another agent was working on the same thing. +This philosophy means we optimize for throughput of successful changes, +not efficiency of individual agents. An agent that produces a mergeable +PR has succeeded, even if another agent was working on the same thing. ## Our Opinions -multiclaude is intentionally opinionated. These aren't configuration options—they're core beliefs baked into how the system works: +multiclaude is intentionally opinionated. These aren't configuration +options—they're core beliefs baked into how the system works: ### CI is King -CI is the source of truth. Period. If tests pass, the code can ship. If tests fail, the code doesn't ship. There's no "but the change looks right" or "I'm pretty sure it's fine." The automation decides. +CI is the source of truth. Period. If tests pass, the code can ship. If +tests fail, the code doesn't ship. There's no "but the change looks +right" or "I'm pretty sure it's fine." The automation decides. -Agents are forbidden from weakening CI to make their work pass. No skipping tests, no reducing coverage requirements, no "temporary" workarounds. If an agent can't pass CI, it asks for help or tries a different approach. +Agents are forbidden from weakening CI to make their work pass. No +skipping tests, no reducing coverage requirements, no "temporary" +workarounds. If an agent can't pass CI, it asks for help or tries a +different approach. 
### Forward Progress Over Perfection -Any incremental progress is good. A reviewable PR is progress. A partial implementation with tests is progress. The only failure is an agent that doesn't push the ball forward at all. +Any incremental progress is good. A reviewable PR is progress. A partial +implementation with tests is progress. The only failure is an agent that +doesn't push the ball forward at all. -This means we'd rather have three okay PRs than wait for one perfect PR. We'd rather merge working code now and improve it later than block on getting everything right the first time. Small, frequent commits beat large, infrequent ones. +This means we'd rather have three okay PRs than wait for one perfect PR. +We'd rather merge working code now and improve it later than block on +getting everything right the first time. Small, frequent commits beat +large, infrequent ones. ### Chaos is Expected -Multiple agents working simultaneously will create conflicts, duplicate work, and occasionally step on each other's toes. This is fine. This is the plan. +Multiple agents working simultaneously will create conflicts, duplicate +work, and occasionally step on each other's toes. This is fine. This is +the plan. -Trying to perfectly coordinate agent work is both expensive and fragile. Instead, we let chaos happen and use CI as the ratchet that captures forward progress. Wasted work is cheap; blocked work is expensive. +Trying to perfectly coordinate agent work is both expensive and fragile. +Instead, we let chaos happen and use CI as the ratchet that captures +forward progress. Wasted work is cheap; blocked work is expensive. ### Humans Approve, Agents Execute -Agents do the work. Humans set the direction and approve the results. Agents should never make decisions that require human judgment—they should ask. +Agents do the work. Humans set the direction and approve the results. +Agents should never make decisions that require human judgment—they +should ask. 
-This means agents create PRs for human review. Agents ask the supervisor when they're stuck. Agents don't bypass review requirements or merge without appropriate approval. The merge queue agent can auto-merge, but only when CI passes and review requirements are met. +This means agents create PRs for human review. Agents ask the supervisor +when they're stuck. Agents don't bypass review requirements or merge +without appropriate approval. The merge queue agent can auto-merge, but +only when CI passes and review requirements are met. ## Gastown and multiclaude -multiclaude was developed independently but shares similar goals with [Gastown](https://github.com/steveyegge/gastown), Steve Yegge's multi-agent orchestrator for Claude Code released in January 2026. +multiclaude was developed independently but shares similar goals with +[Gastown](https://github.com/steveyegge/gastown), Steve Yegge's +multi-agent orchestrator for Claude Code released in January 2026. -Both projects solve the same fundamental problem: coordinating multiple Claude Code instances working on a shared codebase. Both use Go, tmux for observability, and git worktrees for isolation. If you're evaluating multi-agent orchestrators, you should look at both. +Both projects solve the same fundamental problem: coordinating multiple +Claude Code instances working on a shared codebase. Both use Go, tmux +for observability, and git worktrees for isolation. If you're evaluating +multi-agent orchestrators, you should look at both. **Where they differ:** @@ -67,23 +112,39 @@ Both projects solve the same fundamental problem: coordinating multiple Claude C | Philosophy | Minimal, Unix-style simplicity | Comprehensive orchestration system | | Maturity | Early development | More established, larger feature set | -multiclaude aims to be a simpler, more lightweight alternative—the "worse is better" approach. 
If you need sophisticated orchestration features, work swarming, or built-in crash recovery, Gastown may be a better fit. +multiclaude aims to be a simpler, more lightweight alternative—the +"worse is better" approach. If you need sophisticated orchestration +features, work swarming, or built-in crash recovery, Gastown may be a +better fit. ### Remote-First: Software is an MMORPG -The biggest philosophical difference: **multiclaude is designed for remote-first collaboration**. +The biggest philosophical difference: **multiclaude is designed for +remote-first collaboration**. -Gastown treats agents as NPCs in a single-player game. You're the player, agents are your minions. This works great for solo development where you want to parallelize your own work. +Gastown treats agents as NPCs in a single-player game. You're the +player, agents are your minions. This works great for solo development +where you want to parallelize your own work. -multiclaude treats software engineering as an **MMORPG**. You're one player among many—some human, some AI. The workspace agent is your character, but other humans have their own workspaces. Workers are party members you spawn for quests. The supervisor coordinates the guild. The merge queue is the raid boss that decides what loot (code) makes it into the vault (main branch). +multiclaude treats software engineering as an **MMORPG**. You're one +player among many—some human, some AI. The workspace agent is your +character, but other humans have their own workspaces. Workers are party +members you spawn for quests. The supervisor coordinates the guild. The +merge queue is the raid boss that decides what loot (code) makes it into +the vault (main branch). This means: -- **Your workspace persists**. It's your home base, not a temporary session. -- **You interact with workers, not control them**. Spawn them with a task, check on them later. +- **Your workspace persists**. It's your home base, not a temporary + session. 
+- **You interact with workers, not control them**. Spawn them with a + task, check on them later. - **Other humans can have their own workspaces** on the same repo. -- **The system keeps running when you're away**. Agents work, PRs merge, CI runs. +- **The system keeps running when you're away**. Agents work, PRs merge, + CI runs. -The workspace is where you hop in to spawn agents, check on progress, review what landed, and plan the next sprint—then hop out and let the system work while you sleep. +The workspace is where you hop in to spawn agents, check on progress, +review what landed, and plan the next sprint—then hop out and let the +system work while you sleep. ## Quick Start @@ -108,13 +169,18 @@ tmux attach -t mc-repo ## How It Works -multiclaude creates a tmux session for each repository with three types of agents: +multiclaude creates a tmux session for each repository with three types +of agents: -1. **Supervisor** - Coordinates all agents, answers status questions, nudges stuck workers +1. **Supervisor** - Coordinates all agents, answers status questions, + nudges stuck workers 2. **Workers** - Execute specific tasks, create PRs when done -3. **Merge Queue** - Monitors PRs, merges when CI passes, spawns fixup workers as needed +3. **Merge Queue** - Monitors PRs, merges when CI passes, spawns fixup + workers as needed -Agents communicate via a filesystem-based message system. The daemon routes messages and periodically nudges agents to keep work moving forward. +Agents communicate via a filesystem-based message system. The daemon +routes messages and periodically nudges agents to keep work moving +forward. ``` ┌─────────────────────────────────────────────────────────────┐ @@ -155,7 +221,9 @@ multiclaude repo rm # Remove a tracked repository ### Workspaces -Workspaces are persistent Claude sessions where you interact with the codebase, spawn workers, and manage your development flow. 
Each workspace has its own git worktree, tmux window, and Claude instance. +Workspaces are persistent Claude sessions where you interact with the +codebase, spawn workers, and manage your development flow. Each +workspace has its own git worktree, tmux window, and Claude instance. ```bash multiclaude workspace add # Create a new workspace @@ -169,9 +237,12 @@ multiclaude workspace # Connect to workspace (shorthand) **Notes:** - Workspaces use the branch naming convention `workspace/` -- Workspace names follow git branch naming rules (no spaces, special characters, etc.) -- A "default" workspace is created automatically when you run `multiclaude init` -- Use `multiclaude attach ` as an alternative to `workspace connect` +- Workspace names follow git branch naming rules (no spaces, special + characters, etc.) +- A "default" workspace is created automatically when you run + `multiclaude init` +- Use `multiclaude attach ` as an alternative to + `workspace connect` ### Workers @@ -183,7 +254,9 @@ multiclaude work list # List active workers multiclaude work rm # Remove worker (warns if uncommitted work) ``` -The `--push-to` flag creates a worker that pushes to an existing branch instead of creating a new PR. Use this when you want to iterate on an existing PR. +The `--push-to` flag creates a worker that pushes to an existing branch +instead of creating a new PR. Use this when you want to iterate on an +existing PR. 
### Observing @@ -216,7 +289,8 @@ Agents have access to multiclaude-specific slash commands: ### What the tmux Session Looks Like -When you attach to a repo's tmux session, you'll see multiple windows—one per agent: +When you attach to a repo's tmux session, you'll see multiple +windows—one per agent: ``` ┌─────────────────────────────────────────────────────────────────────────────┐ @@ -251,7 +325,8 @@ Use standard tmux navigation: ### Workflow: Spawning Workers from Your Workspace -Your workspace is a persistent Claude session where you can spawn and manage workers: +Your workspace is a persistent Claude session where you can spawn and +manage workers: ``` ┌─────────────────────────────────────────────────────────────────────────────┐ @@ -302,7 +377,8 @@ Later, when you return: ### Watching the Supervisor -The supervisor coordinates agents and provides status updates. Attach to watch it work: +The supervisor coordinates agents and provides status updates. Attach to +watch it work: ```bash multiclaude attach supervisor --read-only @@ -390,11 +466,16 @@ When CI fails, the merge queue can spawn workers to fix it: ### Design Principles -1. **Observable** - All agent activity visible via tmux. Attach anytime to watch or intervene. -2. **Isolated** - Each agent works in its own git worktree. No interference between tasks. -3. **Recoverable** - State persists to disk. Daemon recovers gracefully from crashes. -4. **Safe** - Agents never weaken CI or bypass checks without human approval. -5. **Simple** - Minimal abstractions. Filesystem for state, tmux for visibility, git for isolation. +1. **Observable** - All agent activity visible via tmux. Attach anytime + to watch or intervene. +2. **Isolated** - Each agent works in its own git worktree. No + interference between tasks. +3. **Recoverable** - State persists to disk. Daemon recovers gracefully + from crashes. +4. **Safe** - Agents never weaken CI or bypass checks without human + approval. +5. 
**Simple** - Minimal abstractions. Filesystem for state, tmux for + visibility, git for isolation. ### Directory Structure @@ -424,7 +505,8 @@ Repositories can include optional configuration in `.multiclaude/`: ## Public Libraries -multiclaude includes two reusable Go packages that can be used independently of the orchestrator: +multiclaude includes two reusable Go packages that can be used +independently of the orchestrator: ### pkg/tmux - Programmatic tmux Interaction @@ -432,11 +514,18 @@ multiclaude includes two reusable Go packages that can be used independently of go get github.com/dlorenc/multiclaude/pkg/tmux ``` -Unlike existing Go tmux libraries ([gotmux](https://github.com/GianlucaP106/gotmux), [go-tmux](https://github.com/jubnzv/go-tmux)) that focus on workspace setup, this package provides features for **programmatic interaction with running CLI applications**: +Unlike existing Go tmux libraries +([gotmux](https://github.com/GianlucaP106/gotmux), +[go-tmux](https://github.com/jubnzv/go-tmux)) that focus on workspace +setup, this package provides features for **programmatic interaction +with running CLI applications**: -- **Multiline text via paste-buffer** - Send multi-line input atomically without triggering intermediate processing -- **Pane PID extraction** - Monitor whether processes in panes are still alive -- **pipe-pane output capture** - Capture all pane output to files for logging/analysis +- **Multiline text via paste-buffer** - Send multi-line input atomically + without triggering intermediate processing +- **Pane PID extraction** - Monitor whether processes in panes are still + alive +- **pipe-pane output capture** - Capture all pane output to files for + logging/analysis ```go client := tmux.NewClient() @@ -453,10 +542,13 @@ client.StartPipePane("session", "window", "/tmp/output.log") go get github.com/dlorenc/multiclaude/pkg/claude ``` -A library for launching and interacting with Claude Code instances in terminals: +A library for launching 
and interacting with Claude Code instances in +terminals: -- **Terminal abstraction** - Works with tmux or custom terminal implementations -- **Session management** - Automatic UUID session IDs and process tracking +- **Terminal abstraction** - Works with tmux or custom terminal + implementations +- **Session management** - Automatic UUID session IDs and process + tracking - **Output capture** - Route Claude output to files - **Multiline support** - Properly handles multi-line messages to Claude diff --git a/ROADMAP.md b/ROADMAP.md index 080234a4..b05956f1 100644 --- a/ROADMAP.md +++ b/ROADMAP.md @@ -36,7 +36,7 @@ Focus: Make the core experience rock-solid before adding features. ### P1 - Should Have (this quarter) -- [ ] **Task history**: Track what workers have done and their outcomes (PR merged/closed/pending) +- [x] **Task history**: Track what workers have done and their outcomes (PR merged/closed/pending) - [ ] **Agent restart**: Gracefully restart crashed agents without losing context - [ ] **Workspace refresh**: Easy command to sync workspace with latest main @@ -96,4 +96,5 @@ These features are explicitly **not wanted**. PRs implementing them should be cl ## Changelog +- **2026-02-03**: Marked Task history as complete (P1) - **2026-01-20**: Initial roadmap after Phase 1 cleanup (removed notifications, coordination, multi-provider) diff --git a/docs/DIAGRAMS.md b/docs/DIAGRAMS.md new file mode 100644 index 00000000..82b8f15b --- /dev/null +++ b/docs/DIAGRAMS.md @@ -0,0 +1,414 @@ +# Multiclaude Diagrams + +Visual diagrams for understanding multiclaude's architecture and data flows. + +## System Overview + +```mermaid +flowchart TB + subgraph User["User's Machine"] + CLI[CLI Client] + + subgraph Daemon["Daemon Process"] + Socket[Socket Server] + Health[Health Check
2 min cycle] + Router[Message Router
2 min cycle] + Wake[Wake/Nudge
2 min cycle] + State[(state.json)] + end + + subgraph Tmux["tmux session: mc-repo"] + Sup[supervisor] + MQ[merge-queue] + W1[worker-1] + W2[worker-2] + end + + subgraph Worktrees["Git Worktrees"] + WT1[wts/repo/supervisor] + WT2[wts/repo/merge-queue] + WT3[wts/repo/worker-1] + WT4[wts/repo/worker-2] + end + end + + CLI <-->|Unix Socket| Socket + Socket <--> State + Health --> Tmux + Router --> Tmux + Wake --> Tmux + + Sup --> WT1 + MQ --> WT2 + W1 --> WT3 + W2 --> WT4 +``` + +## The Brownian Ratchet + +The core philosophy: chaos creates progress when filtered through CI. + +```mermaid +flowchart LR + subgraph Chaos["Agent Activity (Chaotic)"] + WA[Worker A
auth feature] + WB[Worker B
auth feature] + WC[Worker C
bugfix #42] + end + + subgraph Ratchet["CI Gate"] + CI{CI Passes?} + end + + subgraph Progress["Main Branch"] + Main[████████████
irreversible progress] + end + + WA -->|PR| CI + WB -->|PR| CI + WC -->|PR| CI + + CI -->|pass| Main + CI -->|fail| Retry[retry or
spawn fixup] + Retry --> Chaos +``` + +## Agent Types and Relationships + +```mermaid +flowchart TB + Human[Human User] + + subgraph Agents + WS[Workspace
user interactive] + SUP[Supervisor
coordination] + MQ[Merge Queue
the ratchet] + W1[Worker 1] + W2[Worker 2] + REV[Review Agent] + end + + subgraph External + GH[(GitHub)] + CI[CI System] + end + + Human <-->|direct input| WS + Human -->|spawns| SUP + Human -->|spawns| MQ + + WS -->|spawn| W1 + SUP -->|guidance| W1 + SUP -->|guidance| W2 + + W1 -->|PR| GH + W2 -->|PR| GH + + MQ -->|monitor| GH + MQ -->|spawn| REV + MQ -->|merge| GH + + REV -->|comments| GH + + GH --> CI + CI -->|status| GH +``` + +## Worker Lifecycle + +```mermaid +stateDiagram-v2 + [*] --> Created: multiclaude work "task" + + Created --> Working: Claude starts + + state Working { + [*] --> Coding + Coding --> Testing + Testing --> Coding: tests fail + Testing --> PR: tests pass + PR --> [*] + } + + Working --> Completing: multiclaude agent complete + Completing --> MarkedForCleanup: Daemon marks ReadyForCleanup + MarkedForCleanup --> NotifySupervisor: Daemon notifies + NotifySupervisor --> NotifyMergeQueue + NotifyMergeQueue --> HealthCheckCleanup: Health check runs + HealthCheckCleanup --> [*]: Kill window, remove worktree +``` + +## Worker Creation Flow + +```mermaid +sequenceDiagram + participant User + participant CLI + participant Daemon + participant Git + participant Tmux + participant Claude + + User->>CLI: multiclaude work "Add tests" + CLI->>Daemon: list_agents (socket) + Daemon-->>CLI: agent list + + CLI->>Git: worktree add + Git-->>CLI: worktree created + + CLI->>Tmux: new-window + Tmux-->>CLI: window created + + CLI->>CLI: write prompt file + + CLI->>Tmux: send-keys (start Claude) + Tmux->>Claude: launch + + CLI->>Tmux: send-keys (task message) + + CLI->>Daemon: add_agent (worker) + Daemon->>Daemon: save state + Daemon-->>CLI: success + + CLI-->>User: Worker 'clever-fox' created +``` + +## Message Routing + +```mermaid +sequenceDiagram + participant AgentA as Worker A + participant FS as Filesystem + participant Daemon + participant Tmux + participant AgentB as Supervisor + + AgentA->>FS: write message.json
(status: pending) + + Note over Daemon: 2 min later
message router runs + + Daemon->>FS: scan messages dir + FS-->>Daemon: pending messages + + Daemon->>Tmux: send-keys (literal mode) + Tmux->>AgentB: paste message + + Daemon->>FS: update status
(pending → delivered) +``` + +## Message Status Flow + +```mermaid +stateDiagram-v2 + [*] --> pending: Message created + pending --> delivered: Daemon sends via tmux + delivered --> read: Agent sees message + read --> acked: Agent acknowledges + acked --> [*]: Message deleted +``` + +## Repository Initialization + +```mermaid +sequenceDiagram + participant User + participant CLI + participant Daemon + participant Git + participant Tmux + participant FS as Filesystem + + User->>CLI: multiclaude init github.com/org/repo + + CLI->>Daemon: ping + Daemon-->>CLI: pong + + CLI->>Git: clone repo + Git-->>CLI: cloned + + CLI->>Tmux: new-session mc-repo + CLI->>Tmux: new-window supervisor + CLI->>Tmux: new-window merge-queue + + CLI->>FS: write prompt files + + CLI->>Tmux: send-keys (start Claude) + + CLI->>Daemon: add_repo + Daemon->>FS: save state + Daemon-->>CLI: success + + CLI->>Daemon: add_agent (supervisor) + CLI->>Daemon: add_agent (merge-queue) + + CLI-->>User: Repository initialized +``` + +## Health Check Loop + +```mermaid +flowchart TD + Start[Health Check Starts
every 2 min] --> CheckSession{tmux session
exists?} + + CheckSession -->|no| TryRestore[Attempt Restoration] + TryRestore -->|success| CheckAgents + TryRestore -->|fail| MarkCleanup[Mark agents
for cleanup] + + CheckSession -->|yes| CheckAgents[Check each agent] + + CheckAgents --> WindowExists{tmux window
exists?} + + WindowExists -->|no| RemoveAgent[Remove from state] + WindowExists -->|yes| CheckReady{ReadyForCleanup?} + + CheckReady -->|yes| Cleanup[Kill window
Remove worktree
Clean messages] + CheckReady -->|no| NextAgent[Check next agent] + + Cleanup --> NextAgent + RemoveAgent --> NextAgent + + NextAgent --> PruneOrphans[Prune orphaned
worktrees & messages] + PruneOrphans --> End[Wait 2 min] + End --> Start +``` + +## Daemon Goroutines + +```mermaid +flowchart LR + subgraph Daemon + Main[main goroutine] + + subgraph Loops["Background Loops"] + Server[serverLoop
continuous] + Health[healthCheckLoop
2 min] + Router[messageRouterLoop
2 min] + Wake[wakeLoop
2 min] + end + + WG[sync.WaitGroup] + Ctx[context.Context] + end + + Main -->|spawn| Server + Main -->|spawn| Health + Main -->|spawn| Router + Main -->|spawn| Wake + + WG -.->|tracks| Loops + Ctx -.->|cancels| Loops +``` + +## File System Layout + +```mermaid +flowchart TB + subgraph Home["~/.multiclaude/"] + PID[daemon.pid] + Sock[daemon.sock] + Log[daemon.log] + State[state.json] + + subgraph Prompts["prompts/"] + P1[supervisor.md] + P2[merge-queue.md] + P3[worker-name.md] + end + + subgraph Repos["repos/"] + R1[my-repo/] + end + + subgraph WTS["wts/"] + subgraph WTRepo["my-repo/"] + WT1[supervisor/] + WT2[merge-queue/] + WT3[clever-fox/] + end + end + + subgraph Msgs["messages/"] + subgraph MsgRepo["my-repo/"] + MS[supervisor/] + MMQ[merge-queue/] + MW[clever-fox/] + end + end + + subgraph Config["claude-config/"] + subgraph CfgRepo["my-repo/"] + subgraph Agent["clever-fox/"] + Cmds[commands/] + end + end + end + end +``` + +## Shutdown Sequence + +```mermaid +sequenceDiagram + participant User + participant CLI + participant Daemon + participant Goroutines + participant FS as Filesystem + + User->>CLI: multiclaude stop-all + CLI->>Daemon: stop (socket) + + Daemon->>Daemon: d.cancel() + Daemon->>Goroutines: context cancelled + + par All goroutines + Goroutines->>Goroutines: check ctx.Done() + Goroutines->>Goroutines: return + end + + Daemon->>Daemon: d.wg.Wait() + Note over Daemon: All goroutines stopped + + Daemon->>Daemon: d.server.Stop() + Daemon->>FS: d.state.Save() + Daemon->>FS: d.pidFile.Remove() + + Daemon-->>CLI: success + CLI-->>User: Daemon stopped +``` + +## State Thread Safety + +```mermaid +flowchart TB + subgraph State["state.State"] + Mutex[sync.RWMutex] + Data[Repos map] + end + + subgraph Readers["Read Operations"] + R1[GetRepo] + R2[GetAgent] + R3[ListAgents] + end + + subgraph Writers["Write Operations"] + W1[AddRepo] + W2[AddAgent] + W3[RemoveAgent] + end + + R1 -->|RLock| Mutex + R2 -->|RLock| Mutex + R3 -->|RLock| Mutex + + W1 -->|Lock| 
Mutex + W2 -->|Lock| Mutex + W3 -->|Lock| Mutex + + Mutex --> Data + + W1 -->|auto-save| Disk[(state.json)] + W2 -->|auto-save| Disk + W3 -->|auto-save| Disk +``` diff --git a/docs/FAQ.md b/docs/FAQ.md new file mode 100644 index 00000000..fa63f4aa --- /dev/null +++ b/docs/FAQ.md @@ -0,0 +1,214 @@ +# Frequently Asked Questions + +## General + +### What is multiclaude? + +multiclaude is a lightweight orchestrator for running multiple Claude Code +agents on GitHub repositories. Each agent runs in its own tmux window with +an isolated git worktree. + +### How is it different from Gastown? + +Both solve multi-agent orchestration for Claude Code. multiclaude aims for +Unix-style simplicity with fewer concepts: 3 agent roles vs 7, filesystem +state vs git-backed hooks, minimal dependencies. Gastown offers more +sophisticated features like work swarming and the Beads framework. See the +[README comparison](../README.md#gastown-and-multiclaude) for details. + +### What are the prerequisites? + +- Go 1.21+ +- tmux (terminal multiplexer) +- git +- GitHub CLI (`gh`) authenticated via `gh auth login` + +## Agents + +### What types of agents are there? + +- **Supervisor**: Coordinates all agents, answers status questions, helps + stuck workers +- **Workers**: Execute specific tasks, create PRs when done +- **Merge Queue**: Monitors PRs, merges when CI passes +- **Workspace**: Your interactive Claude session for spawning workers and + managing flow + +### How do agents communicate? + +Via filesystem-based JSON messages in `~/.multiclaude/messages/`. The daemon +routes messages and periodically nudges agents to check their inbox. + +### What happens when an agent crashes? + +The daemon's health check (every 2 minutes) detects dead agents and attempts +to restart them with `--resume` to preserve session context. See +[CRASH_RECOVERY.md](CRASH_RECOVERY.md) for details. + +### Can I have multiple workers? + +Yes. 
Spawn as many as you want:
+
+```bash
+multiclaude work "Task 1"
+multiclaude work "Task 2"
+multiclaude work "Task 3"
+```
+
+They work in parallel, each in their own git worktree.
+
+## Workflow
+
+### How do I see what agents are doing?
+
+```bash
+# Attach to the tmux session (all agents)
+tmux attach -t mc-repo
+
+# Attach to a specific agent
+multiclaude attach happy-platypus --read-only
+```
+
+### How do I check worker status?
+
+```bash
+multiclaude work list      # Active workers
+multiclaude repo history   # Completed tasks and PRs
+multiclaude status         # Overall system status
+```
+
+### How do I stop a worker?
+
+```bash
+multiclaude work rm happy-platypus
+```
+
+This warns if there's uncommitted work.
+
+### How do I iterate on an existing PR?
+
+Use `--push-to` to have a worker push to an existing branch:
+
+```bash
+multiclaude work "Fix review comments" --branch origin/work/fox --push-to work/fox
+```
+
+## Troubleshooting
+
+### "daemon is not running"
+
+Start it:
+
+```bash
+multiclaude start
+```
+
+### "repository already initialized"
+
+The repo is already tracked. Check with:
+
+```bash
+multiclaude list
+```
+
+### "permission denied" on socket
+
+The daemon socket may have wrong permissions. Try:
+
+```bash
+multiclaude stop-all
+multiclaude start
+```
+
+### Agent seems stuck
+
+1. Check if Claude is waiting for input:
+   ```bash
+   multiclaude attach <agent-name> --read-only
+   ```
+
+2. Send it a nudge message:
+   ```bash
+   multiclaude agent send-message <agent-name> "Status update?"
+   ```
+
+3. The supervisor also periodically nudges stuck agents.
+
+### Uncommitted work in a dead worker
+
+```bash
+# Check the worktree
+cd ~/.multiclaude/wts/<repo>/<worker-name>
+git status
+
+# Save the work
+git add . && git commit -m "WIP"
+git push -u origin work/<worker-name>
+```
+
+### How do I reset everything?
+
+```bash
+multiclaude stop-all --clean
+```
+
+This stops all agents and removes state files (but preserves repo clones and
+worktrees).
+
+## Configuration
+
+### Can I customize agent behavior?
+
+Yes. Add files to `.multiclaude/` in your repository:
+
+- `SUPERVISOR.md` - Additional instructions for the supervisor
+- `WORKER.md` - Additional instructions for workers
+- `REVIEWER.md` - Additional instructions for the merge queue
+- `hooks.json` - Claude Code hooks configuration
+
+### Where does multiclaude store data?
+
+Everything lives in `~/.multiclaude/`:
+
+```
+~/.multiclaude/
+├── daemon.pid        # Daemon process ID
+├── daemon.sock       # Unix socket
+├── daemon.log        # Daemon logs
+├── state.json        # All state
+├── repos/<repo>/     # Cloned repositories
+├── wts/<repo>/       # Git worktrees
+└── messages/<repo>/  # Agent messages
+```
+
+See [DIRECTORY_STRUCTURE.md](DIRECTORY_STRUCTURE.md) for full details.
+
+### How do I view daemon logs?
+
+```bash
+multiclaude daemon logs -f         # Follow logs
+tail -f ~/.multiclaude/daemon.log
+```
+
+## Philosophy
+
+### Why "Brownian Ratchet"?
+
+In physics, a Brownian ratchet converts random motion into directed movement
+through a mechanism that only allows forward motion. multiclaude applies
+this: multiple agents create "chaos" (parallel, potentially overlapping
+work), but CI acts as the ratchet - only passing code merges, ensuring
+permanent forward progress.
+
+### Why embrace chaos instead of coordination?
+
+Trying to perfectly coordinate agent work is expensive and fragile. Instead:
+- Redundant work is cheaper than blocked work
+- Failed attempts cost nothing; only successful attempts matter
+- CI is the arbiter - if it passes, the code is good
+- More agents mean more chaos but also more forward motion
+
+### Why can't agents merge without CI?
+
+CI is King. Agents are forbidden from weakening CI or bypassing checks.
+This ensures the ratchet never slips backward.
diff --git a/docs/QUICKSTART.md b/docs/QUICKSTART.md
new file mode 100644
index 00000000..6f38bb55
--- /dev/null
+++ b/docs/QUICKSTART.md
@@ -0,0 +1,137 @@
+# Quickstart Guide
+
+Get multiclaude running in 5 minutes.
+ +## Prerequisites + +You need: +- Go 1.21+ +- tmux +- git +- GitHub CLI (`gh`) - authenticated via `gh auth login` + +## Install + +```bash +go install github.com/dlorenc/multiclaude/cmd/multiclaude@latest +``` + +## Start the Daemon + +multiclaude runs a background daemon that coordinates agents. + +```bash +multiclaude start +``` + +Check it's running: + +```bash +multiclaude daemon status +``` + +## Initialize a Repository + +Point multiclaude at a GitHub repository you want to work on: + +```bash +multiclaude init https://github.com/your/repo +``` + +This: +- Clones the repository +- Creates a tmux session (`mc-repo`) +- Spawns a supervisor agent +- Spawns a merge-queue agent +- Creates a default workspace for you + +## Spawn a Worker + +Create a worker to do a task: + +```bash +multiclaude work "Add unit tests for the auth module" +``` + +The worker gets a fun name like `happy-platypus` and starts working +immediately. + +## Watch Agents Work + +Attach to the tmux session to see all agents: + +```bash +tmux attach -t mc-repo +``` + +Navigate between agent windows: +- `Ctrl-b n` - next window +- `Ctrl-b p` - previous window +- `Ctrl-b w` - window picker +- `Ctrl-b d` - detach (agents keep running) + +Or attach to a specific agent: + +```bash +multiclaude attach happy-platypus --read-only +``` + +## Connect to Your Workspace + +Your workspace is a persistent Claude session where you interact with the +codebase: + +```bash +multiclaude workspace connect default +``` + +From here you can spawn more workers, check status, and manage your +development flow. 
+ +## Check Status + +```bash +multiclaude status # Overall status +multiclaude work list # List active workers +multiclaude repo history # What workers have done +``` + +## Stop Everything + +When you're done: + +```bash +multiclaude stop-all # Stop daemon and all agents +``` + +Or just stop the daemon (agents keep running in tmux): + +```bash +multiclaude daemon stop +``` + +## Next Steps + +- Read the [README](../README.md) for the full philosophy and feature list +- See [CRASH_RECOVERY.md](CRASH_RECOVERY.md) for what to do when things go + wrong +- Check [DIRECTORY_STRUCTURE.md](DIRECTORY_STRUCTURE.md) to understand where + files live + +## Common Issues + +**"daemon is not running"** + +Start it: `multiclaude start` + +**"repository already initialized"** + +You've already set up this repo. Use `multiclaude list` to see tracked repos. + +**"gh auth error"** + +Authenticate GitHub CLI: `gh auth login` + +**tmux not found** + +Install tmux: `brew install tmux` (macOS) or `apt install tmux` (Linux) diff --git a/docs/WORKER_LIFECYCLE_AUDIT.md b/docs/WORKER_LIFECYCLE_AUDIT.md new file mode 100644 index 00000000..1a1c4c7b --- /dev/null +++ b/docs/WORKER_LIFECYCLE_AUDIT.md @@ -0,0 +1,235 @@ +# Worker Lifecycle Audit + +This document identifies gaps in the worker lifecycle where workers might fail +to start, complete, or clean up properly. 
+ +**Audit Date:** 2026-02-03 +**Reviewer:** calm-panda (worker agent) +**Files Reviewed:** +- `internal/cli/cli.go` (createWorker, completeWorker, removeWorker) +- `internal/daemon/daemon.go` (health checks, cleanup, restoration) + +--- + +## Summary + +The worker lifecycle has several gaps that can result in: +- Orphaned resources (worktrees, tmux windows, branches) +- Lost work (uncommitted changes, unpushed commits) +- Missing task history records +- Inconsistent state between daemon and filesystem + +--- + +## Gap 1: No Rollback on Worker Creation Failure + +**Location:** `internal/cli/cli.go:1622-1828` + +**Issue:** Worker creation is not atomic. Resources are created sequentially: +1. Worktree created (line 1700/1707) +2. Tmux window created (line 1746) +3. Claude started (line 1783) +4. Worker registered with daemon (line 1796) + +If any step after worktree creation fails, earlier resources are orphaned. + +**Example scenarios:** +- Daemon registration fails → orphaned worktree + tmux window +- Claude startup fails → orphaned worktree + tmux window +- Tmux window creation fails → orphaned worktree + +**Severity:** Medium - orphans accumulate over time, waste disk space + +**Suggested fix:** Add rollback logic or deferred cleanup on error path. + +--- + +## Gap 2: Workers Not Cleaned Up After Crash + +**Location:** `internal/daemon/daemon.go:300-310` + +**Issue:** When a worker's Claude process dies (crash, system restart, etc.), +the worker is NOT auto-restarted or cleaned up. The comment says workers +"complete and clean up" but this only happens if the worker calls +`multiclaude agent complete` before dying. + +```go +// For transient agents (workers, review), don't auto-restart - they complete and clean up +``` + +A crashed worker remains in state with: +- Dead PID +- Existing tmux window (empty shell) +- Existing worktree (possibly with uncommitted work) + +The health check doesn't mark them as dead because the window still exists. 
+
+**Severity:** High - crashed workers become zombies, blocking cleanup
+
+**Suggested fix:** Detect workers with dead PIDs and either:
+- Mark them for cleanup after a grace period, or
+- Notify the supervisor that the worker crashed with uncommitted work
+
+---
+
+## Gap 3: Branch Name Assumption in Cleanup
+
+**Location:** `internal/daemon/daemon.go:1354-1360`
+
+**Issue:** Cleanup assumes the worker branch is `work/<agent-name>`:
+```go
+branchName := "work/" + agentName
+```
+
+But workers created with the `--push-to` flag use custom branch names
+(cli.go:1694-1702). This causes:
+- Failed branch deletion (branch doesn't exist)
+- Wrong branch deleted (if another branch happens to match)
+
+**Severity:** Low - deletion just fails silently with a warning
+
+**Suggested fix:** Store the actual branch name in agent state and use it for
+cleanup.
+
+---
+
+## Gap 4: Session Restoration Loses Worker State
+
+**Location:** `internal/daemon/daemon.go:1614-1629`
+
+**Issue:** When the tmux session is lost (e.g., system restart, tmux server
+crash), `restoreRepoAgents` removes ALL agents from state before recreating
+the session:
+
+```go
+for agentName := range repo.Agents {
+    d.state.RemoveAgent(repoName, agentName)
+}
+```
+
+Only persistent agents (supervisor, merge-queue, workspace) are recreated.
+Workers are silently lost without:
+- A task history record
+- Notification to the user
+- Any recovery attempt
+
+**Severity:** High - all in-progress worker state lost silently
+
+**Suggested fix:**
+- Record task history before removing workers
+- Log a warning about lost workers
+- Consider storing worktree paths for recovery
+
+---
+
+## Gap 5: Manual Removal Skips Task History
+
+**Location:** `internal/cli/cli.go:2225-2366`
+
+**Issue:** `multiclaude worker rm` removes workers without recording task
+history. Only `cleanupDeadAgents` calls `recordTaskHistory`.
+
+Workers removed manually have no record of what they were working on.
+ +**Severity:** Low - task history is for auditing, not critical + +**Suggested fix:** Call daemon's complete_agent or add history recording to +removeWorker. + +--- + +## Gap 6: Race Condition in Worker Completion + +**Location:** `internal/daemon/daemon.go:984-1046` + +**Issue:** `handleCompleteAgent` does three async things: +1. Marks agent as `ReadyForCleanup` +2. Sends messages to supervisor/merge-queue (async delivery via `go d.routeMessages()`) +3. Triggers cleanup check (async via `go d.checkAgentHealth()`) + +The worker process might still be running after returning from `complete_agent`. +If health check runs fast, it could: +- Kill the tmux window while worker is still outputting +- Remove worktree while Claude is still writing files + +**Severity:** Low - unlikely in practice due to timing + +**Suggested fix:** Wait for worker process to exit before cleanup, or add +brief delay. + +--- + +## Gap 7: PID Detection Timing Issues + +**Location:** `internal/daemon/daemon.go:1768` + +**Issue:** After starting Claude, code sleeps 500ms then gets PID: +```go +time.Sleep(500 * time.Millisecond) +pid, err := d.tmux.GetPanePID(...) +``` + +This is fragile: +- If Claude takes >500ms to start, might get shell PID instead +- If Claude starts faster, works but wastes time +- No verification that PID is actually Claude process + +**Severity:** Low - usually works, but PID might be wrong + +**Suggested fix:** Poll for Claude process or verify PID is claude binary. + +--- + +## Gap 8: Orphaned State Without Worktree + +**Location:** `internal/daemon/daemon.go:1471-1501` + +**Issue:** `cleanupOrphanedWorktrees` only removes worktree directories not +tracked by git. It doesn't handle: +- Agent state without worktree (worktree manually deleted) +- Agent state with corrupted worktree (git refs broken) + +The repair command (`handleTriggerCleanup`) does check for missing worktrees, +but this isn't run automatically. 
+
+**Severity:** Low - manual repair available
+
+**Suggested fix:** Add a worktree existence check to the health check loop.
+
+---
+
+## Recommendations
+
+### Immediate (P0)
+
+1. **Add crash detection for workers** - If a worker's PID is dead and its
+   window exists (empty shell), mark it as crashed and notify the supervisor.
+   Don't silently leave zombie workers.
+
+2. **Record task history on session loss** - Before clearing agents during
+   session restoration, record their task history so we know what was lost.
+
+### Short-term (P1)
+
+3. **Add rollback to worker creation** - Use deferred cleanup to remove the
+   worktree/window if later steps fail.
+
+4. **Store the actual branch name in state** - Don't assume `work/<agent-name>`
+   for branch cleanup.
+
+### Nice-to-have (P2)
+
+5. **Record task history on manual removal** - Make `worker rm` record history.
+
+6. **Improve PID detection** - Poll or verify instead of using a fixed sleep.
+
+---
+
+## Testing Scenarios
+
+To verify fixes, test these scenarios:
+
+1. **Creation failure:** Kill the daemon mid-worker-creation
+2. **Worker crash:** `kill -9` the Claude process
+3. **Tmux crash:** `tmux kill-server` with workers running
+4. **Manual deletion:** `rm -rf` a worker worktree while it's in state
+5. **Push-to cleanup:** Create a worker with `--push-to`, complete it, and
+   verify correct branch cleanup
diff --git a/docs/reviews/pr-18-triage-review.md b/docs/reviews/pr-18-triage-review.md
new file mode 100644
index 00000000..64da60fb
--- /dev/null
+++ b/docs/reviews/pr-18-triage-review.md
@@ -0,0 +1,55 @@
+# PR #18 Review: pr-triage
+
+**Reviewer:** swift-badger (Claude) + Gemini second opinion
+**Date:** 2026-02-28
+**PR:** https://github.com/whitmo/multiclaude/pull/18
+**Status:** Request Changes
+
+## Summary
+
+PR #18 claims to merge "5 clean PRs from upstream" but actually contains a
+full upstream sync: 99 commits, 91 files changed, 17,183 insertions, 5,535
+deletions.
+
+
+ +## Scope Analysis + +| Claimed | Actual | +|---------|--------| +| 5 upstream PRs | 99 commits from full upstream history | +| ~659 lines of tests | 17,183 additions / 5,535 deletions | +| 3 test files | 91 files changed | + +Only 7 of 99 commits relate to the 5 claimed PRs. The other 92 include: +- Full CLI noun-group restructuring +- Fork-aware workflow (#267, #274-#276) +- Agent definitions infrastructure (#236-#240) +- Task management (#313) +- CI changes (golangci-lint, gofmt) +- Repo lifecycle commands (#315-#317) + +## Test Results + +- Build: PASS +- 24/25 test packages: PASS +- pkg/tmux: 2 flaky tests (pre-existing) + +## Code Quality Findings + +### Good +- `periodicLoop` helper reduces daemon loop duplication +- `startAgentWithConfig` consolidates agent startup +- `socket.SuccessResponse`/`ErrorResponse` standardize API responses +- Error messages updated for new CLI structure + +### Concerns +1. **Fragile type inference** — `handleSpawnAgent` uses `strings.Contains(name, "review")` to determine agent type +2. **Removed crash notification** — `notifySupervisorOfCrash` removed without replacement +3. **Removed selective wake** — `agentHasWork` check removed; all agents woken uniformly +4. **Removed cleanup** — `DeleteBranch` and `RemoveAll(agentConfigDir)` removed from agent teardown +5. **PID=0 in test mode** — `MULTICLAUDE_TEST_MODE=1` skips Claude startup, leaves PID at 0 + +## Recommendation + +Split into: +1. Full upstream sync PR (with accurate description) +2. Focused 5-PR cherry-pick (matching current description) + +Address code concerns before merging. 
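To illustrate concern 1: a name-based check misfires for any worker whose name happens to mention "review", while an explicit role field cannot. A minimal sketch (types and values are illustrative, not the project's actual schema):

```go
package main

import (
	"fmt"
	"strings"
)

// inferByName reproduces the fragile pattern the review flags:
// deriving an agent's role from a substring of its name.
func inferByName(name string) string {
	if strings.Contains(name, "review") {
		return "merge-queue"
	}
	return "worker"
}

// Agent carries an explicit role instead (illustrative shape).
type Agent struct {
	Name string
	Role string // "supervisor" | "worker" | "merge-queue"
}

func main() {
	// A worker spawned to address review comments:
	fmt.Println(inferByName("review-comments-fox")) // misclassified as merge-queue
	a := Agent{Name: "review-comments-fox", Role: "worker"}
	fmt.Println(a.Role) // explicit role stays correct
}
```

Passing the role through spawn requests, rather than re-deriving it from the name, would remove this class of bug.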
diff --git a/internal/cli/cli.go b/internal/cli/cli.go index 17f16434..ffdaf59f 100644 --- a/internal/cli/cli.go +++ b/internal/cli/cli.go @@ -393,6 +393,14 @@ func (c *CLI) registerCommands() { Run: c.showHistory, } + // Stats command + c.rootCmd.Subcommands["stats"] = &Command{ + Name: "stats", + Description: "Show agent activity statistics", + Usage: "multiclaude stats [--repo ]", + Run: c.showStats, + } + // Agent commands (run from within Claude) agentCmd := &Command{ Name: "agent", @@ -431,7 +439,7 @@ func (c *CLI) registerCommands() { agentCmd.Subcommands["complete"] = &Command{ Name: "complete", Description: "Signal worker completion", - Usage: "multiclaude agent complete [--summary ] [--failure ]", + Usage: "multiclaude agent complete [--summary ] [--failure ] [--pr-url ] [--pr-number ]", Run: c.completeWorker, } @@ -467,6 +475,13 @@ func (c *CLI) registerCommands() { Run: c.repair, } + c.rootCmd.Subcommands["refresh"] = &Command{ + Name: "refresh", + Description: "Sync agent worktrees with main branch", + Usage: "multiclaude refresh", + Run: c.refresh, + } + // Claude restart command - for resuming Claude after exit c.rootCmd.Subcommands["claude"] = &Command{ Name: "claude", @@ -904,7 +919,7 @@ func (c *CLI) initRepo(args []string) error { case "assigned": mqTrackMode = state.TrackModeAssigned default: - return fmt.Errorf("invalid --mq-track value: %s (must be 'all', 'author', or 'assigned')", trackMode) + return errors.InvalidFlagValue("--mq-track", trackMode, []string{"all", "author", "assigned"}) } } @@ -942,7 +957,7 @@ func (c *CLI) initRepo(args []string) error { // Create tmux session tmuxSession := sanitizeTmuxSessionName(repoName) if tmuxSession == "mc-" { - return fmt.Errorf("invalid tmux session name: repository name cannot be empty") + return errors.InvalidRepoName("cannot be empty") } fmt.Printf("Creating tmux session: %s\n", tmuxSession) @@ -1480,35 +1495,16 @@ func (c *CLI) clearCurrentRepo(args []string) error { func (c *CLI) 
configRepo(args []string) error { flags, posArgs := ParseFlags(args) - // Determine repository + // Determine repository: positional arg takes priority, then resolveRepo fallback var repoName string if len(posArgs) >= 1 { repoName = posArgs[0] } else { - // Try to infer from current directory - cwd, err := os.Getwd() + resolved, err := c.resolveRepo(flags) if err != nil { - return fmt.Errorf("failed to get current directory: %w", err) - } - - // Check if we're in a tracked repo - repos := c.getReposList() - for _, repo := range repos { - repoPath := c.paths.RepoDir(repo) - if strings.HasPrefix(cwd, repoPath) { - repoName = repo - break - } - } - - if repoName == "" { - // If only one repo exists, use it - if len(repos) == 1 { - repoName = repos[0] - } else { - return fmt.Errorf("please specify a repository name or run from within a tracked repository") - } + return errors.NotInRepo() } + repoName = resolved } // Check if any config flags are provided @@ -1533,17 +1529,17 @@ func (c *CLI) showRepoConfig(repoName string) error { }, }) if err != nil { - return fmt.Errorf("failed to get repo config: %w (is daemon running?)", err) + return errors.DaemonCommunicationFailed("getting repo config", err) } if !resp.Success { - return fmt.Errorf("failed to get repo config: %s", resp.Error) + return errors.Wrap(errors.CategoryRuntime, "failed to get repo config", fmt.Errorf("%s", resp.Error)) } // Parse response configMap, ok := resp.Data.(map[string]interface{}) if !ok { - return fmt.Errorf("unexpected response format") + return errors.New(errors.CategoryRuntime, "unexpected response format from daemon") } fmt.Printf("Configuration for repository: %s\n\n", repoName) @@ -1587,7 +1583,7 @@ func (c *CLI) updateRepoConfig(repoName string, flags map[string]string) error { case "false": updateArgs["mq_enabled"] = false default: - return fmt.Errorf("invalid --mq-enabled value: %s (must be 'true' or 'false')", mqEnabled) + return errors.InvalidFlagValue("--mq-enabled", mqEnabled, 
[]string{"true", "false"}) } } @@ -1596,7 +1592,7 @@ func (c *CLI) updateRepoConfig(repoName string, flags map[string]string) error { case "all", "author", "assigned": updateArgs["mq_track_mode"] = mqTrack default: - return fmt.Errorf("invalid --mq-track value: %s (must be 'all', 'author', or 'assigned')", mqTrack) + return errors.InvalidFlagValue("--mq-track", mqTrack, []string{"all", "author", "assigned"}) } } @@ -1606,11 +1602,11 @@ func (c *CLI) updateRepoConfig(repoName string, flags map[string]string) error { Args: updateArgs, }) if err != nil { - return fmt.Errorf("failed to update repo config: %w (is daemon running?)", err) + return errors.DaemonCommunicationFailed("updating repo config", err) } if !resp.Success { - return fmt.Errorf("failed to update repo config: %s", resp.Error) + return errors.Wrap(errors.CategoryRuntime, "failed to update repo config", fmt.Errorf("%s", resp.Error)) } fmt.Printf("Configuration updated for repository: %s\n", repoName) @@ -1709,6 +1705,35 @@ func (c *CLI) createWorker(args []string) error { } } + // Track created resources for rollback on failure. + // If any step after worktree creation fails, we clean up everything + // that was created so far to avoid orphaned worktrees/windows. 
+ var tmuxSession string + var tmuxWindowCreated bool + var workerPromptFile string + succeeded := false + defer func() { + if succeeded { + return + } + fmt.Println("\nWorker creation failed, rolling back...") + if tmuxWindowCreated && tmuxSession != "" { + fmt.Printf(" Removing tmux window: %s\n", workerName) + killCmd := exec.Command("tmux", "kill-window", "-t", fmt.Sprintf("%s:%s", tmuxSession, workerName)) + if killErr := killCmd.Run(); killErr != nil { + fmt.Printf(" Warning: failed to kill tmux window: %v\n", killErr) + } + } + if workerPromptFile != "" { + fmt.Printf(" Removing prompt file: %s\n", workerPromptFile) + os.Remove(workerPromptFile) + } + fmt.Printf(" Removing worktree: %s\n", wtPath) + if removeErr := wt.Remove(wtPath, true); removeErr != nil { + fmt.Printf(" Warning: failed to remove worktree: %v\n", removeErr) + } + }() + // Get repository info to determine tmux session client := socket.NewClient(c.paths.DaemonSock) resp, err := client.Send(socket.Request{ @@ -1725,7 +1750,7 @@ func (c *CLI) createWorker(args []string) error { } // Get tmux session name (it's mc-) - tmuxSession := sanitizeTmuxSessionName(repoName) + tmuxSession = sanitizeTmuxSessionName(repoName) // Ensure tmux session exists before creating window // This handles cases where the session was killed or daemon didn't restore it @@ -1747,6 +1772,7 @@ func (c *CLI) createWorker(args []string) error { if err := cmd.Run(); err != nil { return errors.TmuxOperationFailed("create window", err) } + tmuxWindowCreated = true // Generate session ID for worker workerSessionID, err := claude.GenerateSessionID() @@ -1759,7 +1785,7 @@ func (c *CLI) createWorker(args []string) error { if hasPushTo { workerConfig.PushToBranch = pushTo } - workerPromptFile, err := c.writeWorkerPromptFile(repoPath, workerName, workerConfig) + workerPromptFile, err = c.writeWorkerPromptFile(repoPath, workerName, workerConfig) if err != nil { return fmt.Errorf("failed to write worker prompt: %w", err) } @@ -1813,6 
+1839,8 @@ func (c *CLI) createWorker(args []string) error { return fmt.Errorf("failed to register worker: %s", resp.Error) } + succeeded = true + fmt.Println() fmt.Println("✓ Worker created successfully!") fmt.Printf(" Name: %s\n", workerName) @@ -1923,6 +1951,8 @@ func (c *CLI) listWorkers(args []string) error { statusCell = format.ColorCell(format.ColoredStatus(format.StatusCompleted), nil) case "stopped": statusCell = format.ColorCell(format.ColoredStatus(format.StatusError), nil) + case "crashed": + statusCell = format.ColorCell(format.ColoredStatus(format.StatusCrashed), nil) default: statusCell = format.ColorCell(format.ColoredStatus(format.StatusIdle), nil) } @@ -2171,6 +2201,209 @@ func (c *CLI) showHistory(args []string) error { return nil } +// showStats displays agent activity statistics +func (c *CLI) showStats(args []string) error { + flags, _ := ParseFlags(args) + + // Determine repository + repoName, err := c.resolveRepo(flags) + if err != nil { + return errors.NotInRepo() + } + + // Get all task history from daemon (use high limit to get all) + client := socket.NewClient(c.paths.DaemonSock) + historyResp, err := client.Send(socket.Request{ + Command: "task_history", + Args: map[string]interface{}{ + "repo": repoName, + "limit": float64(1000), // Fetch all history + }, + }) + if err != nil { + return errors.DaemonCommunicationFailed("getting task history", err) + } + + if !historyResp.Success { + return fmt.Errorf("failed to get task history: %s", historyResp.Error) + } + + // Get daemon status for active agent count + statusResp, err := client.Send(socket.Request{ + Command: "list_workers", + Args: map[string]interface{}{ + "repo": repoName, + }, + }) + + // Parse task history + entries, ok := historyResp.Data.([]interface{}) + if !ok { + entries = []interface{}{} + } + + // Calculate statistics + totalTasks := len(entries) + statusCounts := map[string]int{ + "merged": 0, + "open": 0, + "closed": 0, + "failed": 0, + "no-pr": 0, + } + + var 
totalDuration time.Duration + tasksWithDuration := 0 + repoPath := c.paths.RepoDir(repoName) + + for _, e := range entries { + entry, ok := e.(map[string]interface{}) + if !ok { + continue + } + + branch, _ := entry["branch"].(string) + prURL, _ := entry["pr_url"].(string) + storedStatus, _ := entry["status"].(string) + + // Get PR status + prStatus, _ := c.getPRStatusForBranch(repoPath, branch, prURL) + + // Use stored status if it indicates failure + if storedStatus == "failed" { + prStatus = "failed" + } + if prStatus == "" { + prStatus = "no-pr" + } + + statusCounts[prStatus]++ + + // Calculate duration if both times exist + createdAtStr, _ := entry["created_at"].(string) + completedAtStr, _ := entry["completed_at"].(string) + + if createdAtStr != "" && completedAtStr != "" { + createdAt, err1 := time.Parse(time.RFC3339, createdAtStr) + completedAt, err2 := time.Parse(time.RFC3339, completedAtStr) + if err1 == nil && err2 == nil && !completedAt.IsZero() { + duration := completedAt.Sub(createdAt) + if duration > 0 { + totalDuration += duration + tasksWithDuration++ + } + } + } + } + + // Count active agents + activeAgents := 0 + if statusResp.Success { + if workers, ok := statusResp.Data.([]interface{}); ok { + for _, w := range workers { + if worker, ok := w.(map[string]interface{}); ok { + if status, _ := worker["status"].(string); status == "running" { + activeAgents++ + } + } + } + } + } + + // Calculate average duration + var avgDuration time.Duration + if tasksWithDuration > 0 { + avgDuration = totalDuration / time.Duration(tasksWithDuration) + } + + // Display statistics + format.Header("Agent Statistics for '%s'", repoName) + fmt.Println() + + // Summary section + fmt.Println("Summary:") + fmt.Printf(" Total tasks completed: %d\n", totalTasks) + fmt.Printf(" Active agents: %d\n", activeAgents) + fmt.Println() + + // PR status breakdown + fmt.Println("PR Status:") + table := format.NewColoredTable("Status", "Count", "Percentage") + + prsCreated := 
statusCounts["merged"] + statusCounts["open"] + statusCounts["closed"] + addStatusRow := func(name string, count int) { + pct := 0.0 + if totalTasks > 0 { + pct = float64(count) / float64(totalTasks) * 100 + } + // Determine color based on status + var statusCell format.ColoredCell + switch name { + case "merged": + statusCell = format.ColorCell(name, format.Green) + case "open": + statusCell = format.ColorCell(name, format.Yellow) + case "closed", "failed": + statusCell = format.ColorCell(name, format.Red) + default: + statusCell = format.ColorCell(name, format.Dim) + } + table.AddRow( + statusCell, + format.Cell(fmt.Sprintf("%d", count)), + format.Cell(fmt.Sprintf("%.0f%%", pct)), + ) + } + + addStatusRow("merged", statusCounts["merged"]) + addStatusRow("open", statusCounts["open"]) + addStatusRow("closed", statusCounts["closed"]) + addStatusRow("failed", statusCounts["failed"]) + addStatusRow("no-pr", statusCounts["no-pr"]) + table.Print() + + // Time stats + if tasksWithDuration > 0 { + fmt.Println() + fmt.Println("Time Statistics:") + fmt.Printf(" Average task duration: %s\n", formatDuration(avgDuration)) + fmt.Printf(" Total time spent: %s\n", formatDuration(totalDuration)) + } + + // Success rate + if totalTasks > 0 { + fmt.Println() + fmt.Println("Success Metrics:") + prRate := float64(prsCreated) / float64(totalTasks) * 100 + mergeRate := 0.0 + if prsCreated > 0 { + mergeRate = float64(statusCounts["merged"]) / float64(prsCreated) * 100 + } + fmt.Printf(" PR creation rate: %.0f%% (%d/%d tasks)\n", prRate, prsCreated, totalTasks) + fmt.Printf(" PR merge rate: %.0f%% (%d/%d PRs)\n", mergeRate, statusCounts["merged"], prsCreated) + } + + return nil +} + +// formatDuration formats a duration in a human-readable way +func formatDuration(d time.Duration) string { + if d < time.Minute { + return fmt.Sprintf("%ds", int(d.Seconds())) + } + if d < time.Hour { + return fmt.Sprintf("%dm", int(d.Minutes())) + } + hours := int(d.Hours()) + mins := int(d.Minutes()) % 60 
+ if hours < 24 { + return fmt.Sprintf("%dh %dm", hours, mins) + } + days := hours / 24 + hours = hours % 24 + return fmt.Sprintf("%dd %dh", days, hours) +} + // getPRStatusForBranch queries GitHub for the PR status of a branch func (c *CLI) getPRStatusForBranch(repoPath, branch, existingPRURL string) (status, prLink string) { // If we already have a PR URL, just return it formatted @@ -2399,16 +2632,9 @@ func (c *CLI) addWorkspace(args []string) error { } // Determine repository - var repoName string - if r, ok := flags["repo"]; ok { - repoName = r - } else { - // Try to infer from current directory - if inferred, err := c.inferRepoFromCwd(); err == nil { - repoName = inferred - } else { - return errors.MultipleRepos() - } + repoName, err := c.resolveRepo(flags) + if err != nil { + return errors.NotInRepo() } // Determine branch to start from @@ -2441,7 +2667,7 @@ func (c *CLI) addWorkspace(args []string) error { agentType, _ := agentMap["type"].(string) name, _ := agentMap["name"].(string) if agentType == "workspace" && name == workspaceName { - return fmt.Errorf("workspace '%s' already exists in repo '%s'", workspaceName, repoName) + return errors.WorkspaceAlreadyExists(workspaceName, repoName) } } } @@ -2791,10 +3017,10 @@ func (c *CLI) connectWorkspace(args []string) error { }, }) if err != nil { - return fmt.Errorf("failed to get workspace info: %w (is daemon running?)", err) + return errors.DaemonCommunicationFailed("getting workspace info", err) } if !resp.Success { - return fmt.Errorf("failed to get workspace info: %s", resp.Error) + return errors.Wrap(errors.CategoryRuntime, "failed to get workspace info", fmt.Errorf("%s", resp.Error)) } agents, _ := resp.Data.([]interface{}) @@ -2861,7 +3087,7 @@ func (c *CLI) connectWorkspace(args []string) error { // validateWorkspaceName validates that a workspace name follows branch name restrictions func validateWorkspaceName(name string) error { if name == "" { - return fmt.Errorf("workspace name cannot be empty") 
+		return errors.InvalidWorkspaceName("name cannot be empty")
 	}
 
 	// Git branch name restrictions
@@ -2872,25 +3098,25 @@ func validateWorkspaceName(name string) error {
 
 	// - Cannot be "." or ".."
 	if name == "." || name == ".." {
-		return fmt.Errorf("workspace name cannot be '.' or '..'")
+		return errors.InvalidWorkspaceName("cannot be '.' or '..'")
 	}
 
 	if strings.HasPrefix(name, ".") || strings.HasPrefix(name, "-") {
-		return fmt.Errorf("workspace name cannot start with '.' or '-'")
+		return errors.InvalidWorkspaceName("cannot start with '.' or '-'")
 	}
 
 	if strings.HasSuffix(name, ".") || strings.HasSuffix(name, "/") {
-		return fmt.Errorf("workspace name cannot end with '.' or '/'")
+		return errors.InvalidWorkspaceName("cannot end with '.' or '/'")
 	}
 
 	if strings.Contains(name, "..") {
-		return fmt.Errorf("workspace name cannot contain '..'")
+		return errors.InvalidWorkspaceName("cannot contain '..'")
 	}
 
 	invalidChars := []string{"\\", "~", "^", ":", "?", "*", "[", "@", "{", "}", " ", "\t", "\n"}
 	for _, char := range invalidChars {
 		if strings.Contains(name, char) {
-			return fmt.Errorf("workspace name cannot contain '%s'", char)
+			return errors.InvalidWorkspaceName(fmt.Sprintf("cannot contain '%s'", char))
 		}
 	}
 
@@ -3142,22 +3368,25 @@ func normalizeGitHubURL(url string) string {
 
 // findRepoFromGitRemote looks for a git remote in the current directory
 // and tries to match it against known repositories in state.
+// It checks both "origin" and "upstream" remotes to support fork workflows
+// where origin points to the user's fork and upstream points to the tracked repo.
 func (c *CLI) findRepoFromGitRemote() (string, error) {
-	// Run git remote get-url origin
-	cmd := exec.Command("git", "remote", "get-url", "origin")
-	output, err := cmd.Output()
-	if err != nil {
-		return "", fmt.Errorf("failed to get git remote: %w", err)
-	}
-
-	remoteURL := strings.TrimSpace(string(output))
-	if remoteURL == "" {
-		return "", fmt.Errorf("git remote URL is empty")
+	// Collect remote URLs from origin and upstream (covers fork workflows)
+	var remoteURLs []string
+	for _, remote := range []string{"origin", "upstream"} {
+		cmd := exec.Command("git", "remote", "get-url", remote)
+		output, err := cmd.Output()
+		if err != nil {
+			continue
+		}
+		url := strings.TrimSpace(string(output))
+		if url != "" {
+			remoteURLs = append(remoteURLs, url)
+		}
 	}
 
-	normalizedRemote := normalizeGitHubURL(remoteURL)
-	if normalizedRemote == "" {
-		return "", fmt.Errorf("not a GitHub URL: %s", remoteURL)
+	if len(remoteURLs) == 0 {
+		return "", fmt.Errorf("no git remotes found")
 	}
 
 	// Load state to check against known repositories
@@ -3166,20 +3395,27 @@ func (c *CLI) findRepoFromGitRemote() (string, error) {
 		return "", err
 	}
 
-	// Iterate through repos and find a match
-	for _, repoName := range st.ListRepos() {
-		repo, exists := st.GetRepo(repoName)
-		if !exists {
+	// Check each remote URL against tracked repos
+	for _, remoteURL := range remoteURLs {
+		normalizedRemote := normalizeGitHubURL(remoteURL)
+		if normalizedRemote == "" {
 			continue
 		}
 
-		normalizedStateURL := normalizeGitHubURL(repo.GithubURL)
-		if normalizedStateURL != "" && normalizedStateURL == normalizedRemote {
-			return repoName, nil
+		for _, repoName := range st.ListRepos() {
+			repo, exists := st.GetRepo(repoName)
+			if !exists {
+				continue
+			}
+
+			normalizedStateURL := normalizeGitHubURL(repo.GithubURL)
+			if normalizedStateURL != "" && normalizedStateURL == normalizedRemote {
+				return repoName, nil
+			}
 		}
 	}
 
-	return "", fmt.Errorf("no matching repository found for remote: %s", remoteURL)
+	return "", fmt.Errorf("no matching repository found for remotes: %v", remoteURLs)
 }
 
 // resolveRepo determines the repository to use based on:
@@ -3341,6 +3577,20 @@ func (c *CLI) completeWorker(args []string) error {
 		fmt.Printf("Failure reason: %s\n", failureReason)
 	}
 
+	// Add optional PR URL
+	if prURL, ok := flags["pr-url"]; ok && prURL != "" {
+		reqArgs["pr_url"] = prURL
+		fmt.Printf("PR URL: %s\n", prURL)
+	}
+
+	// Add optional PR number
+	if prNumber, ok := flags["pr-number"]; ok && prNumber != "" {
+		if num, err := strconv.Atoi(prNumber); err == nil && num > 0 {
+			reqArgs["pr_number"] = num
+			fmt.Printf("PR Number: #%d\n", num)
+		}
+	}
+
 	client := socket.NewClient(c.paths.DaemonSock)
 	resp, err := client.Send(socket.Request{
 		Command: "complete_agent",
@@ -3368,15 +3618,10 @@ func (c *CLI) restartAgentCmd(args []string) error {
 	}
 	agentName := remaining[0]
 
-	// Get repo from flag or infer from cwd
-	repoName := flags["repo"]
-	if repoName == "" {
-		// Try to infer from cwd
-		inferred, err := c.inferRepoFromCwd()
-		if err != nil {
-			return errors.InvalidUsage("could not determine repository - use --repo flag or run from within a multiclaude worktree")
-		}
-		repoName = inferred
+	// Get repo from flag or infer from context
+	repoName, err := c.resolveRepo(flags)
+	if err != nil {
+		return errors.NotInRepo()
 	}
 
 	force := flags["force"] == "true"
@@ -3435,31 +3680,11 @@ func (c *CLI) reviewPR(args []string) error {
 	prNumber := parts[4]
 	fmt.Printf("Reviewing PR #%s\n", prNumber)
 
-	// Determine repository from flag or current directory
+	// Determine repository from flag or current context
 	flags, _ := ParseFlags(args[1:])
-	var repoName string
-	if r, ok := flags["repo"]; ok {
-		repoName = r
-	} else {
-		// Try to infer from current directory
-		cwd, err := os.Getwd()
-		if err != nil {
-			return fmt.Errorf("failed to get current directory: %w", err)
-		}
-
-		// Check if we're in a tracked repo
-		repos := c.getReposList()
-		for _, repo := range repos {
-			repoPath := c.paths.RepoDir(repo)
-			if strings.HasPrefix(cwd, repoPath) {
-				repoName = repo
-				break
-			}
-		}
-
-		if repoName == "" {
-			return errors.NotInRepo()
-		}
+	repoName, err := c.resolveRepo(flags)
+	if err != nil {
+		return errors.NotInRepo()
 	}
 
 	// Generate review agent name
@@ -3579,26 +3804,16 @@ func (c *CLI) reviewPR(args []string) error {
 
 func (c *CLI) viewLogs(args []string) error {
 	if len(args) < 1 {
-		return fmt.Errorf("usage: multiclaude logs <agent-name> [--lines N] [--follow]")
+		return errors.InvalidUsage("usage: multiclaude logs <agent-name> [--lines N] [--follow]")
 	}
 	agentName := args[0]
 	flags, _ := ParseFlags(args[1:])
 
 	// Determine repository
-	var repoName string
-	if r, ok := flags["repo"]; ok {
-		repoName = r
-	} else {
-		repos := c.getReposList()
-		if len(repos) == 0 {
-			return fmt.Errorf("no repositories tracked")
-		}
-		if len(repos) == 1 {
-			repoName = repos[0]
-		} else {
-			return fmt.Errorf("multiple repos exist. Use --repo flag to specify which one")
-		}
+	repoName, err := c.resolveRepo(flags)
+	if err != nil {
+		return errors.NotInRepo()
 	}
 
 	// Determine if it's a worker or system agent by checking if it exists in workers dir
@@ -3611,7 +3826,7 @@ func (c *CLI) viewLogs(args []string) error {
 	} else if _, err := os.Stat(systemLogFile); err == nil {
 		logFile = systemLogFile
 	} else {
-		return fmt.Errorf("no log file found for agent %s in repo %s", agentName, repoName)
+		return errors.LogFileNotFound(agentName, repoName)
 	}
 
 	// Check for --follow flag
@@ -3722,7 +3937,7 @@ func (c *CLI) listLogsForRepo(repoName string) error {
 
 func (c *CLI) searchLogs(args []string) error {
 	if len(args) < 1 {
-		return fmt.Errorf("usage: multiclaude logs search <pattern> [--repo <name>]")
+		return errors.InvalidUsage("usage: multiclaude logs search <pattern> [--repo <name>]")
 	}
 	pattern := args[0]
@@ -3779,13 +3994,13 @@ func (c *CLI) cleanLogs(args []string) error {
 
 	olderThan, ok := flags["older-than"]
 	if !ok {
-		return fmt.Errorf("usage: multiclaude logs clean --older-than <duration> (e.g., 7d, 24h)")
+		return errors.InvalidUsage("usage: multiclaude logs clean --older-than <duration> (e.g., 7d, 24h)")
 	}
 
 	// Parse duration
 	duration, err := parseDuration(olderThan)
 	if err != nil {
-		return fmt.Errorf("invalid duration: %v", err)
+		return errors.InvalidDuration(olderThan)
 	}
 
 	cutoff := time.Now().Add(-duration)
@@ -3868,10 +4083,10 @@ func (c *CLI) attachAgent(args []string) error {
 		},
 	})
 	if err != nil {
-		return fmt.Errorf("failed to get agent info: %w (is daemon running?)", err)
+		return errors.DaemonCommunicationFailed("getting agent info", err)
 	}
 	if !resp.Success {
-		return fmt.Errorf("failed to get agent info: %s", resp.Error)
+		return errors.Wrap(errors.CategoryRuntime, "failed to get agent info", fmt.Errorf("%s", resp.Error))
 	}
 
 	agents, _ := resp.Data.([]interface{})
@@ -4572,28 +4787,173 @@ func (c *CLI) localRepair(verbose bool) error {
 	return nil
 }
 
+// refresh syncs agent worktrees with the main branch
+func (c *CLI) refresh(args []string) error {
+	fmt.Println("Syncing agent worktrees with main branch...")
+
+	// Check if daemon is running
+	client := socket.NewClient(c.paths.DaemonSock)
+	_, err := client.Send(socket.Request{Command: "ping"})
+	if err != nil {
+		// Daemon not running - do local refresh
+		fmt.Println("Daemon is not running. Performing local refresh...")
+		return c.localRefresh()
+	}
+
+	// Trigger worktree refresh via daemon
+	resp, err := client.Send(socket.Request{
+		Command: "refresh_worktrees",
+	})
+	if err != nil {
+		return fmt.Errorf("failed to trigger refresh: %w", err)
+	}
+	if !resp.Success {
+		return fmt.Errorf("refresh failed: %s", resp.Error)
+	}
+
+	// Display results
+	if data, ok := resp.Data.(map[string]interface{}); ok {
+		if refreshed, ok := data["refreshed"].([]interface{}); ok && len(refreshed) > 0 {
+			fmt.Printf("✓ Refreshed %d worktree(s):\n", len(refreshed))
+			for _, agent := range refreshed {
+				fmt.Printf("  - %s\n", agent)
+			}
+		}
+		if skipped, ok := data["skipped"].([]interface{}); ok && len(skipped) > 0 {
+			fmt.Printf("⊘ Skipped %d worktree(s):\n", len(skipped))
+			for _, item := range skipped {
+				if m, ok := item.(map[string]interface{}); ok {
+					agent := m["agent"]
+					reason := m["reason"]
+					fmt.Printf("  - %s: %s\n", agent, reason)
+				}
+			}
+		}
+		if errs, ok := data["errors"].([]interface{}); ok && len(errs) > 0 {
+			fmt.Printf("✗ Failed to refresh %d worktree(s):\n", len(errs))
+			for _, item := range errs {
+				if m, ok := item.(map[string]interface{}); ok {
+					agent := m["agent"]
+					errMsg := m["error"]
+					fmt.Printf("  - %s: %s\n", agent, errMsg)
+				}
+			}
+		}
+	}
+
+	return nil
+}
+
+// localRefresh performs worktree refresh without the daemon running
+func (c *CLI) localRefresh() error {
+	// Load state from disk
+	st, err := c.loadState()
+	if err != nil {
+		return err
+	}
+
+	refreshed := 0
+	skipped := 0
+	errCount := 0
+
+	repos := st.GetAllRepos()
+	for repoName, repo := range repos {
+		repoPath := c.paths.RepoDir(repoName)
+
+		// Check if repo path exists
+		if _, err := os.Stat(repoPath); os.IsNotExist(err) {
+			continue
+		}
+
+		wt := worktree.NewManager(repoPath)
+
+		// Get the upstream remote and default branch
+		remote, err := wt.GetUpstreamRemote()
+		if err != nil {
+			continue
+		}
+
+		mainBranch, err := wt.GetDefaultBranch(remote)
+		if err != nil {
+			continue
+		}
+
+		// Fetch from remote
+		if err := wt.FetchRemote(remote); err != nil {
+			continue
+		}
+
+		// Check each worker agent's worktree
+		for agentName, agent := range repo.Agents {
+			// Only refresh worker worktrees
+			if agent.Type != state.AgentTypeWorker {
+				continue
+			}

+			// Skip if worktree path is empty or doesn't exist
+			if agent.WorktreePath == "" {
+				continue
+			}
+			if _, err := os.Stat(agent.WorktreePath); os.IsNotExist(err) {
+				continue
+			}
+
+			// Check worktree state
+			wtState, err := worktree.GetWorktreeState(agent.WorktreePath, remote, mainBranch)
+			if err != nil {
+				continue
+			}
+
+			// Skip if can't refresh
+			if !wtState.CanRefresh {
+				fmt.Printf("⊘ Skipping %s/%s: %s\n", repoName, agentName, wtState.RefreshReason)
+				skipped++
+				continue
+			}
+
+			// Refresh the worktree
+			fmt.Printf("Refreshing %s/%s (%d commits behind)...\n", repoName, agentName, wtState.CommitsBehind)
+			result := worktree.RefreshWorktree(agent.WorktreePath, remote, mainBranch)
+
+			if result.Error != nil {
+				fmt.Printf("✗ Failed to refresh %s/%s: %v\n", repoName, agentName, result.Error)
+				errCount++
+			} else if result.Skipped {
+				fmt.Printf("⊘ Skipped %s/%s: %s\n", repoName, agentName, result.SkipReason)
+				skipped++
+			} else {
+				fmt.Printf("✓ Refreshed %s/%s (rebased %d commits)\n", repoName, agentName, result.CommitsRebased)
+				refreshed++
+			}
+		}
+	}
+
+	fmt.Printf("\n✓ Refresh completed: %d refreshed, %d skipped, %d errors\n", refreshed, skipped, errCount)
+	return nil
+}
+
 // restartClaude restarts Claude in the current agent context.
 // It auto-detects whether to use --resume or --session-id based on session history.
 func (c *CLI) restartClaude(args []string) error {
 	// Infer agent context from cwd
 	repoName, agentName, err := c.inferAgentContext()
 	if err != nil {
-		return fmt.Errorf("cannot determine agent context: %w\n\nRun this command from within a multiclaude agent tmux window", err)
+		return errors.NotInAgentContext()
 	}
 
 	// Load state to get session ID
 	st, err := state.Load(c.paths.StateFile)
 	if err != nil {
-		return fmt.Errorf("failed to load state: %w", err)
+		return errors.Wrap(errors.CategoryRuntime, "failed to load state", err)
 	}
 
 	agent, exists := st.GetAgent(repoName, agentName)
 	if !exists {
-		return fmt.Errorf("agent '%s' not found in state for repo '%s'", agentName, repoName)
+		return errors.AgentNotInState(agentName, repoName)
 	}
 
 	if agent.SessionID == "" {
-		return fmt.Errorf("agent has no session ID - try removing and recreating the agent")
+		return errors.NoSessionID(agentName)
 	}
 
 	// Get the prompt file path (stored as ~/.multiclaude/prompts/<agent-name>.md)
diff --git a/internal/cli/cli_test.go b/internal/cli/cli_test.go
index e1fa7574..fce5d976 100644
--- a/internal/cli/cli_test.go
+++ b/internal/cli/cli_test.go
@@ -2531,4 +2531,165 @@ func TestFindRepoFromGitRemote(t *testing.T) {
 			t.Errorf("findRepoFromGitRemote() = %q, want %q", repoName, "repo-b")
 		}
 	})
+
+	t.Run("matches upstream remote for fork workflow", func(t *testing.T) {
+		tmpDir := t.TempDir()
+		stateFile := filepath.Join(tmpDir, "state.json")
+
+		// Create state tracking the upstream repo URL
+		st := state.New(stateFile)
+		st.AddRepo("upstream-repo", &state.Repository{
+			GithubURL:   "https://github.com/upstream-org/upstream-repo",
+			TmuxSession: "mc-upstream-repo",
+			Agents:      make(map[string]state.Agent),
+		})
+
+		// Create a git repo simulating a fork:
+		//   origin   -> user's fork (doesn't match tracked repo)
+		//   upstream -> original repo (matches tracked repo)
+		gitDir := filepath.Join(tmpDir, "git-test")
+		if err := os.MkdirAll(gitDir, 0755); err != nil {
+			t.Fatalf("failed to create git dir: %v", err)
+		}
+		if err := os.Chdir(gitDir); err != nil {
+			t.Fatalf("failed to chdir: %v", err)
+		}
+
+		if err := exec.Command("git", "init").Run(); err != nil {
+			t.Fatalf("failed to init git: %v", err)
+		}
+		// origin points to user's fork (no match)
+		if err := exec.Command("git", "remote", "add", "origin", "git@github.com:myuser/upstream-repo.git").Run(); err != nil {
+			t.Fatalf("failed to add origin remote: %v", err)
+		}
+		// upstream points to tracked repo (should match)
+		if err := exec.Command("git", "remote", "add", "upstream", "git@github.com:upstream-org/upstream-repo.git").Run(); err != nil {
+			t.Fatalf("failed to add upstream remote: %v", err)
+		}
+
+		cli := &CLI{
+			paths: &config.Paths{
+				StateFile: stateFile,
+			},
+		}
+
+		repoName, err := cli.findRepoFromGitRemote()
+		if err != nil {
+			t.Fatalf("findRepoFromGitRemote() error: %v", err)
+		}
+		if repoName != "upstream-repo" {
+			t.Errorf("findRepoFromGitRemote() = %q, want %q", repoName, "upstream-repo")
+		}
+	})
+
+	t.Run("origin takes priority over upstream", func(t *testing.T) {
+		tmpDir := t.TempDir()
+		stateFile := filepath.Join(tmpDir, "state.json")
+
+		// Create state with two repos
+		st := state.New(stateFile)
+		st.AddRepo("origin-repo", &state.Repository{
+			GithubURL:   "https://github.com/user/origin-repo",
+			TmuxSession: "mc-origin-repo",
+			Agents:      make(map[string]state.Agent),
+		})
+		st.AddRepo("upstream-repo", &state.Repository{
+			GithubURL:   "https://github.com/org/upstream-repo",
+			TmuxSession: "mc-upstream-repo",
+			Agents:      make(map[string]state.Agent),
+		})
+
+		// Create a git repo where both remotes match different tracked repos
+		gitDir := filepath.Join(tmpDir, "git-test")
+		if err := os.MkdirAll(gitDir, 0755); err != nil {
+			t.Fatalf("failed to create git dir: %v", err)
+		}
+		if err := os.Chdir(gitDir); err != nil {
+			t.Fatalf("failed to chdir: %v", err)
+		}
+
+		if err := exec.Command("git", "init").Run(); err != nil {
+			t.Fatalf("failed to init git: %v", err)
+		}
+		if err := exec.Command("git", "remote", "add", "origin", "git@github.com:user/origin-repo.git").Run(); err != nil {
+			t.Fatalf("failed to add origin remote: %v", err)
+		}
+		if err := exec.Command("git", "remote", "add", "upstream", "git@github.com:org/upstream-repo.git").Run(); err != nil {
+			t.Fatalf("failed to add upstream remote: %v", err)
+		}
+
+		cli := &CLI{
+			paths: &config.Paths{
+				StateFile: stateFile,
+			},
+		}
+
+		repoName, err := cli.findRepoFromGitRemote()
+		if err != nil {
+			t.Fatalf("findRepoFromGitRemote() error: %v", err)
+		}
+		// origin should be checked first and win
+		if repoName != "origin-repo" {
+			t.Errorf("findRepoFromGitRemote() = %q, want %q (origin should take priority)", repoName, "origin-repo")
+		}
+	})
+}
+
+func TestCreateWorkerRollbackOnFailure(t *testing.T) {
+	// This test verifies that when worker creation fails after the worktree
+	// has been created, the rollback logic cleans up the orphaned worktree.
+	//
+	// We achieve this by pointing the CLI at a bad daemon socket so that
+	// the daemon communication step (after worktree creation) fails,
+	// triggering the deferred rollback.
+
+	os.Setenv("MULTICLAUDE_TEST_MODE", "1")
+	defer os.Unsetenv("MULTICLAUDE_TEST_MODE")
+
+	tmpDir, err := os.MkdirTemp("", "cli-rollback-test-*")
+	if err != nil {
+		t.Fatalf("Failed to create temp dir: %v", err)
+	}
+	tmpDir, _ = filepath.EvalSymlinks(tmpDir)
+	defer os.RemoveAll(tmpDir)
+
+	repoName := "test-repo"
+
+	paths := &config.Paths{
+		Root:            tmpDir,
+		DaemonPID:       filepath.Join(tmpDir, "daemon.pid"),
+		DaemonSock:      filepath.Join(tmpDir, "nonexistent.sock"), // Bad socket to force failure
+		DaemonLog:       filepath.Join(tmpDir, "daemon.log"),
+		StateFile:       filepath.Join(tmpDir, "state.json"),
+		ReposDir:        filepath.Join(tmpDir, "repos"),
+		WorktreesDir:    filepath.Join(tmpDir, "wts"),
+		MessagesDir:     filepath.Join(tmpDir, "messages"),
+		OutputDir:       filepath.Join(tmpDir, "output"),
+		ClaudeConfigDir: filepath.Join(tmpDir, "claude-config"),
+	}
+	if err := paths.EnsureDirectories(); err != nil {
+		t.Fatalf("Failed to create directories: %v", err)
+	}
+	if err := os.MkdirAll(filepath.Join(tmpDir, "prompts"), 0755); err != nil {
+		t.Fatalf("Failed to create prompts dir: %v", err)
+	}
+
+	// Set up a real git repo so worktree creation succeeds
+	repoPath := paths.RepoDir(repoName)
+	setupTestRepo(t, repoPath)
+
+	cli := NewWithPaths(paths)
+
+	// Attempt to create a worker - should fail at daemon communication,
+	// but the worktree will have been created first
+	wtPath := paths.AgentWorktree(repoName, "rollback-test")
+	err = cli.Execute([]string{"work", "test rollback task", "--repo", repoName, "--name", "rollback-test"})
+	if err == nil {
+		t.Fatal("Expected worker creation to fail with bad daemon socket")
+	}
+
+	// Verify the worktree was cleaned up by the rollback
+	if _, statErr := os.Stat(wtPath); !os.IsNotExist(statErr) {
+		t.Errorf("Worktree at %s should have been removed by rollback, but still exists", wtPath)
+	}
 }
diff --git a/internal/cli/selector.go b/internal/cli/selector.go
index 4f442aed..a2ca4c23 100644
--- a/internal/cli/selector.go
+++ b/internal/cli/selector.go
@@ -7,6 +7,7 @@ import (
 	"strconv"
 	"strings"
 
+	"github.com/dlorenc/multiclaude/internal/errors"
 	"github.com/dlorenc/multiclaude/internal/format"
 )
 
@@ -21,7 +22,7 @@ type SelectableItem struct {
 
 // If there's only one item, it's auto-selected without prompting.
 func SelectFromList(prompt string, items []SelectableItem) (string, error) {
 	if len(items) == 0 {
-		return "", fmt.Errorf("no items available")
+		return "", errors.NoItemsAvailable("")
 	}
 
 	// Auto-select if only one item
@@ -63,7 +64,7 @@ func SelectFromList(prompt string, items []SelectableItem) (string, error) {
 	reader := bufio.NewReader(os.Stdin)
 	input, err := reader.ReadString('\n')
 	if err != nil {
-		return "", fmt.Errorf("failed to read input: %w", err)
+		return "", errors.FailedToReadInput(err)
 	}
 
 	input = strings.TrimSpace(input)
@@ -76,12 +77,12 @@ func SelectFromList(prompt string, items []SelectableItem) (string, error) {
 	// Parse number
 	num, err := strconv.Atoi(input)
 	if err != nil {
-		return "", fmt.Errorf("invalid selection: %q is not a number", input)
+		return "", errors.InvalidSelection(input, len(items))
 	}
 
 	// Validate range
 	if num < 1 || num > len(items) {
-		return "", fmt.Errorf("invalid selection: %d is out of range (1-%d)", num, len(items))
+		return "", errors.SelectionOutOfRange(num, len(items))
 	}
 
 	return items[num-1].Name, nil
diff --git a/internal/daemon/daemon.go b/internal/daemon/daemon.go
index 462bd548..01a3b2aa 100644
--- a/internal/daemon/daemon.go
+++ b/internal/daemon/daemon.go
@@ -305,8 +305,31 @@ func (d *Daemon) checkAgentHealth() {
 				} else {
 					d.logger.Info("Successfully restarted agent %s", agentName)
 				}
+			} else if agent.Type == state.AgentTypeWorker || agent.Type == state.AgentTypeReview {
+				// For workers/review agents, detect crash and notify supervisor
+				// Only notify once (when CrashedAt is not set)
+				if agent.CrashedAt.IsZero() {
+					d.logger.Info("Detected crashed worker %s, checking for uncommitted work", agentName)
+
+					// Check for uncommitted work
+					hasUncommitted := false
+					if agent.WorktreePath != "" {
+						if uncommitted, err := worktree.HasUncommittedChanges(agent.WorktreePath); err == nil {
+							hasUncommitted = uncommitted
+						} else {
+							d.logger.Warn("Failed to check uncommitted changes for %s: %v", agentName, err)
+						}
+					}
+
+					// Mark as crashed in state
+					if err := d.state.UpdateAgentCrashed(repoName, agentName, time.Now()); err != nil {
+						d.logger.Error("Failed to mark agent %s as crashed: %v", agentName, err)
+					}
+
+					// Notify supervisor about the crash
+					d.notifySupervisorOfCrash(repoName, agentName, agent, hasUncommitted)
+				}
 			}
-			// For transient agents (workers, review), don't auto-restart - they complete and clean up
 		}
 	}
 }
@@ -419,11 +442,12 @@ func (d *Daemon) wakeLoop() {
 	}
 }
 
-// wakeAgents sends periodic nudges to agents
+// wakeAgents sends periodic nudges to agents, but only when they have work to do
 func (d *Daemon) wakeAgents() {
-	d.logger.Debug("Waking agents")
+	d.logger.Debug("Waking agents (selective)")
 
 	now := time.Now()
+	msgMgr := d.getMessageManager()
 
 	// Get a snapshot of repos to avoid concurrent map access
 	repos := d.state.GetAllRepos()
@@ -439,6 +463,13 @@ func (d *Daemon) wakeAgents() {
 				continue
 			}
 
+			// Selective wakeup: only wake agents that have work to do
+			reason := d.agentHasWork(repoName, agentName, agent, repo, msgMgr)
+			if reason == "" {
+				d.logger.Debug("Skipping wake for %s/%s: no work detected", repoName, agentName)
+				continue
+			}
+
 			// Send wake message based on agent type
 			var message string
 			switch agent.Type {
@@ -464,9 +495,56 @@ func (d *Daemon) wakeAgents() {
 				d.logger.Error("Failed to update agent %s last nudge: %v", agentName, err)
 			}
 
-			d.logger.Debug("Woke agent %s in repo %s", agentName, repoName)
+			d.logger.Debug("Woke agent %s in repo %s (reason: %s)", agentName, repoName, reason)
 		}
 	}
 }
 
+// agentHasWork checks whether an agent has work that warrants a wake nudge.
+// Returns a reason string if there's work, or empty string if no work detected.
+func (d *Daemon) agentHasWork(repoName, agentName string, agent state.Agent, repo *state.Repository, msgMgr *messages.Manager) string {
+	// Any agent with pending messages has work
+	if msgMgr.HasPending(repoName, agentName) {
+		return "pending messages"
+	}
+
+	switch agent.Type {
+	case state.AgentTypeSupervisor:
+		// Supervisor has work when there are active workers to monitor
+		for _, a := range repo.Agents {
+			if a.Type == state.AgentTypeWorker && !a.ReadyForCleanup {
+				return "active workers"
+			}
+		}
+
+	case state.AgentTypeMergeQueue:
+		// Merge queue has work when there are workers with open PRs
+		for _, a := range repo.Agents {
+			if a.Type == state.AgentTypeWorker && a.PRURL != "" && !a.ReadyForCleanup {
+				return "open worker PRs"
+			}
+		}
+		// Also check task history for recent open PRs
+		history, err := d.state.GetTaskHistory(repoName, 10)
+		if err == nil {
+			for _, entry := range history {
+				if entry.Status == state.TaskStatusOpen && entry.PRURL != "" {
+					return "open PRs in history"
+				}
+			}
+		}
+
+	case state.AgentTypeWorker:
+		// Workers drive their own work - only wake for pending messages (checked above).
+		// No periodic nudge needed since workers are actively executing tasks.
+		return ""
+
+	case state.AgentTypeReview:
+		// Review agents drive their own work - only wake for pending messages (checked above).
+		return ""
+	}
+
+	return ""
+}
+
 // worktreeRefreshLoop periodically syncs worker worktrees with main branch
@@ -660,6 +738,9 @@ func (d *Daemon) handleRequest(req socket.Request) socket.Response {
 		go d.routeMessages()
 		return socket.Response{Success: true, Data: "Message routing triggered"}
 
+	case "refresh_worktrees":
+		return d.handleRefreshWorktrees(req)
+
 	case "task_history":
 		return d.handleTaskHistory(req)
 
@@ -925,11 +1006,19 @@ func (d *Daemon) handleListAgents(req socket.Request) socket.Response {
 			status := "unknown"
 			if agent.ReadyForCleanup {
 				status = "completed"
+			} else if !agent.CrashedAt.IsZero() {
+				// Agent has been marked as crashed
+				status = "crashed"
 			} else if repoExists {
-				// Check if window exists (means agent is running)
+				// Check if window exists
 				hasWindow, err := d.tmux.HasWindow(d.ctx, repo.TmuxSession, agent.TmuxWindow)
 				if err == nil && hasWindow {
-					status = "running"
+					// Window exists, but also check if process is alive
+					if agent.PID > 0 && !isProcessAlive(agent.PID) {
+						status = "crashed"
+					} else {
+						status = "running"
+					}
 				} else {
 					status = "stopped"
 				}
@@ -984,13 +1073,19 @@ func (d *Daemon) handleCompleteAgent(req socket.Request) socket.Response {
 
 	// Mark as ready for cleanup
 	agent.ReadyForCleanup = true
 
-	// Optional: capture summary and failure reason for task history
+	// Optional: capture summary, failure reason, and PR info for task history
 	if summary, ok := req.Args["summary"].(string); ok && summary != "" {
 		agent.Summary = summary
 	}
 	if failureReason, ok := req.Args["failure_reason"].(string); ok && failureReason != "" {
 		agent.FailureReason = failureReason
 	}
+	if prURL, ok := req.Args["pr_url"].(string); ok && prURL != "" {
+		agent.PRURL = prURL
+	}
+	if prNumber, ok := req.Args["pr_number"].(float64); ok && prNumber > 0 {
+		agent.PRNumber = int(prNumber)
+	}
 
 	if err := d.state.UpdateAgent(repoName, agentName, agent); err != nil {
 		return socket.Response{Success: false, Error: err.Error()}
@@ -1042,6 +1137,36 @@ func (d *Daemon) handleCompleteAgent(req socket.Request) socket.Response {
 	return socket.Response{Success: true}
 }
 
+// notifySupervisorOfCrash sends a message to supervisor when a worker crashes
+func (d *Daemon) notifySupervisorOfCrash(repoName, agentName string, agent state.Agent, hasUncommitted bool) {
+	msgMgr := d.getMessageManager()
+
+	task := agent.Task
+	if task == "" {
+		task = "unknown task"
+	}
+
+	var message string
+	if hasUncommitted {
+		message = fmt.Sprintf("CRASHED: Worker '%s' has crashed with uncommitted work. Task: %s. "+
+			"The worktree is preserved. To restart: multiclaude agent restart %s --repo %s",
+			agentName, task, agentName, repoName)
+	} else {
+		message = fmt.Sprintf("CRASHED: Worker '%s' has crashed. Task: %s. "+
+			"No uncommitted work detected. To restart: multiclaude agent restart %s --repo %s",
+			agentName, task, agentName, repoName)
+	}
+
+	if _, err := msgMgr.Send(repoName, "daemon", "supervisor", message); err != nil {
+		d.logger.Error("Failed to send crash notification to supervisor: %v", err)
+	} else {
+		d.logger.Info("Sent crash notification to supervisor for worker %s", agentName)
+	}
+
+	// Trigger immediate message delivery
+	go d.routeMessages()
+}
+
 // handleRestartAgent restarts an agent that has crashed or exited
 func (d *Daemon) handleRestartAgent(req socket.Request) socket.Response {
 	repoName, errResp, ok := getRequiredStringArg(req.Args, "repo", "repository name is required")
@@ -1335,7 +1460,7 @@ func (d *Daemon) cleanupDeadAgents(deadAgents map[string][]string) {
 				d.logger.Error("Failed to remove agent %s/%s from state: %v", repoName, agentName, err)
 			}
 
-			// Clean up worktree if it exists (workers and review agents have worktrees)
+			// Clean up worktree and branch if they exist (workers and review agents have worktrees)
 			if agent.WorktreePath != "" && (agent.Type == state.AgentTypeWorker || agent.Type == state.AgentTypeReview) {
 				repoPath := d.paths.RepoDir(repoName)
 				wt := worktree.NewManager(repoPath)
@@ -1344,6 +1469,24 @@ func (d *Daemon) cleanupDeadAgents(deadAgents map[string][]string) {
 				} else {
 					d.logger.Info("Removed worktree for dead agent: %s", agent.WorktreePath)
 				}
+
+				// Delete the branch (work/<agentName>) after worktree removal
+				branchName := "work/" + agentName
+				if err := wt.DeleteBranch(branchName); err != nil {
+					d.logger.Warn("Failed to delete branch %s: %v", branchName, err)
+				} else {
+					d.logger.Info("Deleted branch for dead agent: %s", branchName)
+				}
 			}
+
+			// Clean up per-agent Claude config directory
+			agentConfigDir := d.paths.AgentClaudeConfigDir(repoName, agentName)
+			if _, err := os.Stat(agentConfigDir); err == nil {
+				if err := os.RemoveAll(agentConfigDir); err != nil {
+					d.logger.Warn("Failed to remove agent config dir %s: %v", agentConfigDir, err)
+				} else {
+					d.logger.Info("Removed agent config dir: %s", agentConfigDir)
+				}
+			}
 
 			// Clean up message directory
@@ -1369,17 +1512,24 @@ func (d *Daemon) recordTaskHistory(repoName, agentName string, agent state.Agent) {
 		}
 	}
 
-	// Determine initial status
+	// Determine initial status based on failure reason and PR info
 	status := state.TaskStatusUnknown
 	if agent.FailureReason != "" {
 		status = state.TaskStatusFailed
+	} else if agent.PRURL != "" || agent.PRNumber > 0 {
+		// If a PR was created, mark as open (will be updated when queried)
+		status = state.TaskStatusOpen
+	} else {
+		status = state.TaskStatusNoPR
 	}
 
 	entry := state.TaskHistoryEntry{
 		Name:          agentName,
 		Task:          agent.Task,
 		Branch:        branch,
-		Status:        status, // Will be updated when displaying if a PR exists
+		PRURL:         agent.PRURL,
+		PRNumber:      agent.PRNumber,
+		Status:        status,
 		Summary:       agent.Summary,
 		FailureReason: agent.FailureReason,
 		CreatedAt:     agent.CreatedAt,
@@ -1389,7 +1539,13 @@ func (d *Daemon) recordTaskHistory(repoName, agentName string, agent state.Agent) {
 	if err := d.state.AddTaskHistory(repoName, entry); err != nil {
 		d.logger.Warn("Failed to record task history for %s: %v", agentName, err)
 	} else {
-		d.logger.Info("Recorded task history for %s (branch: %s, summary: %q)", agentName, branch, agent.Summary)
+		prInfo := ""
+		if agent.PRURL != "" {
+			prInfo = fmt.Sprintf(", pr: %s", agent.PRURL)
+		} else if agent.PRNumber > 0 {
+			prInfo = fmt.Sprintf(", pr: #%d", agent.PRNumber)
+		}
+		d.logger.Info("Recorded task history for %s (branch: %s, status: %s%s, summary: %q)", agentName, branch, status, prInfo, agent.Summary)
 	}
 }
 
@@ -1431,6 +1587,137 @@ func (d *Daemon) handleTaskHistory(req socket.Request) socket.Response {
 	return socket.Response{Success: true, Data: result}
 }
 
+// handleRefreshWorktrees triggers immediate worktree refresh for all repos
+func (d *Daemon) handleRefreshWorktrees(req socket.Request) socket.Response {
+	d.logger.Info("Manual worktree refresh triggered")
+
+	// Run refresh synchronously so we can report results
+	results := d.refreshWorktreesWithResults()
+
+	return socket.Response{
+		Success: true,
+		Data:    results,
+	}
+}
+
+// refreshWorktreesWithResults syncs worker worktrees and returns results
+func (d *Daemon) refreshWorktreesWithResults() map[string]interface{} {
+	results := map[string]interface{}{
+		"refreshed": []string{},
+		"skipped":   []map[string]string{},
+		"errors":    []map[string]string{},
+	}
+
+	refreshed := []string{}
+	skipped := []map[string]string{}
+	errs := []map[string]string{}
+
+	repos := d.state.GetAllRepos()
+	for repoName, repo := range repos {
+		repoPath := d.paths.RepoDir(repoName)
+
+		// Check if repo path exists
+		if _, err := os.Stat(repoPath); os.IsNotExist(err) {
+			continue
+		}
+
+		wt := worktree.NewManager(repoPath)
+
+		// Get the upstream remote and default branch
+		remote, err := wt.GetUpstreamRemote()
+		if err != nil {
+			d.logger.Debug("Could not get remote for %s: %v", repoName, err)
+			continue
+		}
+
+		mainBranch, err := wt.GetDefaultBranch(remote)
+		if err != nil {
+			d.logger.Debug("Could not get default branch for %s: %v", repoName, err)
+			continue
+		}
+
+		// Fetch from remote to have latest state
+		if err := wt.FetchRemote(remote); err != nil {
+			d.logger.Debug("Could not fetch from remote for %s: %v", repoName, err)
+			continue
+		}
+
+		// Check each worker agent's worktree
+		for agentName, agent := range repo.Agents {
+			// Only refresh worker worktrees
+			if agent.Type != state.AgentTypeWorker {
+				continue
+			}
+
+			// Skip if worktree path is empty
+			if agent.WorktreePath == "" {
+				continue
+			}
+
+			// Check if worktree exists
+			if _, err := os.Stat(agent.WorktreePath); os.IsNotExist(err) {
+				continue
+			}
+
+			// Check worktree state
+			wtState, err := worktree.GetWorktreeState(agent.WorktreePath, remote, mainBranch)
+			if err != nil {
+				d.logger.Debug("Could not get worktree state for %s/%s: %v", repoName, agentName, err)
+				continue
+			}
+
+			// Skip if can't refresh
+			if !wtState.CanRefresh {
+				skipped = append(skipped, map[string]string{
+					"agent":  agentName,
+					"repo":   repoName,
+					"reason": wtState.RefreshReason,
+				})
+				continue
+			}
+
+			// Refresh the worktree
+			d.logger.Info("Refreshing worktree for %s/%s (%d commits behind)", repoName, agentName, wtState.CommitsBehind)
+			result := worktree.RefreshWorktree(agent.WorktreePath, remote, mainBranch)
+
+			if result.Error != nil {
+				errs = append(errs, map[string]string{
+					"agent": agentName,
+					"repo":  repoName,
+					"error": result.Error.Error(),
+				})
+				if result.HasConflicts {
+					d.logger.Warn("Worktree refresh for %s/%s has conflicts in: %v", repoName, agentName, result.ConflictFiles)
+				} else {
+					d.logger.Error("Failed to refresh worktree for %s/%s: %v", repoName, agentName, result.Error)
+				}
+			} else if result.Skipped {
+				skipped = append(skipped, map[string]string{
+					"agent":  agentName,
+					"repo":   repoName,
+					"reason": result.SkipReason,
+				})
+			} else {
+				refreshed = append(refreshed, fmt.Sprintf("%s/%s", repoName, agentName))
+				d.logger.Info("Refreshed worktree for %s/%s: rebased %d commits", repoName, agentName, result.CommitsRebased)
+
+				// Notify the agent that their worktree was refreshed
+				msgMgr := d.getMessageManager()
+				msg := fmt.Sprintf("Your worktree has been synced with main
(rebased %d commits). Run 'git log --oneline -5' to see recent changes.", result.CommitsRebased) + if _, err := msgMgr.Send(repoName, "daemon", agentName, msg); err != nil { + d.logger.Debug("Could not send refresh notification to %s/%s: %v", repoName, agentName, err) + } + } + } + } + + results["refreshed"] = refreshed + results["skipped"] = skipped + results["errors"] = errors + + return results +} + // cleanupOrphanedWorktrees removes worktree directories without git tracking func (d *Daemon) cleanupOrphanedWorktrees() { repoNames := d.state.ListRepos() @@ -1891,6 +2178,11 @@ func (d *Daemon) restartAgent(repoName, agentName string, agent state.Agent, rep d.logger.Warn("Failed to update agent PID: %v", err) } + // Clear crashed state since agent is now running + if err := d.state.ClearAgentCrashed(repoName, agentName); err != nil { + d.logger.Warn("Failed to clear agent crashed state: %v", err) + } + d.logger.Info("Restarted agent %s with PID %d (resumed=%v)", agentName, result.PID, hasHistory) return nil } diff --git a/internal/daemon/daemon_test.go b/internal/daemon/daemon_test.go index edd0d99f..96e047e7 100644 --- a/internal/daemon/daemon_test.go +++ b/internal/daemon/daemon_test.go @@ -181,6 +181,69 @@ func TestCleanupDeadAgents(t *testing.T) { } } +func TestCleanupDeadAgentsCleansConfigDir(t *testing.T) { + d, cleanup := setupTestDaemon(t) + defer cleanup() + + // Add a test repository + repo := &state.Repository{ + GithubURL: "https://github.com/test/repo", + TmuxSession: "test-session", + Agents: make(map[string]state.Agent), + } + if err := d.state.AddRepo("test-repo", repo); err != nil { + t.Fatalf("Failed to add repo: %v", err) + } + + // Add a test agent + agent := state.Agent{ + Type: state.AgentTypeWorker, + WorktreePath: "/tmp/nonexistent-worktree", // Fake path - worktree cleanup will warn but continue + TmuxWindow: "test-window", + SessionID: "test-session-id", + CreatedAt: time.Now(), + } + if err := d.state.AddAgent("test-repo", "test-agent", 
agent); err != nil { + t.Fatalf("Failed to add agent: %v", err) + } + + // Create the agent's Claude config directory + agentConfigDir := d.paths.AgentClaudeConfigDir("test-repo", "test-agent") + if err := os.MkdirAll(agentConfigDir, 0755); err != nil { + t.Fatalf("Failed to create agent config dir: %v", err) + } + + // Create a dummy file in the config dir + dummyFile := filepath.Join(agentConfigDir, "test.md") + if err := os.WriteFile(dummyFile, []byte("test content"), 0644); err != nil { + t.Fatalf("Failed to create dummy file: %v", err) + } + + // Verify config dir exists + if _, err := os.Stat(agentConfigDir); os.IsNotExist(err) { + t.Fatal("Agent config dir should exist before cleanup") + } + + // Mark agent as dead + deadAgents := map[string][]string{ + "test-repo": {"test-agent"}, + } + + // Call cleanup + d.cleanupDeadAgents(deadAgents) + + // Verify agent was removed from state + _, exists := d.state.GetAgent("test-repo", "test-agent") + if exists { + t.Error("Agent should not exist after cleanup") + } + + // Verify config dir was removed + if _, err := os.Stat(agentConfigDir); !os.IsNotExist(err) { + t.Error("Agent config dir should not exist after cleanup") + } +} + func TestHandleCompleteAgent(t *testing.T) { d, cleanup := setupTestDaemon(t) defer cleanup() @@ -1219,6 +1282,17 @@ func TestWakeLoopUpdatesNudgeTime(t *testing.T) { t.Fatalf("Failed to add agent: %v", err) } + // Add a worker so supervisor has work to do (selective wakeup requires it) + worker := state.Agent{ + Type: state.AgentTypeWorker, + TmuxWindow: "worker", + Task: "test task", + CreatedAt: time.Now(), + } + if err := d.state.AddAgent("test-repo", "worker-1", worker); err != nil { + t.Fatalf("Failed to add worker: %v", err) + } + // Trigger wake beforeWake := time.Now() d.TriggerWake() @@ -2765,3 +2839,269 @@ func TestHandleClearCurrentRepoWhenNone(t *testing.T) { t.Errorf("clear_current_repo should succeed even when no repo set: %s", resp.Error) } } + +func 
TestAgentHasWorkSupervisorWithActiveWorkers(t *testing.T) { + d, cleanup := setupTestDaemon(t) + defer cleanup() + + repo := &state.Repository{ + GithubURL: "https://github.com/test/repo", + TmuxSession: "mc-test", + Agents: make(map[string]state.Agent), + } + if err := d.state.AddRepo("test-repo", repo); err != nil { + t.Fatalf("Failed to add repo: %v", err) + } + + // Add supervisor and an active worker + supervisor := state.Agent{Type: state.AgentTypeSupervisor, TmuxWindow: "supervisor", CreatedAt: time.Now()} + worker := state.Agent{Type: state.AgentTypeWorker, TmuxWindow: "worker", Task: "do stuff", CreatedAt: time.Now()} + d.state.AddAgent("test-repo", "supervisor", supervisor) + d.state.AddAgent("test-repo", "worker-1", worker) + + msgMgr := d.getMessageManager() + + // Re-fetch repo snapshot to include the worker + repos := d.state.GetAllRepos() + repoSnap := repos["test-repo"] + + reason := d.agentHasWork("test-repo", "supervisor", supervisor, repoSnap, msgMgr) + if reason == "" { + t.Error("Supervisor should have work when active workers exist") + } + if reason != "active workers" { + t.Errorf("Expected reason 'active workers', got %q", reason) + } +} + +func TestAgentHasWorkSupervisorNoWorkers(t *testing.T) { + d, cleanup := setupTestDaemon(t) + defer cleanup() + + repo := &state.Repository{ + GithubURL: "https://github.com/test/repo", + TmuxSession: "mc-test", + Agents: make(map[string]state.Agent), + } + if err := d.state.AddRepo("test-repo", repo); err != nil { + t.Fatalf("Failed to add repo: %v", err) + } + + supervisor := state.Agent{Type: state.AgentTypeSupervisor, TmuxWindow: "supervisor", CreatedAt: time.Now()} + d.state.AddAgent("test-repo", "supervisor", supervisor) + + msgMgr := d.getMessageManager() + repos := d.state.GetAllRepos() + repoSnap := repos["test-repo"] + + reason := d.agentHasWork("test-repo", "supervisor", supervisor, repoSnap, msgMgr) + if reason != "" { + t.Errorf("Supervisor should have no work when no workers exist, got 
reason %q", reason) + } +} + +func TestAgentHasWorkMergeQueueWithOpenPRs(t *testing.T) { + d, cleanup := setupTestDaemon(t) + defer cleanup() + + repo := &state.Repository{ + GithubURL: "https://github.com/test/repo", + TmuxSession: "mc-test", + Agents: make(map[string]state.Agent), + } + if err := d.state.AddRepo("test-repo", repo); err != nil { + t.Fatalf("Failed to add repo: %v", err) + } + + mq := state.Agent{Type: state.AgentTypeMergeQueue, TmuxWindow: "merge-queue", CreatedAt: time.Now()} + worker := state.Agent{Type: state.AgentTypeWorker, TmuxWindow: "worker", PRURL: "https://github.com/test/repo/pull/1", CreatedAt: time.Now()} + d.state.AddAgent("test-repo", "merge-queue", mq) + d.state.AddAgent("test-repo", "worker-1", worker) + + msgMgr := d.getMessageManager() + repos := d.state.GetAllRepos() + repoSnap := repos["test-repo"] + + reason := d.agentHasWork("test-repo", "merge-queue", mq, repoSnap, msgMgr) + if reason == "" { + t.Error("Merge queue should have work when workers have open PRs") + } +} + +func TestAgentHasWorkMergeQueueNoPRs(t *testing.T) { + d, cleanup := setupTestDaemon(t) + defer cleanup() + + repo := &state.Repository{ + GithubURL: "https://github.com/test/repo", + TmuxSession: "mc-test", + Agents: make(map[string]state.Agent), + } + if err := d.state.AddRepo("test-repo", repo); err != nil { + t.Fatalf("Failed to add repo: %v", err) + } + + mq := state.Agent{Type: state.AgentTypeMergeQueue, TmuxWindow: "merge-queue", CreatedAt: time.Now()} + d.state.AddAgent("test-repo", "merge-queue", mq) + + msgMgr := d.getMessageManager() + repos := d.state.GetAllRepos() + repoSnap := repos["test-repo"] + + reason := d.agentHasWork("test-repo", "merge-queue", mq, repoSnap, msgMgr) + if reason != "" { + t.Errorf("Merge queue should have no work without PRs, got reason %q", reason) + } +} + +func TestAgentHasWorkWorkerNoMessages(t *testing.T) { + d, cleanup := setupTestDaemon(t) + defer cleanup() + + repo := &state.Repository{ + GithubURL: 
"https://github.com/test/repo", + TmuxSession: "mc-test", + Agents: make(map[string]state.Agent), + } + if err := d.state.AddRepo("test-repo", repo); err != nil { + t.Fatalf("Failed to add repo: %v", err) + } + + worker := state.Agent{Type: state.AgentTypeWorker, TmuxWindow: "worker", Task: "do stuff", CreatedAt: time.Now()} + d.state.AddAgent("test-repo", "worker-1", worker) + + msgMgr := d.getMessageManager() + repos := d.state.GetAllRepos() + repoSnap := repos["test-repo"] + + reason := d.agentHasWork("test-repo", "worker-1", worker, repoSnap, msgMgr) + if reason != "" { + t.Errorf("Worker should have no work without messages, got reason %q", reason) + } +} + +func TestAgentHasWorkWithPendingMessages(t *testing.T) { + d, cleanup := setupTestDaemon(t) + defer cleanup() + + repo := &state.Repository{ + GithubURL: "https://github.com/test/repo", + TmuxSession: "mc-test", + Agents: make(map[string]state.Agent), + } + if err := d.state.AddRepo("test-repo", repo); err != nil { + t.Fatalf("Failed to add repo: %v", err) + } + + worker := state.Agent{Type: state.AgentTypeWorker, TmuxWindow: "worker", Task: "do stuff", CreatedAt: time.Now()} + d.state.AddAgent("test-repo", "worker-1", worker) + + // Send a message to the worker + msgMgr := d.getMessageManager() + _, err := msgMgr.Send("test-repo", "supervisor", "worker-1", "You have a task") + if err != nil { + t.Fatalf("Failed to send message: %v", err) + } + + repos := d.state.GetAllRepos() + repoSnap := repos["test-repo"] + + reason := d.agentHasWork("test-repo", "worker-1", worker, repoSnap, msgMgr) + if reason != "pending messages" { + t.Errorf("Worker with pending messages should have work, got reason %q", reason) + } +} + +func TestAgentHasWorkMergeQueueWithHistoryPRs(t *testing.T) { + d, cleanup := setupTestDaemon(t) + defer cleanup() + + repo := &state.Repository{ + GithubURL: "https://github.com/test/repo", + TmuxSession: "mc-test", + Agents: make(map[string]state.Agent), + } + if err := 
d.state.AddRepo("test-repo", repo); err != nil { + t.Fatalf("Failed to add repo: %v", err) + } + + mq := state.Agent{Type: state.AgentTypeMergeQueue, TmuxWindow: "merge-queue", CreatedAt: time.Now()} + d.state.AddAgent("test-repo", "merge-queue", mq) + + // Add a task history entry with an open PR + entry := state.TaskHistoryEntry{ + Name: "old-worker", + Task: "some task", + Branch: "work/old-worker", + PRURL: "https://github.com/test/repo/pull/5", + PRNumber: 5, + Status: state.TaskStatusOpen, + CreatedAt: time.Now(), + CompletedAt: time.Now(), + } + d.state.AddTaskHistory("test-repo", entry) + + msgMgr := d.getMessageManager() + repos := d.state.GetAllRepos() + repoSnap := repos["test-repo"] + + reason := d.agentHasWork("test-repo", "merge-queue", mq, repoSnap, msgMgr) + if reason == "" { + t.Error("Merge queue should have work when task history has open PRs") + } + if reason != "open PRs in history" { + t.Errorf("Expected reason 'open PRs in history', got %q", reason) + } +} + +func TestSelectiveWakeSkipsWorkersWithoutWork(t *testing.T) { + tmuxClient := tmux.NewClient() + if !tmuxClient.IsTmuxAvailable() { + t.Skip("tmux not available") + } + + d, cleanup := setupTestDaemon(t) + defer cleanup() + + // Create a real tmux session + sessionName := "mc-test-selective-wake" + if err := tmuxClient.CreateSession(context.Background(), sessionName, true); err != nil { + t.Fatalf("Failed to create tmux session: %v", err) + } + defer tmuxClient.KillSession(context.Background(), sessionName) + + // Create windows for agents + if err := tmuxClient.CreateWindow(context.Background(), sessionName, "worker"); err != nil { + t.Fatalf("Failed to create worker window: %v", err) + } + + // Add repo and a worker with no messages (should NOT be woken) + repo := &state.Repository{ + GithubURL: "https://github.com/test/repo", + TmuxSession: sessionName, + Agents: make(map[string]state.Agent), + } + if err := d.state.AddRepo("test-repo", repo); err != nil { + t.Fatalf("Failed to add 
repo: %v", err) + } + + worker := state.Agent{ + Type: state.AgentTypeWorker, + TmuxWindow: "worker", + Task: "Test task", + CreatedAt: time.Now(), + LastNudge: time.Time{}, // Never nudged + } + if err := d.state.AddAgent("test-repo", "worker-1", worker); err != nil { + t.Fatalf("Failed to add agent: %v", err) + } + + // Trigger wake - worker has no messages, should NOT be woken + d.TriggerWake() + + // Verify LastNudge was NOT updated (worker skipped due to no work) + updatedAgent, _ := d.state.GetAgent("test-repo", "worker-1") + if !updatedAgent.LastNudge.IsZero() { + t.Error("Worker without work should NOT be woken - LastNudge should remain zero") + } +} diff --git a/internal/daemon/handlers_test.go b/internal/daemon/handlers_test.go index 9bfccf2e..813ecb7a 100644 --- a/internal/daemon/handlers_test.go +++ b/internal/daemon/handlers_test.go @@ -541,6 +541,49 @@ func TestHandleCompleteAgentTableDriven(t *testing.T) { } }, }, + { + name: "complete with PR info captures URL and number", + args: map[string]interface{}{ + "repo": "test-repo", + "agent": "worker-agent", + "summary": "Implemented feature X", + "pr_url": "https://github.com/test/repo/pull/42", + "pr_number": float64(42), + }, + setupState: func(s *state.State) { + s.AddRepo("test-repo", &state.Repository{ + GithubURL: "https://github.com/test/repo", + TmuxSession: "test-session", + Agents: make(map[string]state.Agent), + }) + s.AddAgent("test-repo", "worker-agent", state.Agent{ + Type: state.AgentTypeWorker, + TmuxWindow: "worker-window", + Task: "implement feature X", + CreatedAt: time.Now(), + }) + }, + wantSuccess: true, + checkState: func(t *testing.T, d *Daemon) { + agent, exists := d.state.GetAgent("test-repo", "worker-agent") + if !exists { + t.Error("Agent should still exist after complete") + return + } + if !agent.ReadyForCleanup { + t.Error("Agent should be marked as ready for cleanup") + } + if agent.Summary != "Implemented feature X" { + t.Errorf("Agent summary = %q, want %q", 
agent.Summary, "Implemented feature X") + } + if agent.PRURL != "https://github.com/test/repo/pull/42" { + t.Errorf("Agent PRURL = %q, want %q", agent.PRURL, "https://github.com/test/repo/pull/42") + } + if agent.PRNumber != 42 { + t.Errorf("Agent PRNumber = %d, want %d", agent.PRNumber, 42) + } + }, + }, } for _, tt := range tests { diff --git a/internal/errors/errors.go b/internal/errors/errors.go index 2560c585..207d0ead 100644 --- a/internal/errors/errors.go +++ b/internal/errors/errors.go @@ -307,16 +307,18 @@ func MissingArgument(argName, expectedType string) *CLIError { msg = fmt.Sprintf("missing required argument: %s (%s)", argName, expectedType) } return &CLIError{ - Category: CategoryUsage, - Message: msg, + Category: CategoryUsage, + Message: msg, + Suggestion: "multiclaude --help", } } // InvalidArgument creates an error for invalid argument values func InvalidArgument(argName, value, expected string) *CLIError { return &CLIError{ - Category: CategoryUsage, - Message: fmt.Sprintf("invalid value for '%s': got '%s', expected %s", argName, value, expected), + Category: CategoryUsage, + Message: fmt.Sprintf("invalid value for '%s': got '%s', expected %s", argName, value, expected), + Suggestion: "multiclaude --help", } } @@ -382,3 +384,122 @@ func WorkspaceNotFound(name, repo string) *CLIError { Suggestion: fmt.Sprintf("multiclaude workspace list --repo %s", repo), } } + +// InvalidWorkspaceName creates an error for invalid workspace names +func InvalidWorkspaceName(reason string) *CLIError { + return &CLIError{ + Category: CategoryUsage, + Message: fmt.Sprintf("invalid workspace name: %s", reason), + Suggestion: "workspace names follow git branch naming rules (no spaces, '..' 
or special characters)", + } +} + +// LogFileNotFound creates an error for when an agent's log file cannot be found +func LogFileNotFound(agent, repo string) *CLIError { + return &CLIError{ + Category: CategoryNotFound, + Message: fmt.Sprintf("no log file found for agent '%s' in repo '%s'", agent, repo), + Suggestion: fmt.Sprintf("check agent exists: multiclaude worker list --repo %s", repo), + } +} + +// AgentNotInState creates an error for when an agent is not found in state +func AgentNotInState(agent, repo string) *CLIError { + return &CLIError{ + Category: CategoryNotFound, + Message: fmt.Sprintf("agent '%s' not found in state for repo '%s'", agent, repo), + Suggestion: "the agent may have been removed; try recreating it", + } +} + +// NoSessionID creates an error for when an agent has no session ID +func NoSessionID(agent string) *CLIError { + return &CLIError{ + Category: CategoryConfig, + Message: fmt.Sprintf("agent '%s' has no session ID", agent), + Suggestion: "try removing and recreating the agent", + } +} + +// InvalidDuration creates an error for invalid duration strings +func InvalidDuration(value string) *CLIError { + return &CLIError{ + Category: CategoryUsage, + Message: fmt.Sprintf("invalid duration: %s", value), + Suggestion: "use format like '7d', '24h', or '30m' (days, hours, minutes)", + } +} + +// InvalidFlagValue creates an error for invalid flag values with allowed options +func InvalidFlagValue(flag, value string, allowed []string) *CLIError { + return &CLIError{ + Category: CategoryUsage, + Message: fmt.Sprintf("invalid value '%s' for %s", value, flag), + Suggestion: fmt.Sprintf("allowed values: %s", strings.Join(allowed, ", ")), + } +} + +// InvalidSelection creates an error for invalid user selection input +func InvalidSelection(input string, maxNum int) *CLIError { + if maxNum > 0 { + return &CLIError{ + Category: CategoryUsage, + Message: fmt.Sprintf("invalid selection: '%s'", input), + Suggestion: fmt.Sprintf("enter a number between 
1 and %d, or press Enter to cancel", maxNum), + } + } + return &CLIError{ + Category: CategoryUsage, + Message: fmt.Sprintf("invalid selection: '%s'", input), + Suggestion: "enter a valid number from the list", + } +} + +// SelectionOutOfRange creates an error for selection numbers outside valid range +func SelectionOutOfRange(num, maxNum int) *CLIError { + return &CLIError{ + Category: CategoryUsage, + Message: fmt.Sprintf("selection %d is out of range", num), + Suggestion: fmt.Sprintf("enter a number between 1 and %d", maxNum), + } +} + +// NoItemsAvailable creates an error when a selection list is empty +func NoItemsAvailable(itemType string) *CLIError { + msg := "no items available for selection" + if itemType != "" { + msg = fmt.Sprintf("no %s available", itemType) + } + return &CLIError{ + Category: CategoryNotFound, + Message: msg, + } +} + +// WorkspaceAlreadyExists creates an error when a workspace name is already taken +func WorkspaceAlreadyExists(name, repo string) *CLIError { + return &CLIError{ + Category: CategoryRuntime, + Message: fmt.Sprintf("workspace '%s' already exists in repo '%s'", name, repo), + Suggestion: fmt.Sprintf("choose a different name, or remove existing workspace:\n multiclaude workspace rm %s --repo %s", name, repo), + } +} + +// InvalidRepoName creates an error for invalid or empty repository names +func InvalidRepoName(reason string) *CLIError { + return &CLIError{ + Category: CategoryUsage, + Message: fmt.Sprintf("invalid repository name: %s", reason), + Suggestion: "repository names must be non-empty and follow GitHub naming rules", + } +} + +// FailedToReadInput creates an error for input reading failures +func FailedToReadInput(cause error) *CLIError { + return &CLIError{ + Category: CategoryRuntime, + Message: "failed to read input", + Cause: cause, + Suggestion: "check terminal is interactive and try again", + } +} diff --git a/internal/errors/errors_test.go b/internal/errors/errors_test.go index 6c131e25..f041ad72 100644 
--- a/internal/errors/errors_test.go +++ b/internal/errors/errors_test.go @@ -577,3 +577,135 @@ func TestWorkspaceNotFound(t *testing.T) { t.Errorf("expected workspace list suggestion, got: %s", formatted) } } + +func TestInvalidWorkspaceName(t *testing.T) { + err := InvalidWorkspaceName("cannot contain spaces") + + if err.Category != CategoryUsage { + t.Errorf("expected CategoryUsage, got %v", err.Category) + } + if err.Suggestion == "" { + t.Error("should have a suggestion") + } + + formatted := Format(err) + if !strings.Contains(formatted, "invalid workspace name") { + t.Errorf("expected 'invalid workspace name' in message, got: %s", formatted) + } + if !strings.Contains(formatted, "cannot contain spaces") { + t.Errorf("expected reason in message, got: %s", formatted) + } + if !strings.Contains(formatted, "branch naming") { + t.Errorf("expected branch naming hint in suggestion, got: %s", formatted) + } +} + +func TestLogFileNotFound(t *testing.T) { + err := LogFileNotFound("worker-1", "my-repo") + + if err.Category != CategoryNotFound { + t.Errorf("expected CategoryNotFound, got %v", err.Category) + } + if err.Suggestion == "" { + t.Error("should have a suggestion") + } + + formatted := Format(err) + if !strings.Contains(formatted, "worker-1") { + t.Errorf("expected agent name in message, got: %s", formatted) + } + if !strings.Contains(formatted, "my-repo") { + t.Errorf("expected repo name in message, got: %s", formatted) + } + if !strings.Contains(formatted, "multiclaude worker list") { + t.Errorf("expected worker list suggestion, got: %s", formatted) + } +} + +func TestAgentNotInState(t *testing.T) { + err := AgentNotInState("worker-1", "my-repo") + + if err.Category != CategoryNotFound { + t.Errorf("expected CategoryNotFound, got %v", err.Category) + } + if err.Suggestion == "" { + t.Error("should have a suggestion") + } + + formatted := Format(err) + if !strings.Contains(formatted, "worker-1") { + t.Errorf("expected agent name in message, got: %s", 
formatted) + } + if !strings.Contains(formatted, "my-repo") { + t.Errorf("expected repo name in message, got: %s", formatted) + } + if !strings.Contains(formatted, "recreating") { + t.Errorf("expected recreating hint in suggestion, got: %s", formatted) + } +} + +func TestNoSessionID(t *testing.T) { + err := NoSessionID("worker-1") + + if err.Category != CategoryConfig { + t.Errorf("expected CategoryConfig, got %v", err.Category) + } + if err.Suggestion == "" { + t.Error("should have a suggestion") + } + + formatted := Format(err) + if !strings.Contains(formatted, "worker-1") { + t.Errorf("expected agent name in message, got: %s", formatted) + } + if !strings.Contains(formatted, "session ID") { + t.Errorf("expected 'session ID' in message, got: %s", formatted) + } + if !strings.Contains(formatted, "recreating") { + t.Errorf("expected recreating hint in suggestion, got: %s", formatted) + } +} + +func TestInvalidDuration(t *testing.T) { + err := InvalidDuration("abc") + + if err.Category != CategoryUsage { + t.Errorf("expected CategoryUsage, got %v", err.Category) + } + if err.Suggestion == "" { + t.Error("should have a suggestion") + } + + formatted := Format(err) + if !strings.Contains(formatted, "invalid duration") { + t.Errorf("expected 'invalid duration' in message, got: %s", formatted) + } + if !strings.Contains(formatted, "abc") { + t.Errorf("expected value in message, got: %s", formatted) + } + if !strings.Contains(formatted, "7d") { + t.Errorf("expected example format in suggestion, got: %s", formatted) + } +} + +func TestMissingArgumentHasSuggestion(t *testing.T) { + err := MissingArgument("filename", "string") + + if err.Suggestion == "" { + t.Error("MissingArgument should have a suggestion") + } + if !strings.Contains(err.Suggestion, "--help") { + t.Errorf("expected --help in suggestion, got: %s", err.Suggestion) + } +} + +func TestInvalidArgumentHasSuggestion(t *testing.T) { + err := InvalidArgument("count", "abc", "integer") + + if err.Suggestion == "" { 
+ t.Error("InvalidArgument should have a suggestion") + } + if !strings.Contains(err.Suggestion, "--help") { + t.Errorf("expected --help in suggestion, got: %s", err.Suggestion) + } +} diff --git a/internal/format/format.go b/internal/format/format.go index db06f4f2..3a8f7054 100644 --- a/internal/format/format.go +++ b/internal/format/format.go @@ -20,6 +20,7 @@ const ( StatusWarning Status = "warning" StatusError Status = "error" StatusPending Status = "pending" + StatusCrashed Status = "crashed" ) // Colors for different statuses @@ -39,7 +40,7 @@ func StatusColor(status Status) *color.Color { return Green case StatusWarning, StatusIdle, StatusPending: return Yellow - case StatusError: + case StatusError, StatusCrashed: return Red default: return color.New() @@ -59,6 +60,8 @@ func StatusIcon(status Status) string { return "⚠" case StatusError: return "✗" + case StatusCrashed: + return "!" case StatusPending: return "◦" default: diff --git a/internal/format/format_test.go b/internal/format/format_test.go index 0134cb19..73979f4a 100644 --- a/internal/format/format_test.go +++ b/internal/format/format_test.go @@ -18,6 +18,7 @@ func TestStatusColor(t *testing.T) { {StatusIdle, false}, {StatusPending, false}, {StatusError, false}, + {StatusCrashed, false}, {Status("unknown"), false}, } @@ -45,6 +46,7 @@ func TestStatusIcon(t *testing.T) { {StatusIdle, "○"}, {StatusWarning, "⚠"}, {StatusError, "✗"}, + {StatusCrashed, "!"}, {StatusPending, "◦"}, {Status("unknown"), "-"}, } diff --git a/internal/messages/messages.go b/internal/messages/messages.go index 8f475e41..07e9a410 100644 --- a/internal/messages/messages.go +++ b/internal/messages/messages.go @@ -207,6 +207,33 @@ func (m *Manager) read(repoName, agentName, filename string) (*Message, error) { return &msg, nil } +// HasPending returns true if the agent has any pending (undelivered) messages +func (m *Manager) HasPending(repoName, agentName string) bool { + dir := m.agentDir(repoName, agentName) + + entries, err 
:= os.ReadDir(dir) + if err != nil { + return false + } + + for _, entry := range entries { + if entry.IsDir() || filepath.Ext(entry.Name()) != ".json" { + continue + } + + msg, err := m.read(repoName, agentName, entry.Name()) + if err != nil { + continue + } + + if msg.Status == StatusPending { + return true + } + } + + return false +} + // CleanupOrphaned removes message directories for non-existent agents func (m *Manager) CleanupOrphaned(repoName string, validAgents []string) (int, error) { repoDir := filepath.Join(m.messagesRoot, repoName) diff --git a/internal/messages/messages_test.go b/internal/messages/messages_test.go index 99bb63e7..eb31f59d 100644 --- a/internal/messages/messages_test.go +++ b/internal/messages/messages_test.go @@ -373,3 +373,37 @@ func TestCleanupOrphaned(t *testing.T) { } } } + +func TestHasPending(t *testing.T) { + tmpDir, err := os.MkdirTemp("", "messages-haspending-*") + if err != nil { + t.Fatalf("Failed to create temp dir: %v", err) + } + defer os.RemoveAll(tmpDir) + + mgr := NewManager(tmpDir) + repoName := "test-repo" + agentName := "worker-1" + + // No messages - should return false + if mgr.HasPending(repoName, agentName) { + t.Error("HasPending should return false when no messages exist") + } + + // Send a message - should return true + msg, err := mgr.Send(repoName, "supervisor", agentName, "Hello") + if err != nil { + t.Fatalf("Failed to send message: %v", err) + } + if !mgr.HasPending(repoName, agentName) { + t.Error("HasPending should return true when pending messages exist") + } + + // Mark as delivered - should return false (only pending counts) + if err := mgr.UpdateStatus(repoName, agentName, msg.ID, StatusDelivered); err != nil { + t.Fatalf("Failed to update status: %v", err) + } + if mgr.HasPending(repoName, agentName) { + t.Error("HasPending should return false when messages are delivered (not pending)") + } +} diff --git a/internal/prompts/commands/refresh.md b/internal/prompts/commands/refresh.md index 
6c17d87b..86585831 100644 --- a/internal/prompts/commands/refresh.md +++ b/internal/prompts/commands/refresh.md @@ -4,34 +4,51 @@ Sync your worktree with the latest changes from the main branch. ## Instructions -1. First, fetch the latest changes: +1. First, determine the correct remote to use. Check if an upstream remote exists (indicates a fork): ```bash + git remote | grep -q upstream && echo "upstream" || echo "origin" + ``` + Use `upstream` if it exists (fork mode), otherwise use `origin`. + +2. Fetch the latest changes from the appropriate remote: + ```bash + # For forks (upstream remote exists): + git fetch upstream main + + # For non-forks (origin only): git fetch origin main ``` -2. Check if there are any uncommitted changes: +3. Check if there are any uncommitted changes: ```bash git status --porcelain ``` -3. If there are uncommitted changes, stash them first: +4. If there are uncommitted changes, stash them first: ```bash git stash push -m "refresh-stash-$(date +%s)" ``` -4. Rebase your current branch onto main: +5. Rebase your current branch onto main from the correct remote: ```bash + # For forks (upstream remote exists): + git rebase upstream/main + + # For non-forks (origin only): git rebase origin/main ``` -5. If you stashed changes, pop them: +6. If you stashed changes, pop them: ```bash git stash pop ``` -6. Report the result to the user, including: +7. Report the result to the user, including: + - Which remote was used (upstream or origin) - How many commits were rebased - Whether there were any conflicts - Current status after refresh If there are rebase conflicts, stop and let the user know which files have conflicts. + +**Note for forks:** When working in a fork, always rebase onto `upstream/main` (the original repo) to keep your work up to date with the latest upstream changes. 
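The remote-selection rule described in the refresh command above (prefer `upstream` when it exists, otherwise fall back to `origin`) can be sketched as a small Go helper. This is illustrative only: `pickRemote` is a hypothetical name, and the daemon's actual implementation lives in the worktree package (`wt.GetUpstreamRemote`).

```go
package main

import "fmt"

// pickRemote sketches the remote-selection rule from the refresh command:
// prefer "upstream" when it exists (fork mode), otherwise use "origin".
// Hypothetical helper for illustration; the real logic is
// worktree.Manager.GetUpstreamRemote.
func pickRemote(remotes []string) string {
	for _, r := range remotes {
		if r == "upstream" {
			return "upstream"
		}
	}
	return "origin"
}

func main() {
	fmt.Println(pickRemote([]string{"origin", "upstream"})) // upstream
	fmt.Println(pickRemote([]string{"origin"}))             // origin
}
```

The same rule appears twice in the system: once in the daemon's automatic refresh loop and once in the agent-facing `refresh.md` prompt, so keeping the decision to a single predicate avoids the two paths drifting apart.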
diff --git a/internal/prompts/merge-queue.md b/internal/prompts/merge-queue.md index 38a8787c..fada8212 100644 --- a/internal/prompts/merge-queue.md +++ b/internal/prompts/merge-queue.md @@ -292,18 +292,27 @@ Every merge you make locks in progress. Every passing PR you process is a ratche ## Keeping Local Refs in Sync -After successfully merging a PR, always update the local main branch to stay in sync with origin: +After successfully merging a PR, always update local refs AND sync other agent worktrees: ```bash +# Update local main branch git fetch origin main:main + +# Sync all worker worktrees with main branch +multiclaude refresh ``` This is important because: - Workers branch off the local `main` ref when created - If local main is stale, new workers will start from old code - Stale refs cause unnecessary merge conflicts in future PRs +- Other workers may be working on stale code and need to be rebased -**Always run this command immediately after each successful merge.** This ensures the next worker created will start from the latest code. 
+**Always run both commands immediately after each successful merge.** The `multiclaude refresh` command:
+- Fetches the latest main branch
+- Rebases all worker worktrees that are behind main
+- Sends notifications to affected agents
+- Handles conflicts gracefully (aborts rebase and notifies if conflicts occur)
 
 ## PR Rejection Handling
diff --git a/internal/state/state.go b/internal/state/state.go
index 1c67ee73..0dd7a45c 100644
--- a/internal/state/state.go
+++ b/internal/state/state.go
@@ -89,9 +89,12 @@ type Agent struct {
 	Task            string    `json:"task,omitempty"`           // Only for workers
 	Summary         string    `json:"summary,omitempty"`        // Brief summary of work done (workers only)
 	FailureReason   string    `json:"failure_reason,omitempty"` // Why the task failed (workers only)
+	PRURL           string    `json:"pr_url,omitempty"`         // Pull request URL if created (workers only)
+	PRNumber        int       `json:"pr_number,omitempty"`      // PR number for quick lookup (workers only)
 	CreatedAt       time.Time `json:"created_at"`
 	LastNudge       time.Time `json:"last_nudge,omitempty"`
 	ReadyForCleanup bool      `json:"ready_for_cleanup,omitempty"` // Only for workers
+	CrashedAt       time.Time `json:"crashed_at,omitempty"`        // When crash was detected (workers only)
 }
 
 // Repository represents a tracked repository's state
@@ -342,6 +345,46 @@ func (s *State) UpdateAgentPID(repoName, agentName string, pid int) error {
 	return s.saveUnlocked()
 }
 
+// UpdateAgentCrashed marks an agent as crashed at the given time
+func (s *State) UpdateAgentCrashed(repoName, agentName string, crashedAt time.Time) error {
+	s.mu.Lock()
+	defer s.mu.Unlock()
+
+	repo, exists := s.Repos[repoName]
+	if !exists {
+		return fmt.Errorf("repository %q not found", repoName)
+	}
+
+	agent, exists := repo.Agents[agentName]
+	if !exists {
+		return fmt.Errorf("agent %q not found in repository %q", agentName, repoName)
+	}
+
+	agent.CrashedAt = crashedAt
+	repo.Agents[agentName] = agent
+	return s.saveUnlocked()
+}
+
+// ClearAgentCrashed clears the crashed state for an agent (e.g., after restart)
+func (s *State) ClearAgentCrashed(repoName, agentName string) error {
+	s.mu.Lock()
+	defer s.mu.Unlock()
+
+	repo, exists := s.Repos[repoName]
+	if !exists {
+		return fmt.Errorf("repository %q not found", repoName)
+	}
+
+	agent, exists := repo.Agents[agentName]
+	if !exists {
+		return fmt.Errorf("agent %q not found in repository %q", agentName, repoName)
+	}
+
+	agent.CrashedAt = time.Time{}
+	repo.Agents[agentName] = agent
+	return s.saveUnlocked()
+}
+
 // RemoveAgent removes an agent from a repository
 func (s *State) RemoveAgent(repoName, agentName string) error {
 	s.mu.Lock()
diff --git a/internal/state/state_test.go b/internal/state/state_test.go
index ee7af27c..0f234895 100644
--- a/internal/state/state_test.go
+++ b/internal/state/state_test.go
@@ -1285,3 +1285,159 @@ func TestTaskHistoryPersistence(t *testing.T) {
 		t.Errorf("Loaded entry status = %q, want 'merged'", history[0].Status)
 	}
 }
+
+func TestAgentCrashedState(t *testing.T) {
+	tmpDir := t.TempDir()
+	statePath := filepath.Join(tmpDir, "state.json")
+
+	s := New(statePath)
+	repo := &Repository{
+		GithubURL:   "https://github.com/test/repo",
+		TmuxSession: "mc-test",
+		Agents:      make(map[string]Agent),
+	}
+	if err := s.AddRepo("test-repo", repo); err != nil {
+		t.Fatalf("AddRepo() failed: %v", err)
+	}
+
+	// Add a worker agent
+	agent := Agent{
+		Type:         AgentTypeWorker,
+		WorktreePath: "/path/to/worktree",
+		TmuxWindow:   "worker-1",
+		Task:         "test task",
+		PID:          12345,
+		CreatedAt:    time.Now(),
+	}
+	if err := s.AddAgent("test-repo", "worker-1", agent); err != nil {
+		t.Fatalf("AddAgent() failed: %v", err)
+	}
+
+	// Verify CrashedAt is initially zero
+	retrieved, exists := s.GetAgent("test-repo", "worker-1")
+	if !exists {
+		t.Fatal("Agent not found after adding")
+	}
+	if !retrieved.CrashedAt.IsZero() {
+		t.Error("CrashedAt should be zero initially")
+	}
+
+	// Mark agent as crashed
+	crashTime := time.Now()
+	if err := s.UpdateAgentCrashed("test-repo", "worker-1", crashTime); err != nil {
+		t.Fatalf("UpdateAgentCrashed() failed: %v", err)
+	}
+
+	// Verify crash time was set
+	retrieved, _ = s.GetAgent("test-repo", "worker-1")
+	if retrieved.CrashedAt.IsZero() {
+		t.Error("CrashedAt should be set after UpdateAgentCrashed")
+	}
+	if !retrieved.CrashedAt.Equal(crashTime) {
+		t.Errorf("CrashedAt = %v, want %v", retrieved.CrashedAt, crashTime)
+	}
+
+	// Clear crashed state (simulating restart)
+	if err := s.ClearAgentCrashed("test-repo", "worker-1"); err != nil {
+		t.Fatalf("ClearAgentCrashed() failed: %v", err)
+	}
+
+	// Verify crash time was cleared
+	retrieved, _ = s.GetAgent("test-repo", "worker-1")
+	if !retrieved.CrashedAt.IsZero() {
+		t.Error("CrashedAt should be zero after ClearAgentCrashed")
+	}
+}
+
+func TestAgentCrashedStatePersistence(t *testing.T) {
+	tmpDir := t.TempDir()
+	statePath := filepath.Join(tmpDir, "state.json")
+
+	// Create state and add crashed worker
+	s := New(statePath)
+	repo := &Repository{
+		GithubURL:   "https://github.com/test/repo",
+		TmuxSession: "mc-test",
+		Agents:      make(map[string]Agent),
+	}
+	if err := s.AddRepo("test-repo", repo); err != nil {
+		t.Fatalf("AddRepo() failed: %v", err)
+	}
+
+	agent := Agent{
+		Type:         AgentTypeWorker,
+		WorktreePath: "/path/to/worktree",
+		TmuxWindow:   "worker-1",
+		Task:         "test task",
+		PID:          12345,
+		CreatedAt:    time.Now(),
+	}
+	if err := s.AddAgent("test-repo", "worker-1", agent); err != nil {
+		t.Fatalf("AddAgent() failed: %v", err)
+	}
+
+	crashTime := time.Now()
+	if err := s.UpdateAgentCrashed("test-repo", "worker-1", crashTime); err != nil {
+		t.Fatalf("UpdateAgentCrashed() failed: %v", err)
+	}
+
+	// Load state from disk
+	loaded, err := Load(statePath)
+	if err != nil {
+		t.Fatalf("Load() failed: %v", err)
+	}
+
+	// Verify crash time persisted
+	retrieved, exists := loaded.GetAgent("test-repo", "worker-1")
+	if !exists {
+		t.Fatal("Agent not found after loading")
+	}
+	if retrieved.CrashedAt.IsZero() {
+		t.Error("CrashedAt should be persisted")
+	}
+	// Check time is approximately equal (JSON serialization may lose precision)
+	if retrieved.CrashedAt.Unix() != crashTime.Unix() {
+		t.Errorf("Loaded CrashedAt = %v, want %v", retrieved.CrashedAt, crashTime)
+	}
+}
+
+func TestAgentCrashedStateErrors(t *testing.T) {
+	tmpDir := t.TempDir()
+	statePath := filepath.Join(tmpDir, "state.json")
+
+	s := New(statePath)
+
+	// Test UpdateAgentCrashed with non-existent repo
+	err := s.UpdateAgentCrashed("nonexistent", "agent", time.Now())
+	if err == nil {
+		t.Error("UpdateAgentCrashed should fail for non-existent repo")
+	}
+
+	// Test ClearAgentCrashed with non-existent repo
+	err = s.ClearAgentCrashed("nonexistent", "agent")
+	if err == nil {
+		t.Error("ClearAgentCrashed should fail for non-existent repo")
+	}
+
+	// Add repo but no agent
+	repo := &Repository{
+		GithubURL:   "https://github.com/test/repo",
+		TmuxSession: "mc-test",
+		Agents:      make(map[string]Agent),
+	}
+	if err := s.AddRepo("test-repo", repo); err != nil {
+		t.Fatalf("AddRepo() failed: %v", err)
+	}
+
+	// Test UpdateAgentCrashed with non-existent agent
+	err = s.UpdateAgentCrashed("test-repo", "nonexistent", time.Now())
+	if err == nil {
+		t.Error("UpdateAgentCrashed should fail for non-existent agent")
+	}
+
+	// Test ClearAgentCrashed with non-existent agent
+	err = s.ClearAgentCrashed("test-repo", "nonexistent")
+	if err == nil {
+		t.Error("ClearAgentCrashed should fail for non-existent agent")
+	}
+}