feat: AI Swarm Mode - Multi-agent parallel task execution #280

Suhaib3100 · 2026-01-27T16:20:40Z

Fixes #279

Summary

Implements the foundation for AI Swarm Mode - enabling a master agent to spawn and orchestrate multiple worker agents in separate browser windows for parallel task execution.

Use Case

User: "Research and compare the top 5 CRM solutions"

Master Agent (Coordinator)
├── Decomposes task into 5 parallel subtasks
├── Spawns 5 worker windows
├── Monitors progress
└── Synthesizes final report
    │
    ├── Worker 1: Research Salesforce
    ├── Worker 2: Research HubSpot
    ├── Worker 3: Research Pipedrive
    ├── Worker 4: Research Zoho
    └── Worker 5: Research Monday

Components Added

Core (`apps/server/src/swarm/`)

| Component | Description |
|---|---|
| `types.ts` | SwarmState, WorkerState, SwarmRequest, SwarmResult, message types |
| `constants.ts` | SWARM_LIMITS, SWARM_TIMEOUTS, default configs |
| `coordinator/swarm-registry.ts` | Tracks active swarms and workers |
| `coordinator/task-planner.ts` | LLM-based task decomposition |
| `coordinator/swarm-coordinator.ts` | Main orchestrator for swarm lifecycle |
| `worker/worker-lifecycle.ts` | Spawn, monitor, terminate workers |
| `messaging/swarm-bus.ts` | EventEmitter-based pub/sub communication |
| `aggregation/result-aggregator.ts` | Merges worker results |

API (`apps/server/src/api/routes/swarm.ts`)

| Method | Endpoint | Description |
|---|---|---|
| POST | `/swarm` | Create and execute swarm |
| POST | `/swarm/create` | Create swarm only |
| POST | `/swarm/:id/execute` | Execute existing swarm |
| GET | `/swarm/:id` | Get status |
| GET | `/swarm/:id/stream` | SSE for real-time updates |
| DELETE | `/swarm/:id` | Terminate swarm |

Key Features

- Task Decomposition: LLM automatically breaks complex tasks into parallel subtasks
- Worker Management: Spawn workers in separate windows with health monitoring
- Retry Logic: Exponential backoff for failed workers (max 3 retries)
- Progress Tracking: Real-time updates via SSE streaming
- Result Aggregation: Merge worker results with partial failure support
- Output Formats: JSON, Markdown, or HTML
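To make the retry feature concrete, here is a minimal sketch of exponential backoff with a retry cap. Only the limit of 3 retries comes from the description above; `BackoffPolicy`, `backoffDelayMs`, `runWithRetries`, and the delay values are hypothetical names and numbers chosen for illustration, not the PR's actual `RetryPolicy`.

```typescript
// Hypothetical policy shape; field names and delay values are assumptions.
interface BackoffPolicy {
  maxRetries: number // the PR description states a maximum of 3 retries
  baseDelayMs: number // assumed delay before the first retry
  maxDelayMs: number // assumed upper bound on any single delay
}

const defaultBackoff: BackoffPolicy = {
  maxRetries: 3,
  baseDelayMs: 1000,
  maxDelayMs: 30000,
}

// Delay before retry number `attempt` (1-based): base * 2^(attempt - 1), capped.
function backoffDelayMs(
  attempt: number,
  policy: BackoffPolicy = defaultBackoff,
): number {
  return Math.min(policy.baseDelayMs * 2 ** (attempt - 1), policy.maxDelayMs)
}

// Runs `task`, retrying with growing delays until it succeeds or retries run out.
async function runWithRetries<T>(
  task: () => Promise<T>,
  policy: BackoffPolicy = defaultBackoff,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise((resolve) => setTimeout(resolve, ms)),
): Promise<T> {
  let retryCount = 0
  for (;;) {
    try {
      return await task()
    } catch (err) {
      // Give up once the retry budget is spent and surface the last error.
      if (retryCount >= policy.maxRetries) throw err
      retryCount++
      await sleep(backoffDelayMs(retryCount, policy))
    }
  }
}
```

With these assumed defaults the three retries wait 1000 ms, 2000 ms, and 4000 ms. Passing `sleep` as a parameter keeps the loop testable without real timers.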

Commits

- feat(swarm): add core types and constants
- feat(swarm): add SwarmRegistry for tracking active swarms
- feat(swarm): add SwarmMessagingBus for inter-agent communication
- feat(swarm): add WorkerLifecycleManager for worker management
- feat(swarm): add TaskPlanner for LLM-based task decomposition
- feat(swarm): add ResultAggregator for merging worker results
- feat(swarm): add SwarmCoordinator as main orchestrator
- feat(swarm): add HTTP API routes for swarm management
- docs(swarm): add design document and update exports

Next Steps

- Integration with main server
- Worker agent implementation
- UI components for swarm visualization
- Chromium-side SwarmWindowManager for resource isolation

- Add SwarmState, WorkerState, and related enums
- Add SwarmConfig, RetryPolicy, ResourceLimits interfaces
- Add SwarmRequest/SwarmResult/SwarmStatus types
- Add Worker and WorkerTask types
- Add SwarmMessage protocol types
- Add Zod schemas for validation
- Add SWARM_LIMITS and SWARM_TIMEOUTS constants
- Add default configurations

Part of browseros-ai#279

- Manage swarm lifecycle (create, update, delete)
- Track workers per swarm with state management
- Calculate progress and status summaries
- Enforce concurrent swarm limits
- Worker state transitions with timestamps

Part of browseros-ai#279

- EventEmitter-based pub/sub messaging
- Channel naming: swarm:{id}:master, swarm:{id}:worker:{id}
- sendToWorker(), broadcast(), sendToMaster() helpers
- subscribe(), subscribeAll(), subscribeBroadcast()
- waitFor() with timeout for sync patterns
- Cleanup with removeSwarmListeners()

Part of browseros-ai#279
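A rough sketch of how a bus like this can sit on top of Node's `EventEmitter`, using the channel naming from the commit message. `MiniSwarmBus` and `BusMessage` are hypothetical names, and the method signatures are guesses at the shape rather than the PR's actual `SwarmMessagingBus` API.

```typescript
import { EventEmitter } from 'node:events'

// Hypothetical message shape; the PR's SwarmMessage type is not shown here.
interface BusMessage {
  type: string
  payload?: unknown
}

class MiniSwarmBus {
  private readonly emitter = new EventEmitter()

  // Channel names follow the convention quoted in the commit message.
  static masterChannel(swarmId: string): string {
    return `swarm:${swarmId}:master`
  }

  static workerChannel(swarmId: string, workerId: string): string {
    return `swarm:${swarmId}:worker:${workerId}`
  }

  sendToMaster(swarmId: string, message: BusMessage): void {
    this.emitter.emit(MiniSwarmBus.masterChannel(swarmId), message)
  }

  sendToWorker(swarmId: string, workerId: string, message: BusMessage): void {
    this.emitter.emit(MiniSwarmBus.workerChannel(swarmId, workerId), message)
  }

  // Registers a handler and returns a function that removes it again.
  subscribe(channel: string, handler: (m: BusMessage) => void): () => void {
    this.emitter.on(channel, handler)
    return () => this.emitter.off(channel, handler)
  }

  // Resolves with the first message of `type`, or rejects after `timeoutMs`.
  waitFor(channel: string, type: string, timeoutMs: number): Promise<BusMessage> {
    return new Promise((resolve, reject) => {
      const onMessage = (m: BusMessage) => {
        if (m.type !== type) return
        clearTimeout(timer)
        unsubscribe()
        resolve(m)
      }
      const unsubscribe = this.subscribe(channel, onMessage)
      const timer = setTimeout(() => {
        unsubscribe()
        reject(new Error(`Timed out waiting for ${type} on ${channel}`))
      }, timeoutMs)
    })
  }

  // Drops every listener on channels that belong to the given swarm.
  removeSwarmListeners(swarmId: string): void {
    const prefix = `swarm:${swarmId}:`
    for (const channel of this.emitter.eventNames()) {
      if (typeof channel === 'string' && channel.startsWith(prefix)) {
        this.emitter.removeAllListeners(channel)
      }
    }
  }
}
```

Returning an unsubscribe function from `subscribe()` keeps cleanup next to registration, which matters for the `waitFor()` timeout path: both outcomes remove the listener so a timed-out wait does not leak a handler.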

- spawnWorker() creates window via ControllerBridge
- Health monitoring with heartbeat checks
- Progress stale detection
- handleWorkerFailure() with exponential backoff retry
- terminateWorker() and terminateAllWorkers()
- Cleanup methods for graceful shutdown

Part of browseros-ai#279

- decompose() breaks complex tasks into parallel subtasks
- estimateWorkerCount() for optimal worker sizing
- Zod schema validation for LLM output
- createManualTasks() fallback for non-LLM usage
- Dependency handling between subtasks

Part of browseros-ai#279

- aggregate() collects and merges worker results
- Handles partial results from failed workers
- calculateMetrics() for execution stats
- Output formats: JSON, Markdown, HTML
- Optional LLM synthesizer integration

Part of browseros-ai#279
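As an illustration of aggregation with partial failures, the sketch below merges completed results into a Markdown report and lists failed subtasks separately instead of discarding the whole run. `WorkerOutcome`, `AggregateSketch`, and `aggregateSketch` are hypothetical; the real `ResultAggregator` reads workers from the registry and has a shape not shown in this thread.

```typescript
// Hypothetical outcome record for one worker; not the PR's Worker type.
interface WorkerOutcome {
  workerId: string
  task: string
  state: 'completed' | 'failed'
  result?: string
  error?: string
}

interface AggregateSketch {
  succeeded: number
  failed: number
  partial: boolean // true when some, but not all, workers produced a result
  markdown: string
}

function aggregateSketch(outcomes: WorkerOutcome[]): AggregateSketch {
  const completed = outcomes.filter((o) => o.state === 'completed')
  const failed = outcomes.filter((o) => o.state === 'failed')

  const lines: string[] = ['# Swarm result', '']
  for (const o of completed) {
    lines.push(`## ${o.task}`, '', o.result ?? '(no output)', '')
  }
  // Failed subtasks are reported rather than silently dropped.
  if (failed.length > 0) {
    lines.push('## Incomplete subtasks', '')
    for (const o of failed) {
      lines.push(`- ${o.task}: ${o.error ?? 'unknown error'}`)
    }
  }

  return {
    succeeded: completed.length,
    failed: failed.length,
    partial: failed.length > 0 && completed.length > 0,
    markdown: lines.join('\n').trimEnd(),
  }
}
```

Keeping the failure list inside the report lets the master agent, or an optional LLM synthesizer, decide whether the gaps matter for the final answer.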

- createSwarm() initializes new swarm
- executeSwarm() runs full lifecycle:
  1. Planning (task decomposition)
  2. Spawning (worker windows)
  3. Executing (monitor progress)
  4. Aggregating (merge results)
- terminateSwarm() for graceful shutdown
- Event-based progress reporting
- Timeout handling and error recovery

Part of browseros-ai#279
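The four-phase lifecycle above can be read as a small state machine. The sketch below is one way to guard the transitions. The phase names `planning`, `spawning`, `executing`, `aggregating`, and `completed` appear in the review's sequence diagram; `pending`, `failed`, and `terminated`, along with the transition table itself, are assumptions made for illustration.

```typescript
// Phase names: the middle five come from the PR; the others are assumed.
type SwarmPhase =
  | 'pending'
  | 'planning'
  | 'spawning'
  | 'executing'
  | 'aggregating'
  | 'completed'
  | 'failed'
  | 'terminated'

// Assumed transition table: each active phase may move forward, fail,
// or be terminated; terminal phases allow nothing further.
const ALLOWED_TRANSITIONS: Record<SwarmPhase, readonly SwarmPhase[]> = {
  pending: ['planning', 'terminated'],
  planning: ['spawning', 'failed', 'terminated'],
  spawning: ['executing', 'failed', 'terminated'],
  executing: ['aggregating', 'failed', 'terminated'],
  aggregating: ['completed', 'failed', 'terminated'],
  completed: [],
  failed: [],
  terminated: [],
}

// Returns the new phase, or throws if the move is not allowed.
function advance(current: SwarmPhase, next: SwarmPhase): SwarmPhase {
  if (!ALLOWED_TRANSITIONS[current].includes(next)) {
    throw new Error(`Illegal swarm transition: ${current} -> ${next}`)
  }
  return next
}

// A phase is terminal when no further transition is permitted.
function isTerminal(phase: SwarmPhase): boolean {
  return ALLOWED_TRANSITIONS[phase].length === 0
}
```

Centralizing the table means a late `task_complete` message arriving after termination cannot move a swarm back into an active phase by accident.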

Endpoints:
- POST /swarm - Create and execute swarm
- POST /swarm/create - Create swarm only
- POST /swarm/:id/execute - Execute existing swarm
- GET /swarm/:id - Get status
- GET /swarm/:id/stream - SSE for real-time updates
- DELETE /swarm/:id - Terminate swarm

Includes Zod validation and error handling.

Part of browseros-ai#279

- Add comprehensive design doc with architecture overview
- Export all swarm components from index.ts
- Document API endpoints and message protocol
- Track implementation status

Part of browseros-ai#279

github-actions · 2026-01-27T16:20:53Z

All contributors have signed the CLA. Thank you!
Posted by the CLA Assistant Lite bot.

Suhaib3100 · 2026-01-27T16:21:57Z

I have read the CLA Document and I hereby sign the CLA

greptile-apps · 2026-01-27T16:24:20Z

Greptile Overview

Greptile Summary

Implements a comprehensive AI Swarm Mode feature that enables parallel task execution across multiple browser windows. The architecture is well-designed with clear separation of concerns:

- Core Components: SwarmRegistry tracks state, TaskPlanner decomposes tasks via LLM, WorkerLifecycleManager manages worker windows, SwarmMessagingBus handles inter-agent communication, and ResultAggregator merges results
- API Layer: HTTP endpoints with SSE streaming for real-time progress updates
- Type Safety: Comprehensive TypeScript types with Zod validation schemas

Key Issues Found:

- Import Path Error (syntax): result-aggregator.ts has incorrect import path for SwarmRegistry
- Retry Logic Bug (logic): Worker retry count is reset when respawning, breaking retry limit enforcement
- Resource Leak (logic): SSE keepAlive interval not cleared on terminal events
- Architecture Violation (style): index.ts barrel export violates CLAUDE.md guideline against bundling all exports

Strengths:

- Clean architecture with proper dependency injection
- Comprehensive error handling and logging
- Well-documented with inline comments
- Proper event-driven design for monitoring
- Health monitoring with heartbeat and progress tracking

Confidence Score: 4/5

- Safe to merge once the import path and retry logic are fixed.
- The score of 4 reflects solid architecture and implementation with two bugs that need fixing: the incorrect import path will cause a runtime error, and the retry logic bug could lead to infinite retries. The resource leak is less critical but should be addressed. The style violation (index.ts) doesn't block the merge but should be cleaned up per project guidelines.
- Pay close attention to result-aggregator.ts (import fix required) and worker-lifecycle.ts (retry logic fix required).

Important Files Changed

| Filename | Overview |
|---|---|
| apps/server/src/swarm/worker/worker-lifecycle.ts | Worker lifecycle management with health monitoring; has retry logic bug where retryCount is reset on respawn |
| apps/server/src/swarm/aggregation/result-aggregator.ts | Result aggregation with formatting options; has incorrect import path for SwarmRegistry |
| apps/server/src/swarm/coordinator/swarm-coordinator.ts | Main orchestrator with proper phase management and event emission - no issues |
| apps/server/src/api/routes/swarm.ts | HTTP API routes with SSE streaming; potential resource leak in keepAlive interval cleanup |
| apps/server/src/swarm/index.ts | Barrel export file that violates CLAUDE.md guideline against index.ts exports |

Sequence Diagram

sequenceDiagram
    participant Client
    participant API as Swarm API Routes
    participant Coord as SwarmCoordinator
    participant Registry as SwarmRegistry
    participant Planner as TaskPlanner
    participant Lifecycle as WorkerLifecycleManager
    participant Bus as SwarmMessagingBus
    participant Bridge as ControllerBridge
    participant Workers as Worker Windows
    participant Aggregator as ResultAggregator

    Client->>API: POST /swarm {task, maxWorkers}
    API->>Coord: createAndExecute(request)
    
    Note over Coord: Phase 1: Planning
    Coord->>Registry: create(task, config)
    Registry-->>Coord: swarm object
    Coord->>Registry: updateState(swarmId, 'planning')
    Coord->>Planner: decompose(task, config)
    Planner->>Planner: LLM generates subtasks
    Planner-->>Coord: WorkerTask[]
    
    Note over Coord: Phase 2: Spawning Workers
    Coord->>Registry: updateState(swarmId, 'spawning')
    loop For each task
        Coord->>Lifecycle: spawnWorker(swarmId, task)
        Lifecycle->>Registry: addWorker(swarmId, worker)
        Lifecycle->>Bridge: sendRequest('create_window')
        Bridge-->>Lifecycle: {windowId}
        Lifecycle->>Bus: startHealthMonitoring()
        Lifecycle-->>Coord: worker object
        Coord->>Coord: emit('worker_spawned')
    end
    
    Note over Coord: Phase 3: Execution & Monitoring
    Coord->>Registry: updateState(swarmId, 'executing')
    Coord->>Coord: emit('swarm_started')
    Coord->>Bus: subscribeToMaster(swarmId)
    
    loop Worker Execution
        Workers->>Bus: sendToMaster('task_progress')
        Bus->>Coord: message received
        Coord->>Registry: updateWorkerProgress()
        Coord->>Coord: emit('worker_progress')
        
        Workers->>Bus: sendToMaster('task_complete')
        Bus->>Coord: message received
        Coord->>Registry: setWorkerResult()
        Coord->>Coord: emit('worker_completed')
    end
    
    Note over Coord: Phase 4: Aggregation
    Coord->>Registry: updateState(swarmId, 'aggregating')
    Coord->>Coord: emit('aggregation_started')
    Coord->>Aggregator: aggregate(swarmId, format)
    Aggregator->>Registry: getWorkers(swarmId)
    Registry-->>Aggregator: Worker[]
    Aggregator->>Aggregator: calculateMetrics()
    Aggregator->>Aggregator: simpleAggregate() or LLM synthesize()
    Aggregator-->>Coord: AggregatedResult
    
    Coord->>Registry: updateState(swarmId, 'completed')
    Coord->>Coord: emit('swarm_completed')
    Coord-->>API: SwarmResult
    API-->>Client: {success: true, data: result}

greptile-apps

4 files reviewed, 2 comments


greptile-apps · 2026-01-27T16:24:23Z

apps/server/src/api/routes/swarm.ts

+      // Keep alive
+      const keepAlive = setInterval(() => {
+        stream.writeSSE({
+          event: 'ping',
+          data: JSON.stringify({ timestamp: Date.now() }),
+        })
+      }, 30000)
+
+      // Wait for stream to close
+      await new Promise<void>((resolve) => {
+        stream.onAbort(() => {
+          clearInterval(keepAlive)
+          unsubscribe()
+          resolve()
+        })
+      })


Potential resource leak - keepAlive interval may not be cleared if stream closes via the earlier terminal event path (lines 206-211)

When stream.close() is called in the terminal event handler, the keepAlive interval continues running. Consider clearing it before closing:

    if (event.type === 'swarm_completed' || event.type === 'swarm_failed') {
      clearInterval(keepAlive)
      stream.close()
    }
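One way to make this robust is to route both exit paths (terminal event and client abort) through a single idempotent cleanup function. The sketch below assumes a stream object with `writeSSE`, `onAbort`, and `close`, mirroring the calls visible in the diff. `SseStreamLike`, `SwarmEventLike`, `SubscribeFn`, and `pipeSwarmEvents` are hypothetical names, and the subscription is assumed to deliver events only after `subscribe` returns.

```typescript
// Minimal stand-in for the stream used in the route; only the three
// methods visible in the diff are assumed.
interface SseStreamLike {
  writeSSE(message: { event: string; data: string }): unknown
  onAbort(callback: () => void): void
  close(): unknown
}

interface SwarmEventLike {
  type: string
}

// Hypothetical subscription signature: registers a handler, returns unsubscribe.
type SubscribeFn = (handler: (e: SwarmEventLike) => void) => () => void

function pipeSwarmEvents(
  stream: SseStreamLike,
  subscribe: SubscribeFn,
  pingMs = 30000,
): Promise<void> {
  return new Promise<void>((resolve) => {
    let done = false
    let unsubscribe: () => void = () => {}

    const keepAlive = setInterval(() => {
      stream.writeSSE({
        event: 'ping',
        data: JSON.stringify({ timestamp: Date.now() }),
      })
    }, pingMs)

    // Safe to call from either path; the second call is a no-op.
    const cleanup = () => {
      if (done) return
      done = true
      clearInterval(keepAlive)
      unsubscribe()
      resolve()
    }

    unsubscribe = subscribe((e) => {
      stream.writeSSE({ event: e.type, data: JSON.stringify(e) })
      if (e.type === 'swarm_completed' || e.type === 'swarm_failed') {
        cleanup()
        stream.close()
      }
    })

    // Client disconnects take the same cleanup path.
    stream.onAbort(cleanup)
  })
}
```

Because `cleanup` guards on `done`, it does not matter whether closing the stream also triggers the abort callback: the interval is cleared and the subscription removed exactly once.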


greptile-apps · 2026-01-27T16:24:24Z

apps/server/src/swarm/worker/worker-lifecycle.ts

+      // Increment retry count and respawn
+      worker.retryCount++
+      worker.state = 'spawning'
+      worker.windowId = undefined
+      worker.error = undefined


Retry logic mutates existing worker object, potentially causing issues when spawning new worker with same ID

When retrying, the code increments retryCount on the existing worker object then calls spawnWorker() which creates a fresh worker with retryCount: 0. The retry count won't persist correctly.
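A possible shape for the fix is to pass the incremented count into the spawn step so the fresh worker starts from it rather than from zero. This is a sketch only: `WorkerRecord`, `createWorkerRecord`, and `respawnAfterFailure` are hypothetical stand-ins for the PR's worker type and `spawnWorker()`, and the default limit of 3 is taken from the PR description.

```typescript
// Hypothetical worker record; the PR's Worker type has more fields.
interface WorkerRecord {
  id: string
  taskId: string
  state: 'spawning' | 'executing' | 'failed'
  retryCount: number
  windowId?: number
  error?: string
}

let nextWorkerSeq = 0

// Stand-in for spawnWorker(): accepts the retry count to carry over.
function createWorkerRecord(taskId: string, retryCount = 0): WorkerRecord {
  nextWorkerSeq++
  return {
    id: `worker-${nextWorkerSeq}`,
    taskId,
    state: 'spawning',
    retryCount,
  }
}

// Returns a replacement worker, or null once the retry budget is exhausted.
function respawnAfterFailure(
  failed: WorkerRecord,
  maxRetries = 3,
): WorkerRecord | null {
  const nextRetryCount = failed.retryCount + 1
  if (nextRetryCount > maxRetries) return null
  // The new record inherits the incremented count instead of resetting to 0.
  return createWorkerRecord(failed.taskId, nextRetryCount)
}
```

With the count carried forward, a task that keeps failing is respawned at most three times and then reported as failed, which closes the infinite-retry path flagged in the summary.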


This commit adds production-ready advanced features to AI Swarm Mode:

## Scheduling & Load Balancing
- PriorityTaskQueue: Priority scheduling with aging, deadline urgency, dependency resolution, and preemption support
- LoadBalancer: 5 strategies (round-robin, least-connections, weighted, resource-aware, latency-based) with sticky sessions and health scoring

## Fault Tolerance (Resilience)
- CircuitBreaker: Failure threshold monitoring, half-open recovery, fallback
- Bulkhead: Concurrent execution limiting with queue
- Utilities: retryWithBackoff(), withTimeout()

## Resource Pooling
- WorkerPool: Pre-warmed workers for instant task assignment
- Auto-scaling based on utilization
- Idle timeout and maintenance loops

## Streaming Aggregation
- StreamingAggregator: Real-time result streaming via async iterators
- 4 aggregation modes: merge, concat, vote, custom
- Conflict detection and resolution strategies

## Observability
- SwarmTracer: OpenTelemetry-compatible distributed tracing
- SwarmMetricsCollector: Time-series metrics with history
- SwarmHealthChecker: Multi-check health status

## Worker Agent
- SwarmWorkerAgent: LLM-powered execution planning
- Browser automation via BrowserController interface
- Heartbeat reporting, pause/resume, progress tracking

## Integration
- SwarmService: Unified entry point integrating all components
- Enhanced API routes with streaming, health, metrics, tracing endpoints
- Server integration with optional swarm config

Issue: browseros-ai#279
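For readers unfamiliar with the circuit breaker pattern named under Fault Tolerance, here is a generic sketch of failure-threshold tripping, half-open recovery, and a fallback. It is not the PR's `CircuitBreaker`: the class name `CircuitBreakerSketch`, the constructor parameters, and the default threshold and timeout are all assumptions.

```typescript
type BreakerState = 'closed' | 'open' | 'half-open'

class CircuitBreakerSketch {
  private state: BreakerState = 'closed'
  private failures = 0
  private openedAt = 0
  private readonly failureThreshold: number
  private readonly resetTimeoutMs: number
  private readonly now: () => number

  // All defaults are illustrative; `now` is injectable so tests can fake time.
  constructor(
    failureThreshold = 3,
    resetTimeoutMs = 10000,
    now: () => number = Date.now,
  ) {
    this.failureThreshold = failureThreshold
    this.resetTimeoutMs = resetTimeoutMs
    this.now = now
  }

  // An open breaker becomes half-open once the reset timeout has elapsed.
  getState(): BreakerState {
    if (
      this.state === 'open' &&
      this.now() - this.openedAt >= this.resetTimeoutMs
    ) {
      this.state = 'half-open'
    }
    return this.state
  }

  recordSuccess(): void {
    this.failures = 0
    this.state = 'closed'
  }

  // A failed half-open trial, or reaching the threshold, (re)opens the breaker.
  recordFailure(): void {
    this.failures++
    if (this.state === 'half-open' || this.failures >= this.failureThreshold) {
      this.state = 'open'
      this.openedAt = this.now()
    }
  }

  // Short-circuits to the fallback while open; otherwise runs the action.
  async execute<T>(action: () => Promise<T>, fallback: () => T): Promise<T> {
    if (this.getState() === 'open') return fallback()
    try {
      const value = await action()
      this.recordSuccess()
      return value
    } catch {
      this.recordFailure()
      return fallback()
    }
  }
}
```

In a swarm setting the protected action might be a window-creation request to the extension bridge, so repeated bridge failures stop new spawn attempts for a cooling-off period instead of piling up timeouts.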

## Extension Side (controller-ext)
- SwarmWindowManager: Manages worker windows for swarm mode
  - Create windows with cascading positions
  - Focus, minimize, close individual workers
  - Terminate entire swarm (close all windows)
  - Arrange windows (grid, cascade, tile layouts)
  - Capture screenshots from workers
  - Handle external window close events
- SwarmActions: Chrome extension action handlers
  - createSwarmWindow, navigateSwarmWindow, focusSwarmWindow
  - closeSwarmWindow, terminateSwarm, arrangeSwarmWindows
  - getSwarmWindows, captureSwarmScreenshot, getSwarmStats
- Registered all swarm actions in BrowserOSController

## Agent UI (React components)
- SwarmPanel: Main visualization panel
  - Shows swarm status, progress, workers
  - Compact and expanded worker views
  - Window arrangement controls
  - Result preview and metrics display
- SwarmWorkerCard: Individual worker status card
  - Visual status indicators (pending, executing, completed, failed)
  - Progress bar and duration tracking
  - Click to focus worker window
- SwarmTrigger: Chat interface button
  - Enable/disable swarm mode
  - Configure max workers and priority
- useSwarm hook: React state management
  - SSE streaming for real-time updates
  - API communication with server
  - Worker focus and termination

Issue: browseros-ai#279

- Mark all extension and UI components as complete
- Add file structure for controller-ext/actions/swarm
- Add file structure for agent/components/swarm and lib/swarm
- Update pending items to only remaining tasks

Issue: browseros-ai#279

- SwarmTrigger: simple toggle button matching ChatModeToggle style
- SwarmPanel: compact inline progress bar (not complex Card)
- SwarmWorkerCard: minimal worker dots/indicators
- Use var(--accent-orange) instead of purple
- Use TooltipProvider delayDuration={0} consistently
- Removed heavy dependencies (Card, Badge, Collapsible)

- Add SwarmTrigger to ChatFooter (shows when in agent mode)
- Add SwarmPanel above ChatFooter for progress visualization
- Update Chat component with swarm state and handlers
- Connect useSwarm hook to getAgentServerUrl() for API calls
- Swarm toggle appears next to ChatModeToggle when in Agent mode
- SSE streaming for real-time worker progress updates

- Pass swarm config to createHttpServer in main.ts
- Enables SwarmService with all features:
  - enabled: true
  - maxWorkers: 10
  - enablePooling: true
  - enableCircuitBreaker: true
  - enableTracing: true
  - loadBalancingStrategy: 'resource-aware'

This enables the /swarm API endpoints in production.

1. Fix resource leak in SSE stream (swarm.ts)
   - Clear keepAlive interval before closing on terminal events
   - Call unsubscribe() to prevent memory leaks
2. Fix retry count persistence (worker-lifecycle.ts)
   - Preserve retryCount when respawning worker
   - New worker now inherits the incremented retry count

Suhaib3100 · 2026-01-27T17:43:50Z

Thanks for the review! Both issues have been fixed in commit 1640b9e:

1. Resource leak in SSE stream (swarm.ts)

Now calling clearInterval(keepAlive) and unsubscribe() before stream.close() on terminal events

2. Retry count persistence (worker-lifecycle.ts)

Storing the incremented retry count in a variable before respawning
Applying the preserved retryCount to the new worker after spawn

Both fixes ensure proper cleanup and correct retry behavior.

…n/status features

Suhaib3100 · 2026-01-27T18:11:44Z

I have read the CLA Document and I hereby sign the CLA

- Add 5s timeout to CDP connection to prevent server hanging
- Make worker pool pre-warming non-blocking (background warmup)
- Initialize SwarmService even without extension bridge connected
- Remove unused imports

Suhaib3100 added 9 commits January 27, 2026 21:45

docs(swarm): add design document and update exports

27c70b7


greptile-apps bot reviewed Jan 27, 2026


Suhaib3100 added 7 commits January 27, 2026 22:06

merge: resolve conflicts with origin/main - combine swarm and shutdow…

fdf69f3

…n/status features

Suhaib3100 added 2 commits January 28, 2026 00:10

fix: improve swarm startup robustness

ccb6bc2


feat(swarm): fix SSE streaming, remove duplicate requests, improve UI

5569206
