feat: AI Swarm Mode - Multi-agent parallel task execution #280
- Add SwarmState, WorkerState, and related enums
- Add SwarmConfig, RetryPolicy, ResourceLimits interfaces
- Add SwarmRequest/SwarmResult/SwarmStatus types
- Add Worker and WorkerTask types
- Add SwarmMessage protocol types
- Add Zod schemas for validation
- Add SWARM_LIMITS and SWARM_TIMEOUTS constants
- Add default configurations
Part of browseros-ai#279
- Manage swarm lifecycle (create, update, delete)
- Track workers per swarm with state management
- Calculate progress and status summaries
- Enforce concurrent swarm limits
- Worker state transitions with timestamps
Part of browseros-ai#279
- EventEmitter-based pub/sub messaging
- Channel naming: swarm:{id}:master, swarm:{id}:worker:{id}
- sendToWorker(), broadcast(), sendToMaster() helpers
- subscribe(), subscribeAll(), subscribeBroadcast()
- waitFor() with timeout for sync patterns
- Cleanup with removeSwarmListeners()
Part of browseros-ai#279
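The messaging bullets above can be illustrated with a minimal sketch. It assumes Node's built-in `EventEmitter` and reuses the channel names and helper names listed in the commit message; the `BusMessage` shape, the `SketchBus` class, and all signatures are hypothetical and are not taken from the PR's code.

```typescript
import { EventEmitter } from 'node:events'

// Hypothetical message shape; the PR's SwarmMessage type is not shown in this thread.
interface BusMessage {
  type: string
  payload?: unknown
}

// Channel names follow the convention listed in the commit message.
const masterChannel = (swarmId: string): string => `swarm:${swarmId}:master`
const workerChannel = (swarmId: string, workerId: string): string =>
  `swarm:${swarmId}:worker:${workerId}`

class SketchBus {
  private readonly emitter = new EventEmitter()

  sendToMaster(swarmId: string, message: BusMessage): void {
    this.emitter.emit(masterChannel(swarmId), message)
  }

  sendToWorker(swarmId: string, workerId: string, message: BusMessage): void {
    this.emitter.emit(workerChannel(swarmId, workerId), message)
  }

  // Returns an unsubscribe function so callers can remove their listener.
  subscribe(channel: string, handler: (message: BusMessage) => void): () => void {
    this.emitter.on(channel, handler)
    return () => {
      this.emitter.off(channel, handler)
    }
  }

  // Resolves with the first message of the given type, or rejects after timeoutMs.
  waitFor(channel: string, type: string, timeoutMs: number): Promise<BusMessage> {
    return new Promise((resolve, reject) => {
      const timer = setTimeout(() => {
        unsubscribe()
        reject(new Error(`Timed out waiting for ${type} on ${channel}`))
      }, timeoutMs)
      const unsubscribe = this.subscribe(channel, (message) => {
        if (message.type !== type) return
        clearTimeout(timer)
        unsubscribe()
        resolve(message)
      })
    })
  }
}
```

Returning an unsubscribe function from `subscribe()` is one way to make per-swarm cleanup straightforward; the PR's `removeSwarmListeners()` may work differently.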
- spawnWorker() creates window via ControllerBridge
- Health monitoring with heartbeat checks
- Progress stale detection
- handleWorkerFailure() with exponential backoff retry
- terminateWorker() and terminateAllWorkers()
- Cleanup methods for graceful shutdown
Part of browseros-ai#279
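As an illustration of the exponential backoff mentioned for `handleWorkerFailure()`, here is a small sketch of how a retry delay could be computed. The `BackoffPolicy` fields and both function names are assumptions; the PR's actual `RetryPolicy` interface is not shown in this thread.

```typescript
// Hypothetical policy shape; field names are illustrative, not the PR's RetryPolicy.
interface BackoffPolicy {
  maxRetries: number
  baseDelayMs: number
  maxDelayMs: number
}

// Delay before retry number `retryCount` (0-based): base * 2^retryCount, capped at maxDelayMs.
function backoffDelayMs(retryCount: number, policy: BackoffPolicy): number {
  return Math.min(policy.baseDelayMs * 2 ** retryCount, policy.maxDelayMs)
}

// A worker is retried only while it has attempts left.
function shouldRetry(retryCount: number, policy: BackoffPolicy): boolean {
  return retryCount < policy.maxRetries
}
```

With a base of 1000 ms and a cap of 5000 ms, the delays for retries 0 through 3 would be 1000, 2000, 4000, and 5000 ms.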
- decompose() breaks complex tasks into parallel subtasks
- estimateWorkerCount() for optimal worker sizing
- Zod schema validation for LLM output
- createManualTasks() fallback for non-LLM usage
- Dependency handling between subtasks
Part of browseros-ai#279
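The thread does not show how the planner handles dependencies between subtasks. One possible approach, sketched here purely as an assumption, is to group subtasks into waves so that each wave only contains tasks whose dependencies have already finished. `SketchTask` and `planWaves` are hypothetical names.

```typescript
// Hypothetical subtask shape; the PR's WorkerTask type is not shown in this thread.
interface SketchTask {
  id: string
  dependsOn: string[]
}

// Groups tasks into waves: every task in a wave depends only on tasks from earlier waves.
function planWaves(tasks: SketchTask[]): string[][] {
  let remaining = tasks.slice()
  const done = new Set<string>()
  const waves: string[][] = []
  while (remaining.length > 0) {
    const ready = remaining.filter((t) => t.dependsOn.every((d) => done.has(d)))
    // No runnable task left means a cycle or a reference to an unknown task id.
    if (ready.length === 0) throw new Error('Cyclic or unknown dependency')
    ready.forEach((t) => done.add(t.id))
    remaining = remaining.filter((t) => !done.has(t.id))
    waves.push(ready.map((t) => t.id))
  }
  return waves
}
```

Tasks within one wave could run in parallel worker windows, while later waves wait for earlier ones to complete.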
- aggregate() collects and merges worker results
- Handles partial results from failed workers
- calculateMetrics() for execution stats
- Output formats: JSON, Markdown, HTML
- Optional LLM synthesizer integration
Part of browseros-ai#279
- createSwarm() initializes new swarm
- executeSwarm() runs full lifecycle:
  1. Planning (task decomposition)
  2. Spawning (worker windows)
  3. Executing (monitor progress)
  4. Aggregating (merge results)
- terminateSwarm() for graceful shutdown
- Event-based progress reporting
- Timeout handling and error recovery
Part of browseros-ai#279
Endpoints:
- POST /swarm - Create and execute swarm
- POST /swarm/create - Create swarm only
- POST /swarm/:id/execute - Execute existing swarm
- GET /swarm/:id - Get status
- GET /swarm/:id/stream - SSE for real-time updates
- DELETE /swarm/:id - Terminate swarm
Includes Zod validation and error handling.
Part of browseros-ai#279
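A client consuming `GET /swarm/:id/stream` has to split the Server-Sent Events text into frames. The sketch below parses the standard `event:` and `data:` fields of the SSE wire format. `parseSseFrames` and `SseFrame` are hypothetical names, and the payload shape of the swarm events is not documented in this thread; only the `ping` event with a `timestamp` field appears in the reviewed code.

```typescript
interface SseFrame {
  event: string
  data: string
}

// Parses complete SSE frames from a text chunk. Frames are separated by a blank line.
// This sketch does not buffer frames that are split across chunk boundaries.
function parseSseFrames(chunk: string): SseFrame[] {
  const frames: SseFrame[] = []
  for (const block of chunk.split(/\r?\n\r?\n/)) {
    let event = 'message' // default event type when no `event:` field is present
    const dataLines: string[] = []
    for (const line of block.split(/\r?\n/)) {
      if (line.startsWith('event:')) {
        event = line.slice('event:'.length).trim()
      } else if (line.startsWith('data:')) {
        // A single leading space after the colon is not part of the value.
        dataLines.push(line.slice('data:'.length).replace(/^ /, ''))
      }
    }
    if (dataLines.length > 0) frames.push({ event, data: dataLines.join('\n') })
  }
  return frames
}
```

A caller could stop reading once it sees a `swarm_completed` or `swarm_failed` frame, since those are the terminal event types named in the review comment.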
- Add comprehensive design doc with architecture overview
- Export all swarm components from index.ts
- Document API endpoints and message protocol
- Track implementation status
Part of browseros-ai#279
All contributors have signed the CLA. Thank you!
I have read the CLA Document and I hereby sign the CLA
Greptile Overview
Greptile Summary
Implements a comprehensive AI Swarm Mode feature that enables parallel task execution across multiple browser windows. The architecture is well-designed with clear separation of concerns:
Key Issues Found:
Strengths:
Confidence Score: 4/5
Important Files Changed
Sequence Diagram
sequenceDiagram
participant Client
participant API as Swarm API Routes
participant Coord as SwarmCoordinator
participant Registry as SwarmRegistry
participant Planner as TaskPlanner
participant Lifecycle as WorkerLifecycleManager
participant Bus as SwarmMessagingBus
participant Bridge as ControllerBridge
participant Workers as Worker Windows
participant Aggregator as ResultAggregator
Client->>API: POST /swarm {task, maxWorkers}
API->>Coord: createAndExecute(request)
Note over Coord: Phase 1: Planning
Coord->>Registry: create(task, config)
Registry-->>Coord: swarm object
Coord->>Registry: updateState(swarmId, 'planning')
Coord->>Planner: decompose(task, config)
Planner->>Planner: LLM generates subtasks
Planner-->>Coord: WorkerTask[]
Note over Coord: Phase 2: Spawning Workers
Coord->>Registry: updateState(swarmId, 'spawning')
loop For each task
Coord->>Lifecycle: spawnWorker(swarmId, task)
Lifecycle->>Registry: addWorker(swarmId, worker)
Lifecycle->>Bridge: sendRequest('create_window')
Bridge-->>Lifecycle: {windowId}
Lifecycle->>Bus: startHealthMonitoring()
Lifecycle-->>Coord: worker object
Coord->>Coord: emit('worker_spawned')
end
Note over Coord: Phase 3: Execution & Monitoring
Coord->>Registry: updateState(swarmId, 'executing')
Coord->>Coord: emit('swarm_started')
Coord->>Bus: subscribeToMaster(swarmId)
loop Worker Execution
Workers->>Bus: sendToMaster('task_progress')
Bus->>Coord: message received
Coord->>Registry: updateWorkerProgress()
Coord->>Coord: emit('worker_progress')
Workers->>Bus: sendToMaster('task_complete')
Bus->>Coord: message received
Coord->>Registry: setWorkerResult()
Coord->>Coord: emit('worker_completed')
end
Note over Coord: Phase 4: Aggregation
Coord->>Registry: updateState(swarmId, 'aggregating')
Coord->>Coord: emit('aggregation_started')
Coord->>Aggregator: aggregate(swarmId, format)
Aggregator->>Registry: getWorkers(swarmId)
Registry-->>Aggregator: Worker[]
Aggregator->>Aggregator: calculateMetrics()
Aggregator->>Aggregator: simpleAggregate() or LLM synthesize()
Aggregator-->>Coord: AggregatedResult
Coord->>Registry: updateState(swarmId, 'completed')
Coord->>Coord: emit('swarm_completed')
Coord-->>API: SwarmResult
API-->>Client: {success: true, data: result}
4 files reviewed, 2 comments
// Keep alive
const keepAlive = setInterval(() => {
  stream.writeSSE({
    event: 'ping',
    data: JSON.stringify({ timestamp: Date.now() }),
  })
}, 30000)

// Wait for stream to close
await new Promise<void>((resolve) => {
  stream.onAbort(() => {
    clearInterval(keepAlive)
    unsubscribe()
    resolve()
  })
})
Potential resource leak - `keepAlive` interval may not be cleared if stream closes via the earlier terminal event path (lines 206-211)
When `stream.close()` is called in the terminal event handler, the `keepAlive` interval continues running. Consider clearing it before closing:
```typescript
if (
  event.type === 'swarm_completed' ||
  event.type === 'swarm_failed'
) {
  clearInterval(keepAlive)
  stream.close()
}
```
Path: apps/server/src/api/routes/swarm.ts
Line: 219:234
// Increment retry count and respawn
worker.retryCount++
worker.state = 'spawning'
worker.windowId = undefined
worker.error = undefined
Retry logic mutates existing worker object, potentially causing issues when spawning new worker with same ID
When retrying, the code increments `retryCount` on the existing worker object then calls `spawnWorker()` which creates a fresh worker with `retryCount: 0`. The retry count won't persist correctly.
Path: apps/server/src/swarm/worker/worker-lifecycle.ts
Line: 247:251
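One way to address the retry-count concern is to build the replacement worker from the failed one instead of mutating it in place. The sketch below is only an illustration: `SketchWorker`, `createWorker`, and `respawn` are hypothetical names, and the PR's real `spawnWorker()` signature is not shown in this thread.

```typescript
// Hypothetical worker shape, limited to the fields visible in the reviewed snippet.
interface SketchWorker {
  id: string
  state: 'spawning' | 'executing' | 'failed'
  retryCount: number
  windowId?: number
  error?: string
}

// Hypothetical spawn helper: accepts the retry count instead of always starting at 0.
function createWorker(id: string, retryCount = 0): SketchWorker {
  return { id, state: 'spawning', retryCount }
}

// Builds the replacement from the failed worker, carrying the incremented retry count.
// The failed worker object itself is left untouched.
function respawn(failed: SketchWorker): SketchWorker {
  return createWorker(failed.id, failed.retryCount + 1)
}
```

Because the new object starts without `windowId` or `error`, the stale values from the failed attempt are not carried over.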
This commit adds production-ready advanced features to AI Swarm Mode:
## Scheduling & Load Balancing
- PriorityTaskQueue: Priority scheduling with aging, deadline urgency, dependency resolution, and preemption support
- LoadBalancer: 5 strategies (round-robin, least-connections, weighted, resource-aware, latency-based) with sticky sessions and health scoring
## Fault Tolerance (Resilience)
- CircuitBreaker: Failure threshold monitoring, half-open recovery, fallback
- Bulkhead: Concurrent execution limiting with queue
- Utilities: retryWithBackoff(), withTimeout()
## Resource Pooling
- WorkerPool: Pre-warmed workers for instant task assignment
- Auto-scaling based on utilization
- Idle timeout and maintenance loops
## Streaming Aggregation
- StreamingAggregator: Real-time result streaming via async iterators
- 4 aggregation modes: merge, concat, vote, custom
- Conflict detection and resolution strategies
## Observability
- SwarmTracer: OpenTelemetry-compatible distributed tracing
- SwarmMetricsCollector: Time-series metrics with history
- SwarmHealthChecker: Multi-check health status
## Worker Agent
- SwarmWorkerAgent: LLM-powered execution planning
- Browser automation via BrowserController interface
- Heartbeat reporting, pause/resume, progress tracking
## Integration
- SwarmService: Unified entry point integrating all components
- Enhanced API routes with streaming, health, metrics, tracing endpoints
- Server integration with optional swarm config
Issue: browseros-ai#279
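The CircuitBreaker bullet (failure threshold, half-open recovery, fallback) describes the general circuit breaker pattern. A minimal synchronous sketch of that pattern follows. `SketchCircuitBreaker`, its constructor parameters, and the injectable clock are assumptions made for illustration and do not reflect the PR's actual class.

```typescript
type BreakerState = 'closed' | 'open' | 'half-open'

class SketchCircuitBreaker {
  private state: BreakerState = 'closed'
  private failures = 0
  private openedAt = 0
  private readonly failureThreshold: number
  private readonly resetTimeoutMs: number
  private readonly now: () => number

  // `now` is injectable so the timing behaviour can be exercised without real waiting.
  constructor(failureThreshold: number, resetTimeoutMs: number, now: () => number = Date.now) {
    this.failureThreshold = failureThreshold
    this.resetTimeoutMs = resetTimeoutMs
    this.now = now
  }

  // An open breaker becomes half-open once the reset timeout has elapsed.
  getState(): BreakerState {
    if (this.state === 'open' && this.now() - this.openedAt >= this.resetTimeoutMs) {
      this.state = 'half-open'
    }
    return this.state
  }

  // Runs `action` unless the breaker is open; returns `fallback()` when open or on failure.
  call<T>(action: () => T, fallback: () => T): T {
    if (this.getState() === 'open') return fallback()
    try {
      const result = action()
      this.failures = 0
      this.state = 'closed'
      return result
    } catch {
      this.failures += 1
      // A failed half-open trial, or reaching the threshold, (re)opens the breaker.
      if (this.state === 'half-open' || this.failures >= this.failureThreshold) {
        this.state = 'open'
        this.openedAt = this.now()
      }
      return fallback()
    }
  }
}
```

While open, the breaker skips the action entirely, which is what keeps a failing dependency from being hit repeatedly by many workers.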
## Extension Side (controller-ext)
- SwarmWindowManager: Manages worker windows for swarm mode
  - Create windows with cascading positions
  - Focus, minimize, close individual workers
  - Terminate entire swarm (close all windows)
  - Arrange windows (grid, cascade, tile layouts)
  - Capture screenshots from workers
  - Handle external window close events
- SwarmActions: Chrome extension action handlers
  - createSwarmWindow, navigateSwarmWindow, focusSwarmWindow
  - closeSwarmWindow, terminateSwarm, arrangeSwarmWindows
  - getSwarmWindows, captureSwarmScreenshot, getSwarmStats
- Registered all swarm actions in BrowserOSController
## Agent UI (React components)
- SwarmPanel: Main visualization panel
  - Shows swarm status, progress, workers
  - Compact and expanded worker views
  - Window arrangement controls
  - Result preview and metrics display
- SwarmWorkerCard: Individual worker status card
  - Visual status indicators (pending, executing, completed, failed)
  - Progress bar and duration tracking
  - Click to focus worker window
- SwarmTrigger: Chat interface button
  - Enable/disable swarm mode
  - Configure max workers and priority
- useSwarm hook: React state management
  - SSE streaming for real-time updates
  - API communication with server
  - Worker focus and termination
Issue: browseros-ai#279
- Mark all extension and UI components as complete
- Add file structure for controller-ext/actions/swarm
- Add file structure for agent/components/swarm and lib/swarm
- Update pending items to only remaining tasks
Issue: browseros-ai#279
- SwarmTrigger: simple toggle button matching ChatModeToggle style
- SwarmPanel: compact inline progress bar (not complex Card)
- SwarmWorkerCard: minimal worker dots/indicators
- Use var(--accent-orange) instead of purple
- Use TooltipProvider delayDuration={0} consistently
- Removed heavy dependencies (Card, Badge, Collapsible)
- Add SwarmTrigger to ChatFooter (shows when in agent mode)
- Add SwarmPanel above ChatFooter for progress visualization
- Update Chat component with swarm state and handlers
- Connect useSwarm hook to getAgentServerUrl() for API calls
- Swarm toggle appears next to ChatModeToggle when in Agent mode
- SSE streaming for real-time worker progress updates
- Pass swarm config to createHttpServer in main.ts
- Enables SwarmService with all features:
  - enabled: true
  - maxWorkers: 10
  - enablePooling: true
  - enableCircuitBreaker: true
  - enableTracing: true
  - loadBalancingStrategy: 'resource-aware'
This enables the /swarm API endpoints in production.
1. Fix resource leak in SSE stream (swarm.ts)
   - Clear keepAlive interval before closing on terminal events
   - Call unsubscribe() to prevent memory leaks
2. Fix retry count persistence (worker-lifecycle.ts)
   - Preserve retryCount when respawning worker
   - New worker now inherits the incremented retry count
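The SSE fix described above needs the same cleanup to run whether the stream ends through a terminal event or through a client abort. A hedged sketch of one way to share that cleanup and make it idempotent is shown here. `SketchStream` and `wireCleanup` are hypothetical; the stream interface only mirrors the `close()` and `onAbort()` calls visible in the reviewed snippet, and the fix in commit 1640b9e may be structured differently.

```typescript
// Hypothetical minimal stream surface, modelled on the calls in the reviewed snippet.
interface SketchStream {
  close(): void
  onAbort(handler: () => void): void
}

// Registers cleanup for the abort path and returns a `finish` function for the
// terminal-event path. The cleanup body runs at most once across both paths.
function wireCleanup(
  stream: SketchStream,
  keepAlive: ReturnType<typeof setInterval>,
  unsubscribe: () => void,
): () => void {
  let cleaned = false
  const cleanup = (): void => {
    if (cleaned) return
    cleaned = true
    clearInterval(keepAlive)
    unsubscribe()
  }
  stream.onAbort(cleanup)
  return () => {
    cleanup()
    stream.close()
  }
}
```

The terminal event handler would call the returned function instead of calling `stream.close()` directly, so the interval and the subscription are released on both paths.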
Thanks for the review! Both issues have been fixed in commit 1640b9e:
1. Resource leak in SSE stream (swarm.ts)
2. Retry count persistence (worker-lifecycle.ts)
Both fixes ensure proper cleanup and correct retry behavior.
…n/status features
I have read the CLA Document and I hereby sign the CLA
- Add 5s timeout to CDP connection to prevent server hanging
- Make worker pool pre-warming non-blocking (background warmup)
- Initialize SwarmService even without extension bridge connected
- Remove unused imports
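A connection timeout like the 5s CDP limit above is commonly implemented by racing the connection promise against a timer. The sketch below shows that technique under assumptions: this `withTimeout` signature is hypothetical (the PR lists a `withTimeout()` utility, but its signature is not shown), and the commented usage refers to a made-up `connectToCdp()` function.

```typescript
// Rejects if `work` has not settled within `timeoutMs`; otherwise passes its result through.
function withTimeout<T>(work: Promise<T>, timeoutMs: number, label: string): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => {
      reject(new Error(`${label} timed out after ${timeoutMs}ms`))
    }, timeoutMs)
  })
  // Clear the timer in both outcomes so it cannot keep the process alive.
  return Promise.race([work, timeout]).finally(() => {
    if (timer !== undefined) clearTimeout(timer)
  })
}

// Hypothetical usage: await withTimeout(connectToCdp(), 5000, 'CDP connection')
```

Note that racing only stops waiting on the slow promise; it does not cancel the underlying connection attempt.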
Fixes #279
Summary
Implements the foundation for AI Swarm Mode - enabling a master agent to spawn and orchestrate multiple worker agents in separate browser windows for parallel task execution.
Use Case
Components Added
Core (apps/server/src/swarm/)
- types.ts
- constants.ts
- coordinator/swarm-registry.ts
- coordinator/task-planner.ts
- coordinator/swarm-coordinator.ts
- worker/worker-lifecycle.ts
- messaging/swarm-bus.ts
- aggregation/result-aggregator.ts
API (apps/server/src/api/routes/swarm.ts)
Key Features
Commits
- feat(swarm): add core types and constants
- feat(swarm): add SwarmRegistry for tracking active swarms
- feat(swarm): add SwarmMessagingBus for inter-agent communication
- feat(swarm): add WorkerLifecycleManager for worker management
- feat(swarm): add TaskPlanner for LLM-based task decomposition
- feat(swarm): add ResultAggregator for merging worker results
- feat(swarm): add SwarmCoordinator as main orchestrator
- feat(swarm): add HTTP API routes for swarm management
- docs(swarm): add design document and update exports
Next Steps