WIP: feat: base multi-agent collaboration framework & shared context#423
Leeaandrob wants to merge 38 commits into sipeed:main from …
Conversation
Merge feat/multi-agent-routing into framework branch. Resolve conflicts: - types.go: keep protocoltypes aliases + add FailoverError/ModelConfig - loop.go: take registry-based AgentLoop rewrite from PR sipeed#131
Implements the base multi-agent collaboration framework:

Phase 1 - Config extension:
- Add Role and SystemPrompt fields to AgentConfig
- Wire them into AgentInstance for per-agent identity

Phase 2 - Blackboard shared context pool:
- pkg/multiagent/blackboard.go: thread-safe key-value store with authorship tracking, scope metadata, and JSON serialization
- pkg/multiagent/blackboard_tool.go: LLM tool for read/write/list/delete

Phase 3 - Handoff mechanism and agent discovery:
- pkg/multiagent/handoff.go: ExecuteHandoff delegates tasks between agents via RunToolLoop, injecting blackboard context
- pkg/multiagent/handoff_tool.go: LLM tool with dynamic agent listing
- pkg/multiagent/list_agents_tool.go: discovery tool for LLM

Phase 4 - AgentLoop integration:
- registryResolver adapter bridges AgentRegistry to multiagent.AgentResolver
- blackboard/handoff/list_agents tools registered for multi-agent configs
- Per-session blackboard via sync.Map, snapshot injected into system prompt
- Handoff tool context propagation for channel/chatID

Design decisions:
- String values only (natural language agent communication)
- Scope field defaults to "shared", extensible for sipeed#119 identity model
- Author field tracks which agent wrote (maps to future S-id)
- Multi-agent tools only registered when >1 agent configured (zero overhead)
- ~2.3KB per session memory budget

Closes: sipeed#294
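A minimal sketch of the blackboard idea described above. The `Entry`/`Blackboard` names and method signatures here are illustrative; the actual `pkg/multiagent/blackboard.go` API may differ:

```go
package main

import (
	"encoding/json"
	"fmt"
	"sync"
	"time"
)

// Entry is one blackboard record with authorship and scope metadata.
type Entry struct {
	Value     string    `json:"value"`
	Author    string    `json:"author"`
	Scope     string    `json:"scope"`
	UpdatedAt time.Time `json:"updated_at"`
}

// Blackboard is a thread-safe key-value store shared between agents.
type Blackboard struct {
	mu      sync.RWMutex
	entries map[string]Entry
}

func NewBlackboard() *Blackboard {
	return &Blackboard{entries: make(map[string]Entry)}
}

// Write stores a string value, recording which agent wrote it.
// Scope defaults to "shared" when empty, matching the design decision above.
func (b *Blackboard) Write(key, value, author, scope string) {
	if scope == "" {
		scope = "shared"
	}
	b.mu.Lock()
	defer b.mu.Unlock()
	b.entries[key] = Entry{Value: value, Author: author, Scope: scope, UpdatedAt: time.Now()}
}

// Read returns the entry for key and whether it exists.
func (b *Blackboard) Read(key string) (Entry, bool) {
	b.mu.RLock()
	defer b.mu.RUnlock()
	e, ok := b.entries[key]
	return e, ok
}

// Snapshot serializes all entries to JSON, e.g. for system-prompt injection.
func (b *Blackboard) Snapshot() ([]byte, error) {
	b.mu.RLock()
	defer b.mu.RUnlock()
	return json.Marshal(b.entries)
}

func main() {
	bb := NewBlackboard()
	bb.Write("plan", "search docs, then summarize", "researcher", "")
	e, _ := bb.Read("plan")
	fmt.Println(e.Author, e.Scope) // researcher shared
}
```

Values are plain strings by design: agents communicate in natural language rather than structured payloads.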
Mermaid-based C4 model documentation covering:
- C1 System Context: picoclaw in its ecosystem
- C2 Container: runtime containers and responsibilities
- C3 Component: multi-agent internals (blackboard, handoff, routing, fallback, provider protocol)
- C4 Code Detail: interfaces, tool loop flow, blackboard lifecycle, fallback decision tree
- Sequence Diagrams: handoff, blackboard sync, fallback chain, route resolution, config lifecycle
- Roadmap: phased plan with dependency graph and status tracking

Relates to sipeed#294, sipeed#283
…ackages

Fixes identified by running golangci-lint v2.10.1 (PR sipeed#304 config) with govet, staticcheck, errcheck, revive, gosec enabled:
- Replace interface{} with any (revive: use-any)
- Replace WriteString(fmt.Sprintf(...)) with fmt.Fprintf (staticcheck: QF1012)
- Add doc comments on all exported methods (revive: exported)
- Safe type assertions with ok-check pattern (errcheck/revive)
- Use strings.EqualFold instead of double ToLower (staticcheck: SA6005)
- Add default cases to switch statements (revive)
- Suppress gosec G117 false positive on SessionKey field
- Test improvements: range int, unused params

Zero issues remaining in pkg/multiagent, pkg/agent, pkg/routing. All 47 tests pass.

Relates to sipeed#304, sipeed#294
- Add multi-agent hardening PRP with 4-phase plan based on OpenClaw gap analysis (foundation fix, tool policy, resilience, async)
- Update roadmap with hardening phases and dependency graph
- Update C3 component diagram with planned components and known bugs
- Add 5 new sequence diagrams for planned features (guardrails, tool policy, loop detection, async spawn, cascade stop)
- Document blackboard split-brain bug with fix approach
- Add OpenClaw comparison table and reference map
Add Capabilities []string to the full agent chain (config → instance → resolver → multiagent) as required by issue sipeed#294 spec: "Define a standard Agent interface that includes Capabilities."
- AgentConfig: new capabilities JSON field
- AgentInstance: new Capabilities field, populated from config
- registryResolver: map Capabilities in GetAgentInfo and ListAgents
- AgentInfo: new Capabilities field
- FindAgentsByCapability: helper to query agents by capability tag
- 4 new tests covering match, multi-match, empty, and nil safety
The LLM can now delegate tasks by capability instead of requiring a specific agent_id. The handoff tool resolves the first matching agent via FindAgentsByCapability.
- HandoffTool.Execute: accept "capability" as an alternative to "agent_id"
- HandoffTool.Description: display agent capabilities in tool listing
- HandoffTool.Parameters: "task" is now the only required field
- 4 new tests (route by capability, not found, no target, description)
feat: add Capabilities field for capability-based agent routing (sipeed#294)
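The capability lookup described in these commits can be sketched as follows. `AgentInfo` and `FindAgentsByCapability` are named in the commit messages; the field set and signature here are assumptions:

```go
package main

import "fmt"

// AgentInfo mirrors the resolver's view of an agent; only the fields
// relevant to capability routing are shown here.
type AgentInfo struct {
	ID           string
	Capabilities []string
}

// FindAgentsByCapability returns every agent advertising the given
// capability tag. Safe on nil input and on agents with nil capability lists.
func FindAgentsByCapability(agents []AgentInfo, capability string) []AgentInfo {
	var out []AgentInfo
	for _, a := range agents {
		for _, c := range a.Capabilities {
			if c == capability {
				out = append(out, a)
				break
			}
		}
	}
	return out
}

func main() {
	agents := []AgentInfo{
		{ID: "coder", Capabilities: []string{"code", "review"}},
		{ID: "writer", Capabilities: []string{"docs"}},
	}
	// Per the handoff commit, the tool picks the first match when the
	// LLM supplies a capability instead of an agent_id.
	matches := FindAgentsByCapability(agents, "code")
	fmt.Println(matches[0].ID) // coder
}
```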
Tools were bound to a static board at registration time while the system prompt injected data from a separate per-session board. Add BoardAware interface and SetBoard() methods so tools receive the correct session blackboard before each execution cycle.
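The fix above can be sketched with a tiny rebinding step. `BoardAware` and `SetBoard` come from the commit; the `Board` interface and tool type here are stand-ins:

```go
package main

import "fmt"

// Board is a stand-in for the session blackboard's read surface.
type Board interface {
	Read(key string) (string, bool)
}

// BoardAware marks tools that need the per-session blackboard injected
// before each execution cycle, instead of a static board captured at
// registration time.
type BoardAware interface {
	SetBoard(b Board)
}

// mapBoard is a trivial Board for the demo.
type mapBoard map[string]string

func (m mapBoard) Read(k string) (string, bool) { v, ok := m[k]; return v, ok }

// blackboardTool is an example tool carrying the current session board.
type blackboardTool struct{ board Board }

func (t *blackboardTool) SetBoard(b Board) { t.board = b }

// rebind is the fix in miniature: before each cycle, every BoardAware
// tool receives the session's board, so tools and the system prompt
// see the same data.
func rebind(tools []any, session Board) {
	for _, t := range tools {
		if ba, ok := t.(BoardAware); ok {
			ba.SetBoard(session)
		}
	}
}

func main() {
	tool := &blackboardTool{}
	rebind([]any{tool}, mapBoard{"plan": "step 1"})
	v, _ := tool.board.Read("plan")
	fmt.Println(v) // step 1
}
```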
Prevent infinite handoff loops (A->B->A) and unbounded depth chains. HandoffRequest now carries Depth/Visited/MaxDepth fields that are propagated to target agents. Default max depth is 3.
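The guard logic can be sketched from the fields named above (Depth, Visited, MaxDepth). The `CheckHandoff` helper and exact error wording are illustrative:

```go
package main

import "fmt"

// HandoffRequest carries loop-guard state across delegation hops.
// Field names follow the commit description; the real struct has more fields.
type HandoffRequest struct {
	TargetID string
	Depth    int
	Visited  []string
	MaxDepth int
}

// CheckHandoff rejects a hop that would exceed the depth budget or
// revisit an agent already on the delegation path (the A->B->A case).
func CheckHandoff(req HandoffRequest) error {
	max := req.MaxDepth
	if max == 0 {
		max = 3 // default max depth from the commit message
	}
	if req.Depth >= max {
		return fmt.Errorf("handoff depth limit %d reached", max)
	}
	for _, id := range req.Visited {
		if id == req.TargetID {
			return fmt.Errorf("handoff cycle detected: %s already visited", req.TargetID)
		}
	}
	return nil
}

func main() {
	err := CheckHandoff(HandoffRequest{TargetID: "a", Visited: []string{"a", "b"}})
	fmt.Println(err != nil) // true: A->B->A cycle is rejected
}
```

On success, the caller would propagate `Depth+1` and `append(Visited, TargetID)` to the target agent.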
Add AllowlistChecker interface and wire it into HandoffTool to control which agents can delegate to which. Default behavior is open (allow all) when no subagents config is present; enforces allow_agents list when configured via CanSpawnSubagent.
Introduce ToolHook with BeforeExecute/AfterExecute lifecycle methods in ToolRegistry. Hooks enable policy enforcement and loop detection pipelines without modifying individual tools. BeforeExecute can block execution; AfterExecute always runs for observability.
Cover SetBoard switching, BoardAware interface compliance, depth limit, cycle detection, depth propagation, allowlist block/permit/default-open, AllowlistCheckerFunc adapter, and ToolHook before/after/block/chain behavior.
Add boundary tests for depth limit, self-handoff cycle detection, provider error propagation, JSON unmarshal invalid data, empty blackboard list, hook observability on block, and no-hook/not-found tool guard paths.
Define DefaultToolGroups mapping group references (e.g. "group:fs", "group:web") to concrete tool names, and ResolveToolNames to expand group refs into deduplicated tool name lists. This is the foundation for per-agent tool policies in Phase 2.
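The group expansion can be sketched directly from the commit. The group contents below are illustrative examples, not the real mapping:

```go
package main

import (
	"fmt"
	"strings"
)

// DefaultToolGroups maps group refs to concrete tool names.
// Illustrative contents; the real mapping lives in the PR's groups.go.
var DefaultToolGroups = map[string][]string{
	"group:fs":  {"read_file", "write_file", "list_dir"},
	"group:web": {"web_search", "web_fetch"},
}

// ResolveToolNames expands group refs into a deduplicated tool name
// list, preserving first-seen order. Unknown groups expand to nothing.
func ResolveToolNames(refs []string) []string {
	seen := make(map[string]bool)
	var out []string
	add := func(name string) {
		if !seen[name] {
			seen[name] = true
			out = append(out, name)
		}
	}
	for _, ref := range refs {
		if strings.HasPrefix(ref, "group:") {
			for _, name := range DefaultToolGroups[ref] {
				add(name)
			}
			continue
		}
		add(ref)
	}
	return out
}

func main() {
	// "read_file" is already in group:fs, so it is deduplicated.
	fmt.Println(ResolveToolNames([]string{"group:fs", "read_file", "exec"}))
	// [read_file write_file list_dir exec]
}
```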
Add ToolPolicyConfig struct with Allow/Deny string slices supporting both individual tool names and group refs (e.g. "group:web"). Add ToolPolicy field to AgentConfig. Backward compatible: nil = full access.
New policy.go with ApplyPolicy (allow/deny filtering) and DepthDenyList (leaf agents lose spawn/handoff/list_agents). Add Clone() for shallow registry copy and Remove() for tool unregistration to ToolRegistry.
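The allow/deny filtering can be sketched like this. `ApplyPolicy` operates on a registry in the real code; here it filters a plain name list, and the deny-wins ordering is an assumption consistent with typical policy semantics:

```go
package main

import "fmt"

// ToolPolicy mirrors the Allow/Deny config; a nil policy means full access.
type ToolPolicy struct {
	Allow []string
	Deny  []string
}

// ApplyPolicy keeps the tools an agent may use: the allow-list is
// applied first (nil allow = everything), then the deny-list removes
// entries. Group refs are assumed to be expanded before this point.
func ApplyPolicy(tools []string, p *ToolPolicy) []string {
	if p == nil {
		return tools // backward compatible: nil policy = full access
	}
	allowed := func(name string) bool {
		if p.Allow == nil {
			return true
		}
		for _, a := range p.Allow {
			if a == name {
				return true
			}
		}
		return false
	}
	denied := func(name string) bool {
		for _, d := range p.Deny {
			if d == name {
				return true
			}
		}
		return false
	}
	var out []string
	for _, t := range tools {
		if allowed(t) && !denied(t) {
			out = append(out, t)
		}
	}
	return out
}

func main() {
	tools := []string{"read_file", "exec", "handoff"}
	fmt.Println(ApplyPolicy(tools, &ToolPolicy{Deny: []string{"exec"}}))
	// [read_file handoff]
}
```

A depth-based deny list (stripping spawn/handoff/list_agents from leaf agents) composes naturally as a second `ApplyPolicy` pass.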
Apply per-agent tool policy (allow/deny) at startup in registerSharedTools so denied tools are removed before the LLM sees them. Apply depth-based policy in ExecuteHandoff: at max depth, leaf agents lose spawn/handoff/list_agents to prevent further chaining. Clone is lightweight: it shares tool instances and only copies the map.
- groups_test.go: 6 tests for ResolveToolNames (group expansion, individual, mixed, dedup, unknown group, empty)
- policy_test.go: 11 tests for ApplyPolicy, DepthDenyList, Clone, Remove, and policy composition pipeline
- handoff_test.go: 2 depth policy tests (leaf loses spawn/handoff, mid-chain retains all tools)
The retry loop only handled context/token errors, letting 429s and rate-limit responses fail immediately. Add detection for 429, rate_limit, resource_exhausted, overloaded, quota, and too_many_requests errors with exponential backoff (5s, 10s, 20s). Works regardless of whether fallback candidates are configured.
Implements LoopDetector as a ToolHook with four detection engines:
- Generic repeat: blocks after N identical tool+args calls (default 20)
- Ping-pong: detects A,B,A,B alternation with no-progress evidence
- No-progress: tracks result hashes to distinguish stuck from progressing
- Circuit breaker: emergency stop for any tool with identical outcomes

Per-session isolation via context key, sliding window (default 30), configurable thresholds matching OpenClaw production values.
- Inject session key into context before LLM iteration loop
- Register LoopDetector as ToolHook per agent in registerSharedTools
- Uses production defaults: warn@10, block@20, circuit breaker@30
Coverage for all four detection engines:
- Generic repeat: below/at/above warning and critical thresholds
- Ping-pong: alternation with and without progress evidence
- Circuit breaker: no-progress streak and progress-resets-streak
- Session isolation, reset, sliding window eviction
- Registry integration, context key, hash determinism
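The "generic repeat" engine, the simplest of the four, can be sketched as a counter keyed by a tool+args fingerprint. The type and method names below are illustrative, and the demo threshold is 3 rather than the production default of 20:

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// callKey fingerprints a tool invocation by name + serialized args,
// so identical calls map to the same counter.
func callKey(tool, args string) [32]byte {
	return sha256.Sum256([]byte(tool + "\x00" + args))
}

// RepeatDetector counts identical tool+args calls and blocks past a
// threshold, the core of the generic-repeat engine described above.
type RepeatDetector struct {
	counts  map[[32]byte]int
	blockAt int
}

func NewRepeatDetector(blockAt int) *RepeatDetector {
	return &RepeatDetector{counts: make(map[[32]byte]int), blockAt: blockAt}
}

// Allow records the call and reports whether it may proceed.
func (d *RepeatDetector) Allow(tool, args string) bool {
	k := callKey(tool, args)
	d.counts[k]++
	return d.counts[k] <= d.blockAt
}

func main() {
	d := NewRepeatDetector(3)
	for i := 0; i < 3; i++ {
		fmt.Println(d.Allow("read_file", `{"path":"a.txt"}`)) // true, three times
	}
	fmt.Println(d.Allow("read_file", `{"path":"a.txt"}`)) // false: blocked
}
```

In the real hook this state would be held per session and evicted on a sliding window; this sketch keeps a single unbounded counter for clarity.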
RunRegistry tracks active handoff/spawn runs with parent-child relationships. CascadeStop recursively cancels a run and all its descendants with cycle-safe seen-set protection. Supports: Register, Deregister, CascadeStop, StopAll, GetChildren.
- HandoffTool wraps context with cancel, registers run in RunRegistry
- Deregisters on completion (normal or error) via defer
- Propagates ParentRunKey to nested handoffs for correct tree structure
- AgentLoop creates shared RunRegistry, passes it to all HandoffTools
Coverage for: single run, parent-child chain, mid-chain stop, multiple siblings, cycle protection (A→B→A), non-existent key, StopAll, GetChildren, Go context tree propagation.
Replace single-strategy context overflow handling with a tiered cascade:
- Tier 1: Truncate oversized tool results (>8000 chars) with newline-aware boundary detection. Cheapest recovery: no messages dropped.
- Tier 2: Drop oldest 50% of messages (existing forceCompression).
- Tier 3: Both truncation and compression for maximum space reclamation.

Each retry escalates to the next tier, giving the LLM progressively more aggressive context reduction before giving up.
Cover 5 scenarios: oversized result truncated with footer, small result untouched, non-tool messages preserved, newline boundary preference, and multiple tool results in single session.
AuthRotator manages round-robin selection across multiple API keys per provider. Each key tracks its own cooldown state via CooldownTracker (2-track: transient 1min→1hr exponential, billing 5h→24h). AuthRotatingProvider wraps multiple LLM providers and delegates to the best available key on each request. On retriable failures, the failing key is put in cooldown and subsequent requests use the next available key.
Add api_keys array to ProviderConfig for multi-key support. ResolveAPIKeys() returns api_keys if set, otherwise wraps api_key. Factory detects multiple keys and creates AuthRotatingProvider. Backward compatible: single api_key works unchanged.
12 tests covering: round-robin selection, cooldown skipping, all-in-cooldown, success reset, available count, profile builder, rotating provider success rotation, failure marking, all-cooldown error, single key, concurrent access, and billing vs standard cooldown durations.
Async spawn (fire-and-forget) with per-parent semaphore-based concurrency limiting and buffered announcement channels for result delivery. Includes:
- SpawnManager: goroutine-based async agent invocation with configurable per-parent concurrency limits and timeouts
- Announcer: per-session buffered channels with back-pressure (drops oldest on overflow) for spawn result delivery
- SpawnTool: LLM-callable tool for async agent spawning with allowlist enforcement and capability-based routing
- Config: MaxChildrenPerAgent + SpawnTimeoutSec in SubagentsConfig
ProcessScope tracks PIDs per session key for namespace-like isolation. Agents can only see and kill their own processes.
- Register/Deregister/Owns for session-PID binding
- ListPIDs filters dead processes via Unix signal 0
- KillAll sends SIGTERM to all session-owned processes
- Cleanup removes all tracking for a session
DedupCache provides idempotent execution guarantees for spawn and announce operations using deterministic keys with TTL-based expiry and a periodic background sweep.
- Check/CheckWithResult for duplicate detection
- BuildSpawnKey: deterministic hash of (from, to, task)
- BuildAnnounceKey: deterministic key for announcement dedup
- Configurable TTL with automatic expired-entry cleanup
Integrates Phase 4 components into the core agent orchestrator:
- AgentLoop creates Announcer and SpawnManager at startup
- SpawnTool registered alongside existing multi-agent tools
- Spawn announcements drained between LLM iterations and injected as system messages for LLM context
- Tool contexts wired for spawn_agent (board + channel)
- Config-driven spawn limits (MaxChildrenPerAgent, SpawnTimeoutSec)
27 tests covering Phase 4 components:
- Spawn (10): accepted, concurrency limit, cascade stop, context timeout, parallel fan-out, announcer deliver/drain, pending, back-pressure, concurrent delivery, cleanup
- Dedup (10): first call, duplicate, different keys, expired entry, check with result, size, concurrent access (100 goroutines), sweep, spawn key deterministic, announce key format
- Process scope (7): register/owns, deregister, cross-session isolation, list PIDs filters dead, kill all, cleanup, empty session
nikolasdehor left a comment
This is an impressive and well-structured multi-agent framework. The architecture is sound (blackboard pattern, AgentResolver interface to avoid circular imports, depth-based recursion guards). I have some concerns that should be addressed before this leaves WIP:
Correctness / Safety
- Goroutine leak in AsyncSpawn: If the parent context is cancelled before the spawned goroutine completes, ExecuteHandoff may block indefinitely if the provider does not respect context cancellation. The context.WithTimeout mitigates this, but if the underlying HTTP call ignores the context (some providers do), the goroutine leaks. Consider adding a select with a force-kill timer as a last resort.
- Semaphore count() race in error message: In AsyncSpawn, the rejection error message calls sem.count() after acquire() returned false. At that point, other goroutines may have released slots, so the count is misleading. Minor, but the error message implies the count is accurate when it is not.
- HandoffTool state mutation is not thread-safe: ExecuteHandoff mutates ht.depth, ht.visited, ht.maxDepth directly on the tool. If two handoffs are executing concurrently using the same tool instance (which is possible with spawns), they will race on these fields. These should be passed through context or cloned per-invocation.
- LoopDetector session state unbounded growth: The sessions sync.Map grows without bound, one entry per session, never cleaned up. Long-running gateways will accumulate stale sessions. Add a TTL-based eviction or hook into session cleanup.
Design
- Auth rotation belongs in a separate PR: pkg/providers/auth_rotation.go (185 lines + 343 lines of tests) is unrelated to multi-agent collaboration. Mixing it here makes the PR harder to review and bisect.
- Excessive external references in comments: Comments like 'inspired by NVIDIA CUDA stream scheduling', 'Google MapReduce uses similar fan-out caps', 'Microsoft Azure Functions uses similar timeout patterns' add noise without value. The patterns speak for themselves; the comments should explain why the code does what it does, not draw parallels to unrelated systems.
- tools.DepthDenyList approach: Stripping tools at max depth is clever for preventing infinite chaining, but it changes the agent capability set mid-conversation. This could confuse the LLM if it planned to use a tool that was available in previous turns but is now gone. Consider returning a clear error from the tool instead of removing it entirely.
Testing
- Good test coverage overall (28 tests for blackboard, cycle detection, depth limits). The mockProvider approach is clean. Would like to see a test for the concurrent mutation issue in point 3.
Overall: solid foundation, but needs the thread-safety fix in point 3 and the session leak in point 4 before it is ready for merge. The auth rotation should be split out.
📝 Description
WIP — Base multi-agent collaboration framework with shared context pool, agent handoff, and discovery tools.
Builds on top of the merged PRs #213 (provider protocol refactor) and #131 (model fallback chain + multi-agent routing) to add:
- Shared context pool (pkg/multiagent/blackboard.go) for inter-agent data sharing with scoped entries (author, scope, timestamp)
- Agent handoff via ExecuteHandoff() with automatic context propagation through the blackboard, delegating via RunToolLoop
- New pkg/multiagent package, kept separate from pkg/agent to avoid circular imports
- Per-session blackboards in a sync.Map, tools auto-registered when >1 agent configured
- Role and SystemPrompt fields on AgentConfig and AgentInstance

Architecture decisions

- Multi-agent tools registered only when more than one agent is configured (len(registry.ListAgentIDs()) > 1)
- registryResolver adapter bridges AgentRegistry → multiagent.AgentResolver at the integration boundary

🗣️ Type of Change
🤖 AI Code Generation
🔗 Related Issue
Closes #294
📚 Technical Context (Skip for Docs)
🧪 Test Environment
📸 Evidence (Optional)
Click to view test results
Unit tests: 17 packages, all PASS (including pkg/multiagent with 28 tests)
Integration tests: Claude CLI (3/3 PASS), Codex CLI (3/3 PASS)
Lint: go vet ./... clean, go build ./... clean
New test files:
- pkg/multiagent/blackboard_test.go — 18 tests (CRUD, concurrency, JSON roundtrip, tool actions)
- pkg/multiagent/handoff_test.go — 10 tests (handoff execution, tool behavior, agent discovery)

☑️ Checklist