
WIP: feat: base multi-agent collaboration framework & shared context #423

Draft
Leeaandrob wants to merge 38 commits into sipeed:main from Leeaandrob:feat/multi-agent-framework

Conversation

@Leeaandrob
Collaborator

📝 Description

WIP — Base multi-agent collaboration framework with shared context pool, agent handoff, and discovery tools.

Builds on top of the merged PRs #213 (provider protocol refactor) and #131 (model fallback chain + multi-agent routing) to add:

  • Blackboard — Thread-safe shared key-value context pool (pkg/multiagent/blackboard.go) for inter-agent data sharing with scoped entries (author, scope, timestamp)
  • BlackboardTool — LLM-callable tool for read/write/list/delete on the shared context
  • Agent Handoff — Synchronous delegation between agents via ExecuteHandoff() with automatic context propagation through the blackboard
  • HandoffTool — LLM-callable tool that resolves target agent, writes context, builds system prompt, and delegates via RunToolLoop
  • ListAgentsTool — Discovery tool listing all registered agents with ID/Name/Role
  • AgentResolver interface — Decouples pkg/multiagent from pkg/agent to avoid circular imports
  • AgentLoop integration — Blackboard snapshot injection into system messages, per-session blackboards via sync.Map, tools auto-registered when >1 agent configured
  • Config extensions: Role and SystemPrompt fields on AgentConfig and AgentInstance
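
For readers who have not opened pkg/multiagent/blackboard.go, the following is a minimal sketch of a shared context pool like the one described above. It is not the code in this PR: the `Entry` fields mirror the metadata named in the description (author, scope, timestamp), but the method names `Set`, `Get`, `Delete` and `Keys` are assumptions, as is falling back to the "shared" scope when none is given.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
	"time"
)

// Entry is one blackboard value plus its metadata (field names assumed).
type Entry struct {
	Value     string
	Author    string
	Scope     string
	Timestamp time.Time
}

// Blackboard is a mutex-guarded key-value store (hypothetical shape).
type Blackboard struct {
	mu      sync.RWMutex
	entries map[string]Entry
}

// NewBlackboard returns an empty blackboard.
func NewBlackboard() *Blackboard {
	return &Blackboard{entries: make(map[string]Entry)}
}

// Set stores value under key; an empty scope falls back to "shared".
func (b *Blackboard) Set(key, value, author, scope string) {
	if scope == "" {
		scope = "shared"
	}
	b.mu.Lock()
	defer b.mu.Unlock()
	b.entries[key] = Entry{Value: value, Author: author, Scope: scope, Timestamp: time.Now()}
}

// Get returns the entry for key and whether it exists.
func (b *Blackboard) Get(key string) (Entry, bool) {
	b.mu.RLock()
	defer b.mu.RUnlock()
	e, ok := b.entries[key]
	return e, ok
}

// Delete removes key and reports whether it was present.
func (b *Blackboard) Delete(key string) bool {
	b.mu.Lock()
	defer b.mu.Unlock()
	_, ok := b.entries[key]
	delete(b.entries, key)
	return ok
}

// Keys lists all keys in sorted order.
func (b *Blackboard) Keys() []string {
	b.mu.RLock()
	defer b.mu.RUnlock()
	keys := make([]string, 0, len(b.entries))
	for k := range b.entries {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	return keys
}

func main() {
	bb := NewBlackboard()
	bb.Set("plan", "draft the report", "researcher", "")
	e, _ := bb.Get("plan")
	fmt.Println(e.Author, e.Scope, bb.Keys())
}
```

A read-write mutex suits the pattern where many agents read a snapshot and few write; whether the PR uses `sync.RWMutex` or a plain mutex is not stated here.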

Architecture decisions

  • Zero overhead for single-agent setups (multi-agent tools only registered when len(registry.ListAgentIDs()) > 1)
  • registryResolver adapter bridges AgentRegistry to multiagent.AgentResolver at the integration boundary
  • Blackboard entries carry authorship metadata for audit/debugging
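
The first two decisions can be pictured with a small sketch. It assumes the resolver interface exposes `GetAgentInfo` and `ListAgents` (both named later in the commit log, signatures guessed here), uses a stand-in `registry` type in place of the real AgentRegistry, and returns tool names instead of registering real tools.

```go
package main

import "fmt"

// AgentInfo is the view of an agent that the multi-agent package needs.
type AgentInfo struct {
	ID, Name, Role string
}

// AgentResolver lives on the multiagent side (signatures are assumptions).
type AgentResolver interface {
	GetAgentInfo(id string) (AgentInfo, bool)
	ListAgents() []AgentInfo
}

// registry stands in for the agent package's AgentRegistry (hypothetical).
type registry struct {
	order  []string
	agents map[string]AgentInfo
}

// ListAgentIDs returns agent IDs in registration order.
func (r *registry) ListAgentIDs() []string { return r.order }

// registryResolver adapts registry to AgentResolver at the boundary,
// so the multiagent side never imports the agent package.
type registryResolver struct{ reg *registry }

func (rr registryResolver) GetAgentInfo(id string) (AgentInfo, bool) {
	info, ok := rr.reg.agents[id]
	return info, ok
}

func (rr registryResolver) ListAgents() []AgentInfo {
	out := make([]AgentInfo, 0, len(rr.reg.order))
	for _, id := range rr.reg.order {
		out = append(out, rr.reg.agents[id])
	}
	return out
}

// multiAgentTools returns the tool names to register, or nil for one agent.
func multiAgentTools(reg *registry) []string {
	if len(reg.ListAgentIDs()) <= 1 {
		return nil
	}
	return []string{"blackboard", "handoff", "list_agents"}
}

func main() {
	reg := &registry{
		order: []string{"main", "coder"},
		agents: map[string]AgentInfo{
			"main":  {ID: "main", Name: "Main", Role: "coordinator"},
			"coder": {ID: "coder", Name: "Coder", Role: "developer"},
		},
	}
	var resolver AgentResolver = registryResolver{reg: reg}
	fmt.Println(len(resolver.ListAgents()), multiAgentTools(reg))
}
```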

🗣️ Type of Change

  • 🐞 Bug fix (non-breaking change which fixes an issue)
  • ✨ New feature (non-breaking change which adds functionality)
  • 📖 Documentation update
  • ⚡ Code refactoring (no functional changes, no api changes)

🤖 AI Code Generation

  • 🤖 Fully AI-generated (100% AI, 0% Human)
  • 🛠️ Mostly AI-generated (AI draft, Human verified/modified)
  • 👨‍💻 Mostly Human-written (Human lead, AI assisted or none)

🔗 Related Issue

Closes #294

📚 Technical Context (Skip for Docs)

🧪 Test Environment

  • Hardware: Linux x86_64
  • OS: Ubuntu (kernel 6.11.0-29-generic)
  • Model/Provider: Claude CLI (claude-code), Codex CLI (codex-code) — both integration tested
  • Channels: CLI direct mode

📸 Evidence (Optional)

Unit tests: 17 packages, all PASS (including pkg/multiagent with 28 tests)
Integration tests: Claude CLI (3/3 PASS), Codex CLI (3/3 PASS)
Lint: go vet ./... clean, go build ./... clean

New test files:

  • pkg/multiagent/blackboard_test.go — 18 tests (CRUD, concurrency, JSON roundtrip, tool actions)
  • pkg/multiagent/handoff_test.go — 10 tests (handoff execution, tool behavior, agent discovery)

☑️ Checklist

  • My code/docs follow the style of this project.
  • I have performed a self-review of my own changes.
  • I have updated the documentation accordingly.

Merge feat/multi-agent-routing into framework branch.
Resolve conflicts:
- types.go: keep protocoltypes aliases + add FailoverError/ModelConfig
- loop.go: take registry-based AgentLoop rewrite from PR sipeed#131
Implements the base multi-agent collaboration framework:

Phase 1 - Config extension:
- Add Role and SystemPrompt fields to AgentConfig
- Wire them into AgentInstance for per-agent identity

Phase 2 - Blackboard shared context pool:
- pkg/multiagent/blackboard.go: thread-safe key-value store with
  authorship tracking, scope metadata, and JSON serialization
- pkg/multiagent/blackboard_tool.go: LLM tool for read/write/list/delete

Phase 3 - Handoff mechanism and agent discovery:
- pkg/multiagent/handoff.go: ExecuteHandoff delegates tasks between
  agents via RunToolLoop, injecting blackboard context
- pkg/multiagent/handoff_tool.go: LLM tool with dynamic agent listing
- pkg/multiagent/list_agents_tool.go: discovery tool for LLM

Phase 4 - AgentLoop integration:
- registryResolver adapter bridges AgentRegistry to multiagent.AgentResolver
- blackboard/handoff/list_agents tools registered for multi-agent configs
- Per-session blackboard via sync.Map, snapshot injected into system prompt
- Handoff tool context propagation for channel/chatID

Design decisions:
- String values only (natural language agent communication)
- Scope field defaults to "shared", extensible for sipeed#119 identity model
- Author field tracks which agent wrote (maps to future S-id)
- Multi-agent tools only registered when >1 agent configured (zero overhead)
- ~2.3KB per session memory budget

Closes: sipeed#294
@Leeaandrob Leeaandrob added the type: enhancement New feature or request label Feb 18, 2026
Mermaid-based C4 model documentation covering:
- C1 System Context: picoclaw in its ecosystem
- C2 Container: runtime containers and responsibilities
- C3 Component: multi-agent internals (blackboard, handoff, routing, fallback, provider protocol)
- C4 Code Detail: interfaces, tool loop flow, blackboard lifecycle, fallback decision tree
- Sequence Diagrams: handoff, blackboard sync, fallback chain, route resolution, config lifecycle
- Roadmap: phased plan with dependency graph and status tracking

Relates to sipeed#294, sipeed#283
…ackages

Fixes identified by running golangci-lint v2.10.1 (PR sipeed#304 config) with
govet, staticcheck, errcheck, revive, gosec enabled:

- Replace interface{} with any (revive: use-any)
- Replace WriteString(fmt.Sprintf(...)) with fmt.Fprintf (staticcheck: QF1012)
- Add doc comments on all exported methods (revive: exported)
- Safe type assertions with ok-check pattern (errcheck/revive)
- Use strings.EqualFold instead of double ToLower (staticcheck: SA6005)
- Add default cases to switch statements (revive)
- Suppress gosec G117 false positive on SessionKey field
- Test improvements: range int, unused params

Zero issues remaining in pkg/multiagent, pkg/agent, pkg/routing.
All 47 tests pass.

Relates to sipeed#304, sipeed#294
Leeaandrob and others added 23 commits February 18, 2026 13:50
- Add multi-agent hardening PRP with 4-phase plan based on OpenClaw
  gap analysis (foundation fix, tool policy, resilience, async)
- Update roadmap with hardening phases and dependency graph
- Update C3 component diagram with planned components and known bugs
- Add 5 new sequence diagrams for planned features (guardrails,
  tool policy, loop detection, async spawn, cascade stop)
- Document blackboard split-brain bug with fix approach
- Add OpenClaw comparison table and reference map
Add Capabilities []string to the full agent chain (config → instance →
resolver → multiagent) as required by issue sipeed#294 spec:
"Define a standard Agent interface that includes Capabilities."

- AgentConfig: new capabilities JSON field
- AgentInstance: new Capabilities field, populated from config
- registryResolver: map Capabilities in GetAgentInfo and ListAgents
- AgentInfo: new Capabilities field
- FindAgentsByCapability: helper to query agents by capability tag
- 4 new tests covering match, multi-match, empty, and nil safety
The LLM can now delegate tasks by capability instead of requiring a
specific agent_id. The handoff tool resolves the first matching agent
via FindAgentsByCapability.

- HandoffTool.Execute: accept "capability" as alternative to "agent_id"
- HandoffTool.Description: display agent capabilities in tool listing
- HandoffTool.Parameters: "task" is the only required field now
- 4 new tests (route by capability, not found, no target, description)
feat: add Capabilities field for capability-based agent routing (sipeed#294)
Tools were bound to a static board at registration time while the
system prompt injected data from a separate per-session board.
Add BoardAware interface and SetBoard() methods so tools receive
the correct session blackboard before each execution cycle.
Prevent infinite handoff loops (A->B->A) and unbounded depth chains.
HandoffRequest now carries Depth/Visited/MaxDepth fields that are
propagated to target agents. Default max depth is 3.
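
A minimal sketch of the guard this commit describes, under stated assumptions: the `Depth`, `Visited` and `MaxDepth` fields and the default of 3 come from the commit message, while the `From`/`To` fields, the `Next` helper, the error values, and the exact boundary (rejecting once `Depth` reaches the limit) are illustrative guesses.

```go
package main

import (
	"errors"
	"fmt"
)

// DefaultMaxDepth matches the default quoted in the commit message.
const DefaultMaxDepth = 3

// HandoffRequest carries the guard state along the delegation chain.
type HandoffRequest struct {
	From, To string
	Depth    int
	MaxDepth int
	Visited  []string
}

var (
	ErrDepthLimit = errors.New("handoff depth limit reached")
	ErrCycle      = errors.New("handoff cycle detected")
)

// Next validates req and returns the request the target agent should carry
// into any handoff of its own. The caller fills in the new To field.
func Next(req HandoffRequest) (HandoffRequest, error) {
	limit := req.MaxDepth
	if limit <= 0 {
		limit = DefaultMaxDepth
	}
	if req.Depth >= limit {
		return HandoffRequest{}, ErrDepthLimit
	}
	// Copy Visited so sibling handoffs never share a backing array.
	visited := make([]string, 0, len(req.Visited)+1)
	visited = append(visited, req.Visited...)
	visited = append(visited, req.From)
	for _, id := range visited {
		if id == req.To {
			return HandoffRequest{}, ErrCycle
		}
	}
	return HandoffRequest{From: req.To, Depth: req.Depth + 1, MaxDepth: limit, Visited: visited}, nil
}

func main() {
	step, err := Next(HandoffRequest{From: "A", To: "B"})
	fmt.Println(step.Depth, step.Visited, err)

	step.To = "A" // B tries to hand back to A
	_, err = Next(step)
	fmt.Println(err)
}
```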
Add AllowlistChecker interface and wire it into HandoffTool to control
which agents can delegate to which. Default behavior is open (allow all)
when no subagents config is present; enforces allow_agents list when
configured via CanSpawnSubagent.
Introduce ToolHook with BeforeExecute/AfterExecute lifecycle methods in
ToolRegistry. Hooks enable policy enforcement and loop detection
pipelines without modifying individual tools. BeforeExecute can block
execution; AfterExecute always runs for observability.
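
The hook lifecycle might look roughly like the sketch below. Only the names `ToolHook`, `BeforeExecute`, `AfterExecute` and `ToolRegistry` come from the commit message; the method signatures, the `ToolFunc` type and the `denyHook` example are assumptions made for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// ToolHook observes and can veto tool calls (signatures assumed).
type ToolHook interface {
	BeforeExecute(tool string, args map[string]any) error
	AfterExecute(tool string, args map[string]any, result string, err error)
}

// ToolFunc is a simplified tool body.
type ToolFunc func(args map[string]any) (string, error)

// ToolRegistry holds tools and the hooks that wrap every execution.
type ToolRegistry struct {
	tools map[string]ToolFunc
	hooks []ToolHook
}

func NewToolRegistry() *ToolRegistry { return &ToolRegistry{tools: map[string]ToolFunc{}} }

func (r *ToolRegistry) Register(name string, fn ToolFunc) { r.tools[name] = fn }
func (r *ToolRegistry) AddHook(h ToolHook)                { r.hooks = append(r.hooks, h) }

// Execute runs BeforeExecute hooks in order; the first error blocks the tool.
// AfterExecute runs on every hook regardless, so blocked calls stay observable.
func (r *ToolRegistry) Execute(name string, args map[string]any) (string, error) {
	var result string
	var err error
	for _, h := range r.hooks {
		if hookErr := h.BeforeExecute(name, args); hookErr != nil {
			err = fmt.Errorf("tool %q blocked: %w", name, hookErr)
			break
		}
	}
	if err == nil {
		if fn, ok := r.tools[name]; ok {
			result, err = fn(args)
		} else {
			err = fmt.Errorf("tool %q not found", name)
		}
	}
	for _, h := range r.hooks {
		h.AfterExecute(name, args, result, err)
	}
	return result, err
}

// denyHook blocks one tool by name and counts AfterExecute calls.
type denyHook struct {
	denied string
	after  int
}

func (d *denyHook) BeforeExecute(tool string, _ map[string]any) error {
	if tool == d.denied {
		return errors.New("denied by policy")
	}
	return nil
}

func (d *denyHook) AfterExecute(string, map[string]any, string, error) { d.after++ }

func main() {
	reg := NewToolRegistry()
	reg.Register("exec", func(map[string]any) (string, error) { return "ran", nil })
	hook := &denyHook{denied: "exec"}
	reg.AddHook(hook)
	_, err := reg.Execute("exec", nil)
	fmt.Println(err, hook.after)
}
```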
Cover SetBoard switching, BoardAware interface compliance, depth limit,
cycle detection, depth propagation, allowlist block/permit/default-open,
AllowlistCheckerFunc adapter, and ToolHook before/after/block/chain
behavior.
Add boundary tests for depth limit, self-handoff cycle detection,
provider error propagation, JSON unmarshal invalid data, empty
blackboard list, hook observability on block, and no-hook/not-found
tool guard paths.
Define DefaultToolGroups mapping group references (e.g. "group:fs",
"group:web") to concrete tool names, and ResolveToolNames to expand
group refs into deduplicated tool name lists. This is the foundation
for per-agent tool policies in Phase 2.
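
Group expansion as described can be sketched as follows. The group keys "group:fs" and "group:web" come from the commit message; the tool names inside each group, the function signature, and the choice to silently skip unknown groups are assumptions.

```go
package main

import (
	"fmt"
	"strings"
)

// DefaultToolGroups maps group references to tool names.
// The member tool names here are placeholders, not the PR's real list.
var DefaultToolGroups = map[string][]string{
	"group:fs":  {"read_file", "write_file"},
	"group:web": {"web_search", "web_fetch"},
}

// ResolveToolNames expands group refs and removes duplicates while keeping
// first-seen order. Unknown groups expand to nothing (an assumption).
func ResolveToolNames(refs []string, groups map[string][]string) []string {
	seen := make(map[string]bool)
	var out []string
	add := func(name string) {
		if !seen[name] {
			seen[name] = true
			out = append(out, name)
		}
	}
	for _, ref := range refs {
		if strings.HasPrefix(ref, "group:") {
			for _, name := range groups[ref] {
				add(name)
			}
			continue
		}
		add(ref)
	}
	return out
}

func main() {
	fmt.Println(ResolveToolNames([]string{"group:fs", "read_file", "exec"}, DefaultToolGroups))
}
```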
Add ToolPolicyConfig struct with Allow/Deny string slices supporting
both individual tool names and group refs (e.g. "group:web"). Add
ToolPolicy field to AgentConfig. Backward compatible: nil = full access.
New policy.go with ApplyPolicy (allow/deny filtering) and DepthDenyList
(leaf agents lose spawn/handoff/list_agents). Add Clone() for shallow
registry copy and Remove() for tool unregistration to ToolRegistry.
Apply per-agent tool policy (allow/deny) at startup in registerSharedTools
so denied tools are removed before the LLM sees them. Apply depth-based
policy in ExecuteHandoff: at max depth, leaf agents lose spawn/handoff/
list_agents to prevent further chaining. Clone is lightweight — shares
tool instances, only copies the map.
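
A sketch of how allow/deny filtering and the depth deny list could compose. It works on plain name slices instead of a cloned registry, and it assumes that deny wins over allow and that an empty allow list means "allow everything"; the PR may decide these cases differently. The denied names are taken from the commit text.

```go
package main

import "fmt"

// ToolPolicy mirrors the Allow/Deny lists of ToolPolicyConfig,
// after group refs have been expanded to concrete names.
type ToolPolicy struct {
	Allow []string
	Deny  []string
}

func toSet(names []string) map[string]bool {
	set := make(map[string]bool, len(names))
	for _, n := range names {
		set[n] = true
	}
	return set
}

// ApplyPolicy returns the tools that survive the policy.
// A nil policy means full access.
func ApplyPolicy(tools []string, p *ToolPolicy) []string {
	if p == nil {
		return append([]string(nil), tools...)
	}
	allow, deny := toSet(p.Allow), toSet(p.Deny)
	out := make([]string, 0, len(tools))
	for _, t := range tools {
		if len(allow) > 0 && !allow[t] {
			continue
		}
		if deny[t] {
			continue
		}
		out = append(out, t)
	}
	return out
}

// DepthDenyList returns the delegation tools to strip from a leaf agent.
func DepthDenyList(depth, maxDepth int) []string {
	if depth >= maxDepth {
		return []string{"spawn", "handoff", "list_agents"}
	}
	return nil
}

func main() {
	tools := []string{"read_file", "exec", "handoff", "spawn"}
	leaf := ApplyPolicy(tools, &ToolPolicy{Deny: DepthDenyList(3, 3)})
	fmt.Println(leaf)
}
```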
- groups_test.go: 6 tests for ResolveToolNames (group expansion,
  individual, mixed, dedup, unknown group, empty)
- policy_test.go: 11 tests for ApplyPolicy, DepthDenyList, Clone,
  Remove, and policy composition pipeline
- handoff_test.go: 2 depth policy tests (leaf loses spawn/handoff,
  mid-chain retains all tools)
The retry loop only handled context/token errors, letting 429s and
rate-limit responses fail immediately. Add detection for 429, rate_limit,
resource_exhausted, overloaded, quota, and too_many_requests errors
with exponential backoff (5s, 10s, 20s). Works regardless of whether
fallback candidates are configured.
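
The detection and backoff schedule can be illustrated with a short sketch. The marker strings and the 5s, 10s, 20s schedule are from the commit message; matching on the lowercased error text, the `retry` helper, and its injectable `sleep` parameter are assumptions made so the example stays small and testable.

```go
package main

import (
	"fmt"
	"strings"
	"time"
)

// rateLimitMarkers are the substrings listed in the commit message.
var rateLimitMarkers = []string{
	"429", "rate_limit", "resource_exhausted", "overloaded", "quota", "too_many_requests",
}

// isRateLimitMsg reports whether an error message looks like a rate limit.
func isRateLimitMsg(msg string) bool {
	lower := strings.ToLower(msg)
	for _, m := range rateLimitMarkers {
		if strings.Contains(lower, m) {
			return true
		}
	}
	return false
}

// backoff doubles from 5s: attempt 0 is 5s, 1 is 10s, 2 is 20s.
func backoff(attempt int) time.Duration {
	return (5 * time.Second) << attempt
}

// retry calls fn until it succeeds, fails with a non-rate-limit error,
// or maxRetries rate-limited attempts have been retried.
func retry(maxRetries int, sleep func(time.Duration), fn func() error) error {
	for attempt := 0; ; attempt++ {
		err := fn()
		if err == nil || !isRateLimitMsg(err.Error()) || attempt >= maxRetries {
			return err
		}
		sleep(backoff(attempt))
	}
}

func main() {
	calls := 0
	err := retry(3,
		func(d time.Duration) { fmt.Println("would wait", d) }, // stand-in for time.Sleep
		func() error {
			calls++
			if calls < 3 {
				return fmt.Errorf("HTTP 429 from provider")
			}
			return nil
		})
	fmt.Println(calls, err)
}
```

Substring matching on "429" is deliberately loose and can misfire on unrelated messages that contain those digits; a real implementation would likely prefer a typed status code where the provider exposes one.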
Implements LoopDetector as a ToolHook with four detection engines:
- Generic repeat: blocks after N identical tool+args calls (default 20)
- Ping-pong: detects A,B,A,B alternation with no-progress evidence
- No-progress: tracks result hashes to distinguish stuck from progressing
- Circuit breaker: emergency stop for any tool with identical outcomes

Per-session isolation via context key, sliding window (default 30),
configurable thresholds matching OpenClaw production values.
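
Of the four engines, the generic-repeat check is the simplest to picture. The sketch below covers only that engine for a single session, with the window and thresholds passed in explicitly; the `Detector` type, `Observe` method and hashing scheme are assumptions, and the ping-pong, no-progress and circuit-breaker engines are left out.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
)

// Verdict is the outcome of observing one tool call.
type Verdict int

const (
	OK Verdict = iota
	Warn
	Block
)

// Detector keeps a sliding window of call hashes for one session.
// It is not safe for concurrent use; a real hook would add locking.
type Detector struct {
	window  []string
	size    int
	warnAt  int
	blockAt int
}

func NewDetector(size, warnAt, blockAt int) *Detector {
	return &Detector{size: size, warnAt: warnAt, blockAt: blockAt}
}

// hashCall is deterministic because json.Marshal sorts map keys.
func hashCall(tool string, args map[string]any) string {
	raw, err := json.Marshal(args)
	if err != nil {
		raw = []byte(fmt.Sprint(args))
	}
	sum := sha256.Sum256(append([]byte(tool+"\x00"), raw...))
	return hex.EncodeToString(sum[:])
}

// Observe records the call and counts identical calls inside the window.
func (d *Detector) Observe(tool string, args map[string]any) Verdict {
	h := hashCall(tool, args)
	d.window = append(d.window, h)
	if len(d.window) > d.size {
		d.window = d.window[len(d.window)-d.size:]
	}
	count := 0
	for _, w := range d.window {
		if w == h {
			count++
		}
	}
	switch {
	case count >= d.blockAt:
		return Block
	case count >= d.warnAt:
		return Warn
	default:
		return OK
	}
}

func main() {
	d := NewDetector(30, 10, 20)
	var v Verdict
	for i := 0; i < 20; i++ {
		v = d.Observe("read_file", map[string]any{"path": "a.txt"})
	}
	fmt.Println(v == Block)
}
```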
- Inject session key into context before LLM iteration loop
- Register LoopDetector as ToolHook per agent in registerSharedTools
- Uses production defaults: warn@10, block@20, circuit breaker@30
Coverage for all four detection engines:
- Generic repeat: below/at/above warning and critical thresholds
- Ping-pong: alternation with and without progress evidence
- Circuit breaker: no-progress streak and progress-resets-streak
- Session isolation, reset, sliding window eviction
- Registry integration, context key, hash determinism
RunRegistry tracks active handoff/spawn runs with parent-child
relationships. CascadeStop recursively cancels a run and all its
descendants with cycle-safe seen-set protection.

Supports: Register, Deregister, CascadeStop, StopAll, GetChildren.
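
A compact sketch of the cascade, assuming runs are keyed by string and each stores its parent key and a cancel function. The method names come from the commit message; their signatures and the integer return value of `CascadeStop` are guesses. `StopAll` is omitted for brevity.

```go
package main

import (
	"context"
	"fmt"
	"sync"
)

type runEntry struct {
	parent string
	cancel context.CancelFunc
}

// RunRegistry tracks active runs and their parent-child links.
type RunRegistry struct {
	mu   sync.Mutex
	runs map[string]runEntry
}

func NewRunRegistry() *RunRegistry {
	return &RunRegistry{runs: make(map[string]runEntry)}
}

// Register records a run; parent is "" for a root run.
func (r *RunRegistry) Register(key, parent string, cancel context.CancelFunc) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.runs[key] = runEntry{parent: parent, cancel: cancel}
}

// Deregister forgets a run without cancelling it.
func (r *RunRegistry) Deregister(key string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	delete(r.runs, key)
}

func (r *RunRegistry) childrenLocked(key string) []string {
	var out []string
	for k, e := range r.runs {
		if e.parent == key {
			out = append(out, k)
		}
	}
	return out
}

// GetChildren lists the direct children of key (order unspecified).
func (r *RunRegistry) GetChildren(key string) []string {
	r.mu.Lock()
	defer r.mu.Unlock()
	return r.childrenLocked(key)
}

// CascadeStop cancels key and all descendants and returns how many runs
// were stopped. The seen set keeps a malformed parent cycle from looping.
func (r *RunRegistry) CascadeStop(key string) int {
	r.mu.Lock()
	seen := map[string]bool{}
	var cancels []context.CancelFunc
	stack := []string{key}
	for len(stack) > 0 {
		k := stack[len(stack)-1]
		stack = stack[:len(stack)-1]
		if seen[k] {
			continue
		}
		seen[k] = true
		stack = append(stack, r.childrenLocked(k)...)
		if e, ok := r.runs[k]; ok {
			cancels = append(cancels, e.cancel)
			delete(r.runs, k)
		}
	}
	r.mu.Unlock()
	for _, cancel := range cancels {
		cancel() // called outside the lock
	}
	return len(cancels)
}

func main() {
	reg := NewRunRegistry()
	parentCtx, cancelParent := context.WithCancel(context.Background())
	childCtx, cancelChild := context.WithCancel(parentCtx)
	reg.Register("parent", "", cancelParent)
	reg.Register("child", "parent", cancelChild)
	fmt.Println(reg.CascadeStop("parent"), childCtx.Err())
}
```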
- HandoffTool wraps context with cancel, registers run in RunRegistry
- Deregisters on completion (normal or error) via defer
- Propagates ParentRunKey to nested handoffs for correct tree structure
- AgentLoop creates shared RunRegistry, passes to all HandoffTools
Coverage for: single run, parent-child chain, mid-chain stop,
multiple siblings, cycle protection (A→B→A), non-existent key,
StopAll, GetChildren, Go context tree propagation.
Replace single-strategy context overflow handling with a tiered cascade:
- Tier 1: Truncate oversized tool results (>8000 chars) with newline-aware
  boundary detection. Cheapest recovery — no messages dropped.
- Tier 2: Drop oldest 50% of messages (existing forceCompression).
- Tier 3: Both truncation + compression for maximum space reclamation.

Each retry escalates to the next tier, giving the LLM progressively more
aggressive context reduction before giving up.
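
Tier 1 can be sketched as below. The 8000-character threshold and the newline-aware cut are from the commit message; the function name, the byte-based limit, the rule for accepting a newline only in the second half of the window, and the footer wording are assumptions.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

// TruncateToolResult shortens s to roughly limit bytes, preferring to cut at
// the last newline in the second half of the window, and appends a footer.
func TruncateToolResult(s string, limit int) string {
	if limit <= 0 || len(s) <= limit {
		return s
	}
	cut := limit
	if i := strings.LastIndexByte(s[:limit], '\n'); i > limit/2 {
		cut = i
	}
	// Back up so a multi-byte UTF-8 character is never split.
	for cut > 0 && !utf8.RuneStart(s[cut]) {
		cut--
	}
	return s[:cut] + fmt.Sprintf("\n[truncated %d of %d bytes]", len(s)-cut, len(s))
}

func main() {
	big := strings.Repeat("line of tool output\n", 1000)
	out := TruncateToolResult(big, 8000)
	fmt.Println(len(big), len(out))
}
```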
Cover 5 scenarios: oversized result truncated with footer, small result
untouched, non-tool messages preserved, newline boundary preference,
and multiple tool results in single session.
AuthRotator manages round-robin selection across multiple API keys
per provider. Each key tracks its own cooldown state via CooldownTracker
(2-track: transient 1min→1hr exponential, billing 5h→24h).

AuthRotatingProvider wraps multiple LLM providers and delegates to
the best available key on each request. On retriable failures, the
failing key is put in cooldown and subsequent requests use the next
available key.
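
The round-robin-with-cooldown idea can be shown in a few lines. This sketch uses a single fixed one-minute cooldown instead of the two-track exponential scheme described above, and the `Rotator` type with `Pick`, `MarkFailure` and `MarkSuccess` is a hypothetical stand-in for AuthRotator and CooldownTracker.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// Rotator hands out keys round-robin, skipping keys that are cooling down.
type Rotator struct {
	mu       sync.Mutex
	keys     []string
	until    map[string]time.Time // cooldown expiry per key
	next     int
	cooldown time.Duration
}

func NewRotator(keys []string) *Rotator {
	return &Rotator{keys: keys, until: make(map[string]time.Time), cooldown: time.Minute}
}

// Pick returns the next available key, or false if all are in cooldown.
func (r *Rotator) Pick() (string, bool) {
	r.mu.Lock()
	defer r.mu.Unlock()
	now := time.Now()
	for i := 0; i < len(r.keys); i++ {
		idx := (r.next + i) % len(r.keys)
		key := r.keys[idx]
		if now.Before(r.until[key]) {
			continue
		}
		r.next = (idx + 1) % len(r.keys)
		return key, true
	}
	return "", false
}

// MarkFailure puts key into cooldown.
func (r *Rotator) MarkFailure(key string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.until[key] = time.Now().Add(r.cooldown)
}

// MarkSuccess clears any cooldown on key.
func (r *Rotator) MarkSuccess(key string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	delete(r.until, key)
}

func main() {
	r := NewRotator([]string{"key-1", "key-2"})
	first, _ := r.Pick()
	r.MarkFailure(first)
	second, _ := r.Pick()
	third, _ := r.Pick()
	fmt.Println(first, second, third)
}
```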
Add api_keys array to ProviderConfig for multi-key support.
ResolveAPIKeys() returns api_keys if set, otherwise wraps api_key.
Factory detects multiple keys and creates AuthRotatingProvider.
Backward compatible: single api_key works unchanged.
12 tests covering: round-robin selection, cooldown skipping, all-in-cooldown,
success reset, available count, profile builder, rotating provider success
rotation, failure marking, all-cooldown error, single key, concurrent
access, and billing vs standard cooldown durations.
Async spawn (fire-and-forget) with per-parent semaphore-based
concurrency limiting and buffered announcement channels for
result delivery. Includes:

- SpawnManager: goroutine-based async agent invocation with
  configurable per-parent concurrency limits and timeouts
- Announcer: per-session buffered channels with back-pressure
  (drops oldest on overflow) for spawn result delivery
- SpawnTool: LLM-callable tool for async agent spawning with
  allowlist enforcement and capability-based routing
- Config: MaxChildrenPerAgent + SpawnTimeoutSec in SubagentsConfig
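
Two of these pieces, the per-parent concurrency cap and the drop-oldest announcement buffer, can be sketched with buffered channels. The type and method names below are assumptions; the real SpawnManager also handles timeouts and goroutine launch, which this example leaves out.

```go
package main

import (
	"fmt"
	"sync"
)

// semaphore is a counting semaphore backed by a buffered channel.
type semaphore chan struct{}

// TryAcquire takes a slot without blocking and reports success.
func (s semaphore) TryAcquire() bool {
	select {
	case s <- struct{}{}:
		return true
	default:
		return false
	}
}

// Release frees a slot; call it only after a successful TryAcquire.
func (s semaphore) Release() { <-s }

// Announcer buffers spawn results per session, dropping the oldest on overflow.
type Announcer struct {
	mu       sync.Mutex
	capacity int
	queues   map[string]chan string
}

func NewAnnouncer(capacity int) *Announcer {
	if capacity < 1 {
		capacity = 1
	}
	return &Announcer{capacity: capacity, queues: make(map[string]chan string)}
}

// Deliver enqueues msg, evicting the oldest entry if the buffer is full.
func (a *Announcer) Deliver(session, msg string) {
	a.mu.Lock()
	defer a.mu.Unlock()
	ch, ok := a.queues[session]
	if !ok {
		ch = make(chan string, a.capacity)
		a.queues[session] = ch
	}
	for {
		select {
		case ch <- msg:
			return
		default:
			select {
			case <-ch: // buffer full: drop the oldest announcement
			default:
			}
		}
	}
}

// Drain returns all pending announcements for a session in arrival order.
func (a *Announcer) Drain(session string) []string {
	a.mu.Lock()
	defer a.mu.Unlock()
	ch, ok := a.queues[session]
	if !ok {
		return nil
	}
	var out []string
	for {
		select {
		case m := <-ch:
			out = append(out, m)
		default:
			return out
		}
	}
}

func main() {
	slots := make(semaphore, 2) // at most two children per parent
	fmt.Println(slots.TryAcquire(), slots.TryAcquire(), slots.TryAcquire())

	ann := NewAnnouncer(2)
	for _, m := range []string{"a", "b", "c"} {
		ann.Deliver("session-1", m)
	}
	fmt.Println(ann.Drain("session-1"))
}
```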
ProcessScope tracks PIDs per session key for namespace-like
isolation. Agents can only see and kill their own processes.

- Register/Deregister/Owns for session-PID binding
- ListPIDs filters dead processes via Unix signal 0
- KillAll sends SIGTERM to all session-owned processes
- Cleanup removes all tracking for a session
DedupCache provides idempotent execution guarantees for spawn
and announce operations using deterministic keys with TTL-based
expiry and periodic background sweep.

- Check/CheckWithResult for duplicate detection
- BuildSpawnKey: deterministic hash of (from, to, task)
- BuildAnnounceKey: deterministic key for announcement dedup
- Configurable TTL with automatic expired entry cleanup
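
As a rough illustration of the dedup idea, the sketch below hashes (from, to, task) into a key and remembers it for a TTL. The `Check` and `BuildSpawnKey` names follow the commit message, but their signatures, the "spawn:" prefix, the NUL separator and the explicit `Sweep` method (in place of a background goroutine) are assumptions.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
	"sync"
	"time"
)

// BuildSpawnKey derives a deterministic key from (from, to, task).
// The NUL separator keeps ("ab", "") distinct from ("a", "b").
func BuildSpawnKey(from, to, task string) string {
	sum := sha256.Sum256([]byte(strings.Join([]string{from, to, task}, "\x00")))
	return "spawn:" + hex.EncodeToString(sum[:])
}

// DedupCache remembers keys until their TTL expires.
type DedupCache struct {
	mu   sync.Mutex
	ttl  time.Duration
	seen map[string]time.Time // key -> expiry
}

func NewDedupCache(ttl time.Duration) *DedupCache {
	return &DedupCache{ttl: ttl, seen: make(map[string]time.Time)}
}

// Check reports whether key is a duplicate within the TTL.
// A first or expired sighting is recorded and reported as not duplicate.
func (c *DedupCache) Check(key string) bool {
	c.mu.Lock()
	defer c.mu.Unlock()
	now := time.Now()
	if expiry, ok := c.seen[key]; ok && now.Before(expiry) {
		return true
	}
	c.seen[key] = now.Add(c.ttl)
	return false
}

// Sweep removes expired entries and returns how many were dropped.
func (c *DedupCache) Sweep() int {
	c.mu.Lock()
	defer c.mu.Unlock()
	now := time.Now()
	removed := 0
	for k, expiry := range c.seen {
		if !now.Before(expiry) {
			delete(c.seen, k)
			removed++
		}
	}
	return removed
}

func main() {
	cache := NewDedupCache(time.Minute)
	key := BuildSpawnKey("main", "coder", "write tests")
	fmt.Println(cache.Check(key), cache.Check(key))
}
```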
Integrates Phase 4 components into the core agent orchestrator:

- AgentLoop creates Announcer and SpawnManager at startup
- SpawnTool registered alongside existing multi-agent tools
- Spawn announcements drained between LLM iterations and
  injected as system messages for LLM context
- Tool contexts wired for spawn_agent (board + channel)
- Config-driven spawn limits (MaxChildrenPerAgent, SpawnTimeoutSec)
27 tests covering Phase 4 components:

Spawn (10): accepted, concurrency limit, cascade stop, context
timeout, parallel fan-out, announcer deliver/drain, pending,
back-pressure, concurrent delivery, cleanup

Dedup (10): first call, duplicate, different keys, expired entry,
check with result, size, concurrent access (100 goroutines),
sweep, spawn key deterministic, announce key format

Process scope (7): register/owns, deregister, cross-session
isolation, list PIDs filters dead, kill all, cleanup, empty session

@nikolasdehor nikolasdehor left a comment

This is an impressive and well-structured multi-agent framework. The architecture is sound (blackboard pattern, AgentResolver interface to avoid circular imports, depth-based recursion guards). I have some concerns that should be addressed before this leaves WIP:

Correctness / Safety

  1. Goroutine leak in AsyncSpawn: If the parent context is cancelled before the spawned goroutine completes, ExecuteHandoff may block indefinitely if the provider does not respect context cancellation. The context.WithTimeout mitigates this, but if the underlying HTTP call ignores the context (some providers do), the goroutine leaks. Consider adding a select with a force-kill timer as a last resort.

  2. Semaphore count() race in error message: In AsyncSpawn, the rejection error message calls sem.count() after acquire() returned false. At that point, other goroutines may have released slots, so the count is misleading. Minor, but the error message implies the count is accurate when it is not.

  3. HandoffTool state mutation is not thread-safe: ExecuteHandoff mutates ht.depth, ht.visited, ht.maxDepth directly on the tool. If two handoffs are executing concurrently using the same tool instance (which is possible with spawns), they will race on these fields. These should be passed through context or cloned per-invocation.

  4. LoopDetector session state unbounded growth: The sessions sync.Map grows without bound -- one entry per session, never cleaned up. Long-running gateways will accumulate stale sessions. Add a TTL-based eviction or hook into session cleanup.
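
To make point 3 concrete, here is one possible shape for the "cloned per-invocation" option: the depth and visited list travel in a small value that each handoff copies, so the shared tool instance holds no mutable chain state. The `HandoffState` type and `Descend` method are hypothetical and not taken from the PR.

```go
package main

import (
	"fmt"
	"sync"
)

// HandoffState is passed by value into each handoff instead of living on
// the shared tool instance (hypothetical type).
type HandoffState struct {
	Depth    int
	MaxDepth int
	Visited  []string
}

// Descend returns the state for a child handoff. Visited is copied so
// concurrent siblings never append into the same backing array.
func (s HandoffState) Descend(agentID string) HandoffState {
	visited := make([]string, len(s.Visited), len(s.Visited)+1)
	copy(visited, s.Visited)
	return HandoffState{
		Depth:    s.Depth + 1,
		MaxDepth: s.MaxDepth,
		Visited:  append(visited, agentID),
	}
}

func main() {
	root := HandoffState{MaxDepth: 3}
	results := make([]HandoffState, 4)
	var wg sync.WaitGroup
	for i := range results {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			// Each goroutine reads root and writes only its own slot.
			results[i] = root.Descend(fmt.Sprintf("agent-%d", i))
		}(i)
	}
	wg.Wait()
	fmt.Println(len(root.Visited), results[0].Depth, results[3].Visited)
}
```

Running a variant of this under `go test -race` would also serve as the concurrency test requested below.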

Design

  1. Auth rotation belongs in a separate PR: pkg/providers/auth_rotation.go (185 lines + 343 lines of tests) is unrelated to multi-agent collaboration. Mixing it here makes the PR harder to review and bisect.

  2. Excessive external references in comments: Comments like 'inspired by NVIDIA CUDA stream scheduling', 'Google MapReduce uses similar fan-out caps', 'Microsoft Azure Functions uses similar timeout patterns' add noise without value. The patterns speak for themselves -- the comments should explain why the code does what it does, not draw parallels to unrelated systems.

  3. tools.DepthDenyList approach: Stripping tools at max depth is clever for preventing infinite chaining, but it changes the agent capability set mid-conversation. This could confuse the LLM if it planned to use a tool that was available in previous turns but is now gone. Consider returning a clear error from the tool instead of removing it entirely.

Testing

  1. Good test coverage overall (28 tests for blackboard, cycle detection, depth limits). The mockProvider approach is clean. Would like to see a test for the concurrent mutation issue in point 3.

Overall: solid foundation, but needs the thread-safety fix in point 3 and the session leak in point 4 before it is ready for merge. The auth rotation should be split out.

Labels

type: enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Feature: Base Multi-agent Collaboration Framework & Shared Context

3 participants