v0.8.66: Remove engine/TUI channel backpressure from sub-agent status storms #3802
Closed
Labels: bug, release-blocker, reliability, subagents, tui, v0.8.66
Project status: Done
Problem
High sub-agent fanout can put pressure on the bounded engine event channel and the bounded engine op channel. Because many paths use `.send().await`, the engine or TUI event loop can stall when the receiver is not draining fast enough.

Parent: #3800
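The stall mechanism, and a nonblocking way to request a refresh, can be sketched with the standard library's bounded `std::sync::mpsc::sync_channel`. It is used here only as a stand-in for the async bounded channel in `engine.rs` (presumably tokio's, whose `Sender::try_send` likewise fails fast with `TrySendError::Full` where `.send().await` would suspend). `RefreshRequester`, `count_refreshes`, and `Op::SendMessage` are hypothetical names invented for this sketch; only `Op::ListSubAgents` comes from the issue.

```rust
use std::sync::mpsc::{sync_channel, Receiver, SyncSender, TrySendError};

/// Hypothetical stand-in for the engine op type. `ListSubAgents` is named in
/// the issue; `SendMessage` is invented here only to fill the channel.
#[derive(Debug, PartialEq)]
pub enum Op {
    ListSubAgents,
    SendMessage(String),
}

/// Hypothetical helper (not part of the codebase): requests a sub-agent list
/// refresh without ever blocking the caller. While the op channel is full,
/// any number of requests collapse into a single pending retry.
pub struct RefreshRequester {
    tx_op: SyncSender<Op>,
    pending: bool,
}

impl RefreshRequester {
    pub fn new(tx_op: SyncSender<Op>) -> Self {
        Self { tx_op, pending: false }
    }

    /// Called after each event drain batch; always returns immediately.
    pub fn request_refresh(&mut self) {
        self.pending = true;
        self.flush();
    }

    /// Tries to enqueue the pending refresh; call again on the next tick.
    pub fn flush(&mut self) {
        if !self.pending {
            return;
        }
        match self.tx_op.try_send(Op::ListSubAgents) {
            Ok(()) => self.pending = false,
            // Channel full: keep the request pending instead of stalling.
            Err(TrySendError::Full(_)) => {}
            // Engine gone: there is nothing left to refresh.
            Err(TrySendError::Disconnected(_)) => self.pending = false,
        }
    }

    pub fn is_pending(&self) -> bool {
        self.pending
    }
}

/// Counts queued `ListSubAgents` ops without blocking.
pub fn count_refreshes(rx_op: &Receiver<Op>) -> usize {
    rx_op.try_iter().filter(|op| *op == Op::ListSubAgents).count()
}
```

With the op channel full, a burst of `request_refresh` calls costs nothing beyond a failed `try_send` each, and once the engine frees a slot a single `flush` enqueues exactly one `ListSubAgents`. Note that this only coalesces while the channel is full; with free capacity each call still enqueues its own request.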
Verified evidence
- `crates/tui/src/core/engine.rs`: `tx_op`/`rx_op` is `mpsc::channel(32)` and `tx_event`/`rx_event` is `mpsc::channel(256)`. `EngineHandle::send` awaits `self.tx_op.send(op)`.
- `crates/tui/src/tui/ui.rs`: after an event drain batch, `Op::ListSubAgents` is sent via `engine_handle.send(...).await`.
- `tx_event.send(...).await`.
- `try_send`, which avoids blocking but can drop the UI completion event under event-channel pressure.

Critical analysis
The `try_send(Event::AgentComplete)` drop does not necessarily lose parent-turn completion, because parent completion signaling has another path. The bug is still serious: the UI/status stream can fail to converge under pressure, and the engine and TUI can backpressure each other through awaited sends.

Desired behavior
Sub-agent status storms should be coalesced or degraded without blocking input polling or parent-turn progress. Critical correctness events must be durable or recoverable; noncritical status refreshes can be lossy if the next state snapshot repairs them.
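One way to coalesce a status storm is sketched below, under the assumption that noncritical sub-agent status can be represented as "latest value per agent id". Producers overwrite a shared map and send at most one `()` wake-up through the bounded channel until the consumer drains, so the channel never fills no matter how many updates arrive. `StatusCoalescer`, `publish`, and `drain` are hypothetical names, and statuses are plain strings only to keep the sketch short; critical events such as completions should not go through this lossy path.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::mpsc::{sync_channel, SyncSender};
use std::sync::Mutex;

/// Hypothetical coalescer (not part of the codebase). Producers record the
/// latest status per sub-agent id; the bounded channel only carries `()`
/// wake-ups, and at most one is outstanding until the consumer drains.
pub struct StatusCoalescer {
    latest: Mutex<HashMap<String, String>>,
    // True while a wake-up is queued or being handled by the consumer.
    wake_armed: AtomicBool,
    wake_tx: SyncSender<()>,
}

impl StatusCoalescer {
    pub fn new(wake_tx: SyncSender<()>) -> Self {
        Self {
            latest: Mutex::new(HashMap::new()),
            wake_armed: AtomicBool::new(false),
            wake_tx,
        }
    }

    /// Producer side: overwrites this agent's previous status and never
    /// blocks, however many updates arrive between drains.
    pub fn publish(&self, agent_id: &str, status: &str) {
        self.latest
            .lock()
            .unwrap()
            .insert(agent_id.to_string(), status.to_string());
        // Only the first publish since the last drain sends a wake-up.
        if !self.wake_armed.swap(true, Ordering::AcqRel) {
            // `Full` means a wake-up is already queued; `Disconnected` means
            // the consumer is gone. Neither case should block or panic.
            let _ = self.wake_tx.try_send(());
        }
    }

    /// Consumer side: call after receiving a wake-up. Returns the latest
    /// status per agent; intermediate states were coalesced away.
    pub fn drain(&self) -> HashMap<String, String> {
        // Disarm before taking the map, so a publish racing with this drain
        // either lands in this batch or sends a fresh wake-up.
        self.wake_armed.store(false, Ordering::Release);
        std::mem::take(&mut *self.latest.lock().unwrap())
    }
}
```

A thousand updates for one agent leave one map entry and one wake-up, so losing an intermediate state is harmless: the drained value is already the newest. A race between `publish` and `drain` can at worst produce one extra wake-up whose drain returns an empty batch.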
Suggested implementation options
- `ListSubAgents`.

Acceptance criteria
- `ListSubAgents` refresh requests cannot block the TUI event loop when `tx_op` is full.
- `tx_event.send().await`.

Security / policy guardrails
Backpressure fixes must classify events by criticality:
- Where `try_send` or another nonblocking send is used, dropped events need bounded diagnostics so support can distinguish intentional coalescing from data loss.
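A minimal sketch of criticality-aware, nonblocking sending follows, again using `std::sync::mpsc::sync_channel` as a stand-in for the real event channel. It assumes a two-way split: `AgentComplete` (named in the issue) is critical and is retried from a local backlog instead of being dropped, while `StatusTick` (a hypothetical stand-in for any status refresh) is lossy. `EventSender`, its methods, and the power-of-two reporting rule are all illustrative choices, not the project's API. The backlog is unbounded, which is only reasonable under the assumption that critical events are rare (on the order of one per sub-agent).

```rust
use std::collections::VecDeque;
use std::sync::mpsc::{sync_channel, SyncSender, TrySendError};

/// Hypothetical UI event type. `AgentComplete` mirrors the event named in
/// the issue; `StatusTick` stands in for any noncritical status refresh.
#[derive(Debug, PartialEq)]
pub enum Event {
    AgentComplete(String),
    StatusTick(String),
}

impl Event {
    /// Hypothetical classification: completions are critical, ticks are lossy.
    fn is_critical(&self) -> bool {
        matches!(self, Event::AgentComplete(_))
    }
}

/// Hypothetical nonblocking sender wrapper (not part of the codebase).
pub struct EventSender {
    tx: SyncSender<Event>,
    // Critical events that found the channel full; retried in order, never dropped.
    backlog: VecDeque<Event>,
    // Lossy events intentionally dropped because the channel was full.
    dropped_lossy: u64,
    // Diagnostic lines, one per power-of-two drop count: O(log n) in total.
    reports: Vec<String>,
}

impl EventSender {
    pub fn new(tx: SyncSender<Event>) -> Self {
        Self { tx, backlog: VecDeque::new(), dropped_lossy: 0, reports: Vec::new() }
    }

    /// Never blocks: critical events are queued locally on `Full`,
    /// lossy events are counted and dropped.
    pub fn send(&mut self, ev: Event) {
        self.flush(); // backlogged critical events get the first free slots
        if ev.is_critical() {
            if !self.backlog.is_empty() {
                self.backlog.push_back(ev); // keep critical events in order
                return;
            }
            if let Err(TrySendError::Full(ev)) = self.tx.try_send(ev) {
                self.backlog.push_back(ev);
            }
        } else if let Err(TrySendError::Full(_)) = self.tx.try_send(ev) {
            self.dropped_lossy += 1;
            if self.dropped_lossy.is_power_of_two() {
                self.reports.push(format!(
                    "coalescing: dropped {} lossy UI events so far",
                    self.dropped_lossy
                ));
            }
        }
    }

    /// Retries backlogged critical events; also worth calling on every UI tick.
    pub fn flush(&mut self) {
        while let Some(ev) = self.backlog.pop_front() {
            match self.tx.try_send(ev) {
                Ok(()) => {}
                Err(TrySendError::Full(ev)) => {
                    self.backlog.push_front(ev);
                    break;
                }
                // Receiver gone: nothing can consume these events any more.
                Err(TrySendError::Disconnected(_)) => {
                    self.backlog.clear();
                    break;
                }
            }
        }
    }

    pub fn dropped_lossy(&self) -> u64 { self.dropped_lossy }
    pub fn reports(&self) -> &[String] { &self.reports }
    pub fn backlog_len(&self) -> usize { self.backlog.len() }
}
```

Reporting only at drop counts 1, 2, 4, 8, and so on keeps diagnostics bounded during a storm while still recording that drops were intentional coalescing. In a real build the report lines would go to the application's logger rather than a `Vec`.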