fix(tui): try_lock shell manager in async UI refresh paths (#3804) by Hmbown · Pull Request #3808 · Hmbown/CodeWhale

Hmbown · 2026-06-29T22:53:13Z

Closes #3804 (child of #3800).

Problem

The async UI loop took a blocking std::sync::Mutex::lock() on the shell manager in two render-only refresh paths (refresh_active_task_panel ui.rs:1432, refresh_shell_exec_live_output ui.rs:1461). A contended lock could stall render/input — an amplifier of the fanout freeze.

Change

Both paths now use try_lock():

refresh_active_task_panel: on contention, retain the previous frame's background shell entries (last-known snapshot) so running shells don't flicker out.
refresh_shell_exec_live_output: on contention, skip this frame's live-output update (next refresh picks it up).

Guardrails (per the issue)

Shell ownership, cancellation, approval state, and output capture do not depend on these UI refreshes — they're render-only reads — so try_lock + snapshot/skip cannot weaken enforcement or hide a running shell beyond one frame.

Test

shell_live_output_refresh_does_not_block_on_contended_lock: holds the lock and asserts the refresh returns immediately (the old blocking lock() would deadlock on the same thread, so reaching the assert proves non-blocking). Build green; no regressions.

Acceptance

UI refresh paths no longer synchronously block on the shell std::sync::Mutex
On contention the UI keeps rendering and reuses the last snapshot / skips
Test covers shell-manager lock contention
Does not loosen shell approval/sandbox behavior

Note: this is the [safe-now] child #2 from the #3800 audit; it's isolated from the in-flight backpressure/parallel-dispatch branches.

The async UI loop took a blocking std::sync::Mutex lock on the shell manager in two render-only refresh paths (refresh_active_task_panel, refresh_shell_exec_live_output). If a background shell path held that lock, the UI thread could stall with no async yield — an amplifier of the #3800 fanout freeze. Use try_lock for these render-only reads: - refresh_active_task_panel: on contention, retain the previous frame's background shell entries (last-known snapshot) instead of dropping running shells from the Work panel. - refresh_shell_exec_live_output: on contention, skip this frame's live-output update (the next refresh picks it up). Shell ownership, cancellation, approval state, and output capture do not depend on these UI refreshes, so try_lock + snapshot/skip is safe and does not weaken enforcement (per the #3804 guardrails). Adds a contention regression test that holds the lock and asserts the live-output refresh returns immediately rather than deadlocking (the old blocking lock() would deadlock on the same thread). Parent: #3800 Fixes #3804

claude · 2026-06-29T22:53:31Z

Claude finished @Hmbown's task in 1m 54s —— View job

PR Review: fix(tui): try_lock shell manager in async UI refresh paths

Gather context and read changed files
Analyze diff for correctness, logic, tests, and reuse
Post review

Summary

The core idea is correct and well-motivated: blocking lock() in a render-only async path is a latency hazard, and try_lock + skip/snapshot is a sound pattern. The refresh_shell_exec_live_output change is clean. One real bug in the snapshot path needs a fix before merge.

Correctness Bug — `Background` kind collision in snapshot fallback

ui.rs:1438–1443 / ui.rs:1467

let prev_shell_entries: Vec<TaskPanelEntry> = app
    .task_panel
    .iter()
    .filter(|entry| matches!(entry.kind, TaskPanelEntryKind::Background))
    .cloned()
    .collect();

TaskPanelEntryKind::Background is not exclusive to shell jobs. active_rlm_task_entries (called at line 1430 in the same function) also emits kind: TaskPanelEntryKind::Background for RLM sessions (ui.rs:1621).

Sequence on contention:

entries is built: includes current-frame RLM entries (line 1430, Background kind).
prev_shell_entries filters the previous app.task_panel for Background — captures both old shell entries and old RLM entries.
Lock is contended → shell_entries = prev_shell_entries (the mixed bag).
entries.extend(shell_entries) — previous-frame RLM entries are appended on top of already-added current-frame RLM entries.

Result: every active RLM session appears twice in the task panel on any frame where shell-manager contention happens while RLM is running. It's a low-probability event, but the panel would show duplicate RLM: … rows.

Minimal fix — filter by a shell-specific signal. Since shell job ids are arbitrary while RLM ids follow "rlm-{n}" (ui.rs:1617), the safest narrow fix is:

.filter(|entry| {
    matches!(entry.kind, TaskPanelEntryKind::Background)
        && !entry.id.starts_with("rlm-")
})

A cleaner longer-term fix is a TaskPanelEntryKind::Shell variant so shell entries don't share a kind with other background work, but that's out of scope for this PR.

Fix this →

Minor: Test gap — `refresh_active_task_panel` has no contention test

The test shell_live_output_refresh_does_not_block_on_contended_lock correctly covers refresh_shell_exec_live_output. There is no equivalent test for the snapshot-retention path in refresh_active_task_panel. A complementary test that holds the shell-manager lock and asserts the panel keeps prior shell entries (and does not duplicate them) would close this gap and would have caught the Background kind collision above.

Good choices

list_jobs() takes &mut self (it calls shell.poll() and cleanup() internally), so mut mgr is correct.
The deadlock-as-proof technique in the test is clean: a blocking lock() would deadlock on the same thread, so reaching the assert! proves the path is non-blocking.
refresh_shell_exec_live_output skip-frame behavior is exactly right — missing one live-output poll is invisible to users.
The guardrail note in the PR description (shell ownership/cancellation/approval state do not depend on these paths) is accurate and the code supports it.

Verdict: One real bug (RLM duplication on contented lock in refresh_active_task_panel) needs fixing. The rest is solid. Happy to approve once that filter is tightened.

Hmbown added this to the v0.8.66 milestone Jun 29, 2026

Hmbown merged commit 2130139 into main Jun 30, 2026
19 checks passed

Hmbown deleted the codex/0866-3804-shell-lock-trylock branch June 30, 2026 01:15

Hmbown mentioned this pull request Jun 30, 2026

v0.8.66: Release gate for multi sub-agent fanout freeze #3800

Closed

10 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(tui): try_lock shell manager in async UI refresh paths (#3804)#3808

fix(tui): try_lock shell manager in async UI refresh paths (#3804)#3808
Hmbown merged 1 commit into
mainfrom
codex/0866-3804-shell-lock-trylock

Hmbown commented Jun 29, 2026

Uh oh!

claude Bot commented Jun 29, 2026 •

edited

Loading

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

Hmbown commented Jun 29, 2026

Problem

Change

Guardrails (per the issue)

Test

Acceptance

Uh oh!

claude Bot commented Jun 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Review: fix(tui): try_lock shell manager in async UI refresh paths

Summary

Correctness Bug — Background kind collision in snapshot fallback

Minor: Test gap — refresh_active_task_panel has no contention test

Good choices

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

claude Bot commented Jun 29, 2026 •

edited

Loading

Correctness Bug — `Background` kind collision in snapshot fallback

Minor: Test gap — `refresh_active_task_panel` has no contention test