Skip to content

fix(tui): try_lock shell manager in async UI refresh paths (#3804)#3808

Merged
Hmbown merged 1 commit into
mainfrom
codex/0866-3804-shell-lock-trylock
Jun 30, 2026
Merged

fix(tui): try_lock shell manager in async UI refresh paths (#3804)#3808
Hmbown merged 1 commit into
mainfrom
codex/0866-3804-shell-lock-trylock

Conversation

@Hmbown

@Hmbown Hmbown commented Jun 29, 2026

Copy link
Copy Markdown
Owner

Closes #3804 (child of #3800).

Problem

The async UI loop took a blocking std::sync::Mutex::lock() on the shell manager in two render-only refresh paths (refresh_active_task_panel ui.rs:1432, refresh_shell_exec_live_output ui.rs:1461). A contended lock could stall render/input — an amplifier of the fanout freeze.

Change

Both paths now use try_lock():

  • refresh_active_task_panel: on contention, retain the previous frame's background shell entries (last-known snapshot) so running shells don't flicker out.
  • refresh_shell_exec_live_output: on contention, skip this frame's live-output update (next refresh picks it up).

Guardrails (per the issue)

Shell ownership, cancellation, approval state, and output capture do not depend on these UI refreshes — they're render-only reads — so try_lock + snapshot/skip cannot weaken enforcement or hide a running shell beyond one frame.

Test

shell_live_output_refresh_does_not_block_on_contended_lock: holds the lock and asserts the refresh returns immediately (the old blocking lock() would deadlock on the same thread, so reaching the assert proves non-blocking). Build green; no regressions.

Acceptance

  • UI refresh paths no longer synchronously block on the shell std::sync::Mutex
  • On contention the UI keeps rendering and reuses the last snapshot / skips
  • Test covers shell-manager lock contention
  • Does not loosen shell approval/sandbox behavior

Note: this is the [safe-now] child #2 from the #3800 audit; it's isolated from the in-flight backpressure/parallel-dispatch branches.

The async UI loop took a blocking std::sync::Mutex lock on the shell manager
in two render-only refresh paths (refresh_active_task_panel,
refresh_shell_exec_live_output). If a background shell path held that lock, the
UI thread could stall with no async yield — an amplifier of the #3800 fanout
freeze.

Use try_lock for these render-only reads:
- refresh_active_task_panel: on contention, retain the previous frame's
  background shell entries (last-known snapshot) instead of dropping running
  shells from the Work panel.
- refresh_shell_exec_live_output: on contention, skip this frame's live-output
  update (the next refresh picks it up).

Shell ownership, cancellation, approval state, and output capture do not depend
on these UI refreshes, so try_lock + snapshot/skip is safe and does not weaken
enforcement (per the #3804 guardrails). Adds a contention regression test that
holds the lock and asserts the live-output refresh returns immediately rather
than deadlocking (the old blocking lock() would deadlock on the same thread).

Parent: #3800
Fixes #3804
@Hmbown Hmbown added this to the v0.8.66 milestone Jun 29, 2026
@claude

claude Bot commented Jun 29, 2026

Copy link
Copy Markdown

Claude finished @Hmbown's task in 1m 54s —— View job


PR Review: fix(tui): try_lock shell manager in async UI refresh paths

  • Gather context and read changed files
  • Analyze diff for correctness, logic, tests, and reuse
  • Post review

Summary

The core idea is correct and well-motivated: blocking lock() in a render-only async path is a latency hazard, and try_lock + skip/snapshot is a sound pattern. The refresh_shell_exec_live_output change is clean. One real bug in the snapshot path needs a fix before merge.


Correctness Bug — Background kind collision in snapshot fallback

ui.rs:1438–1443 / ui.rs:1467

let prev_shell_entries: Vec<TaskPanelEntry> = app
    .task_panel
    .iter()
    .filter(|entry| matches!(entry.kind, TaskPanelEntryKind::Background))
    .cloned()
    .collect();

TaskPanelEntryKind::Background is not exclusive to shell jobs. active_rlm_task_entries (called at line 1430 in the same function) also emits kind: TaskPanelEntryKind::Background for RLM sessions (ui.rs:1621).

Sequence on contention:

  1. entries is built: includes current-frame RLM entries (line 1430, Background kind).
  2. prev_shell_entries filters the previous app.task_panel for Background — captures both old shell entries and old RLM entries.
  3. Lock is contended → shell_entries = prev_shell_entries (the mixed bag).
  4. entries.extend(shell_entries) — previous-frame RLM entries are appended on top of already-added current-frame RLM entries.

Result: every active RLM session appears twice in the task panel on any frame where shell-manager contention happens while RLM is running. It's a low-probability event, but the panel would show duplicate RLM: … rows.

Minimal fix — filter by a shell-specific signal. Since shell job ids are arbitrary while RLM ids follow "rlm-{n}" (ui.rs:1617), the safest narrow fix is:

.filter(|entry| {
    matches!(entry.kind, TaskPanelEntryKind::Background)
        && !entry.id.starts_with("rlm-")
})

A cleaner longer-term fix is a TaskPanelEntryKind::Shell variant so shell entries don't share a kind with other background work, but that's out of scope for this PR.

Fix this →


Minor: Test gap — refresh_active_task_panel has no contention test

The test shell_live_output_refresh_does_not_block_on_contended_lock correctly covers refresh_shell_exec_live_output. There is no equivalent test for the snapshot-retention path in refresh_active_task_panel. A complementary test that holds the shell-manager lock and asserts the panel keeps prior shell entries (and does not duplicate them) would close this gap and would have caught the Background kind collision above.


Good choices

  • list_jobs() takes &mut self (it calls shell.poll() and cleanup() internally), so mut mgr is correct.
  • The deadlock-as-proof technique in the test is clean: a blocking lock() would deadlock on the same thread, so reaching the assert! proves the path is non-blocking.
  • refresh_shell_exec_live_output skip-frame behavior is exactly right — missing one live-output poll is invisible to users.
  • The guardrail note in the PR description (shell ownership/cancellation/approval state do not depend on these paths) is accurate and the code supports it.

Verdict: One real bug (RLM duplication on contented lock in refresh_active_task_panel) needs fixing. The rest is solid. Happy to approve once that filter is tightened.

@Hmbown Hmbown merged commit 2130139 into main Jun 30, 2026
19 checks passed
@Hmbown Hmbown deleted the codex/0866-3804-shell-lock-trylock branch June 30, 2026 01:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

v0.8.66: Remove blocking shell-manager locks from async TUI refresh paths

1 participant