Skip to content

fix(wta): log process teardown + agent-pane disconnect; trim log spam#298

Open
vanzue wants to merge 10 commits into
mainfrom
worktree-wta-disconnect-logging
Open

fix(wta): log process teardown + agent-pane disconnect; trim log spam#298
vanzue wants to merge 10 commits into
mainfrom
worktree-wta-disconnect-logging

Conversation

@vanzue

@vanzue vanzue commented Jun 16, 2026

Copy link
Copy Markdown
Contributor

Why

A reported "helper just stopped responding" incident turned out to be undiagnosable from the logs — not because the success path was under-logged (it was logged exhaustively), but because the teardown path was silent:

  • On a non-graceful death the process vanished mid-stream and the non-blocking appender's buffered tail was lost (shutdown_flush only runs on clean exits; the WorkerGuard lives in a static that never Drops).
  • The C++ pane side stayed completely silent when its backend died — the only thing logged after the helper disappeared was _FocusAgentPane called.
  • Per-token-chunk / per-VT-sequence debug! lines flooded the logs (the master log was 631 KB of delivered=true), crowding out signal and pushing useful records out of the buffer faster.

What

1. Rust — record why/when a process is torn down, and flush before dying

  • logging::install_ctrl_handler() — a SetConsoleCtrlHandler that logs which control event (CTRL_CLOSE/CTRL_LOGOFF/CTRL_SHUTDOWN/CTRL_BREAK) tore the process down and calls shutdown_flush() so the final records (incl. the existing transport-lost WARN) reach disk. Returns FALSE → default handler still runs, termination behavior unchanged.
  • Wired into main() right after logging::init, so it covers both helper and master.
  • Honest limitation documented inline: a hard taskkill /F / Task Manager "End task" delivers no control event and stays untraceable from inside the process.

2. C++ — pane no longer blind to backend death

  • Tab::_UpdateConnectionClosedState() now _agentPaneLog(...)s when the agent pane's connection closes — correlating the UI-side disconnect to the millisecond with the Rust logs, and firing even when the master is gone and can't emit restart_agent_pane.

3. Success-path cleanup (debugtrace)

  • Demoted the per-chunk / per-VT-sequence floods (master forwarding ×2, acp session_notification, autofix WtEvent/classified/cross-tab-drop). Kept one per-event breadcrumb at debug and all connection/lifecycle logs intact.

Verification

  • cargo +stable check passes; no warnings from the changed lines.
  • ✅ Full Terminal MSBuild (Debug) + deploy, verified live by closing an agent pane with double-Ctrl+C:
    • Helper wta-main_helper-*.log: WARN lifecycle: console control event received … ctrl_type=2 event="CTRL_CLOSE" — and it is the last line in the file, proving the pre-death shutdown_flush() landed the buffered tail (the original blind spot).
    • C++ terminal-agent-pane.log: releasing wta-master pid=… (deliberate teardown via KILL_ON_JOB_CLOSE); no _UpdateConnectionClosedState spam.

🤖 Generated with Claude Code

Helper/master deaths were undiagnosable: on a non-graceful teardown the
process vanished mid-stream and the non-blocking appender's buffered tail
was lost (shutdown_flush never ran), while the C++ pane side stayed
silent. The success path was logged exhaustively; the teardown path was
not.

- logging: install a SetConsoleCtrlHandler that logs which control event
  (CTRL_CLOSE/LOGOFF/SHUTDOWN/BREAK) tore the process down and drains the
  appender so final records (incl. the transport-lost WARN) reach disk.
  Returns FALSE -- termination behavior unchanged. Wired into main() so it
  covers both helper and master. (A hard taskkill /F sends no control
  event and stays untraceable by design.)
- Tab.cpp: _agentPaneLog when the agent pane's connection closes, so a
  dead helper is recorded on the C++ side and correlates to the
  millisecond with the Rust logs -- fires even when the master is gone and
  cannot emit restart_agent_pane.
- trim per-token-chunk / per-VT-sequence logs (master forwarding x2, acp
  session_notification, autofix WtEvent/classified/cross-tab-drop) from
  debug to trace; keep one per-event breadcrumb at debug and all
  connection/lifecycle logs intact.

cargo +stable check passes.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings June 16, 2026 03:40

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR improves diagnosability of Windows Terminal Agent (WTA) failures by ensuring teardown paths emit logs and by reducing high-volume success-path debug logging that can drown out lifecycle signals. It also adds a C++-side log record when the agent pane’s backend disconnects to correlate UI-side behavior with Rust-side logs.

Changes:

  • Add a Windows console control handler in WTA to log teardown signals and flush buffered logs before termination.
  • Log agent-pane backend disconnects from the TerminalApp Tab to make UI-side disconnects visible.
  • Reduce log volume by demoting per-chunk/per-sequence debug logs to trace across ACP and autofix paths.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
tools/wta/src/protocol/acp/client.rs Demotes per-streamed-chunk ACP session_notification logging to trace to reduce debug noise.
tools/wta/src/master/mod.rs Demotes per-chunk session notification routing logs to trace to reduce master log volume.
tools/wta/src/main.rs Installs the new teardown control handler immediately after logging init.
tools/wta/src/logging.rs Adds install_ctrl_handler() to log/flush on console teardown control events.
tools/wta/src/app.rs Demotes high-volume autofix event/classification logs to trace.
tools/wta/Cargo.toml Enables windows-sys console APIs via Win32_System_Console.
src/cascadia/TerminalApp/Tab.cpp Logs when the agent pane’s connection is closed to surface backend death on the C++ side.

Comment thread tools/wta/src/logging.rs Outdated
Comment thread src/cascadia/TerminalApp/Tab.cpp Outdated
Comment thread tools/wta/src/logging.rs Fixed
Comment thread tools/wta/src/logging.rs Fixed
@github-actions

This comment has been minimized.

…ail, spell

- Tab.cpp: edge-trigger the agent-pane disconnect log on the open→closed
  transition (new `_agentPaneConnectionClosed` guard on Tab). The method
  runs on every pane connection-state change and on focus changes, so the
  unconditional log would spam terminal-agent-pane.log after a helper dies.
- logging.rs: SetConsoleCtrlHandler install failure now logs at warn, not
  debug, so release (info) logs explain why teardown signals are missing
  rather than leaving it silent — the whole point is diagnosability.
- spelling: allow the Win32 token PHANDLER (PHANDLER_ROUTINE) in apis.txt.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Comment thread tools/wta/src/logging.rs Fixed
@github-actions

This comment has been minimized.

The noun "diagnosability" isn't in cspell's base dictionary (though
"diagnosable" is); reword the one occurrence to "diagnostic" to keep the
spell-check gate green without a dictionary entry.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings June 16, 2026 09:30

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 9 out of 9 changed files in this pull request and generated 1 comment.

Comment thread tools/wta/src/logging.rs
vanzue and others added 2 commits June 16, 2026 19:42
Per PR review: SetConsoleCtrlHandler failure now logs GetLastError() in an
`error_code` field, captured immediately before any other call can reset
thread-last-error, so the warning distinguishes likely causes (no console
attached vs. permission failure) instead of just stating it failed.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
These five info! sites fired per-event, not per-lifecycle, and dominated
the info logs (~2284 lines in a sample run). Info should be once-per-
lifecycle / rare connection events; per-event detail belongs at debug
(still available via WTA_LOG=debug). No information removed.

- agent_sessions.rs  "alive snapshot upgraded Historical row → Live" (1724x)
- master/mod.rs       "received helper session hook" (157x)
- master/mod.rs       "handling intellterm.wta/session_hook locally" (157x)
- app.rs (agent_route) "applied" (127x) and "routing" (119x)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings June 16, 2026 14:53

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 10 out of 10 changed files in this pull request and generated 1 comment.

Comment thread tools/wta/src/logging.rs Outdated
vanzue and others added 2 commits June 16, 2026 23:08
…es to debug

Refines the earlier wholesale demotion of "received helper session hook":
terminal/lifecycle transitions (SessionStarted/SessionStopped/PaneClosed/
ConnectionFailed) stay at info so field diagnosis of session-state bugs
works from the default release log, while the high-frequency routine events
(ToolStarting/ToolCompleted/Notification/resume bookkeeping) go to debug.
The generic master-side dispatch breadcrumb stays at debug (it carries no
event detail).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Per PR review: only the helper is a ConPTY child of Windows Terminal and
receives CTRL_CLOSE on pane/tab/window close. The master is spawned
CREATE_NO_WINDOW and contained in a Job Object with KILL_ON_JOB_CLOSE
(C++ SharedWta), so its routine teardown is a job reap — like
TerminateProcess, no control event — and this handler does not make it
traceable. Tighten the doc comment and the main.rs call-site comment to
say so instead of implying both processes always observe these events.
No behavior change (comment-only).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings June 16, 2026 15:24

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 10 out of 10 changed files in this pull request and generated 3 comments.

Comment thread tools/wta/src/master/mod.rs
Comment thread tools/wta/src/logging.rs Outdated
Comment thread src/cascadia/TerminalApp/Tab.cpp
vanzue and others added 2 commits June 17, 2026 00:08
Fills the two diagnosability gaps left after the console-ctrl-handler:

1. Rust panic hook (logging.rs): panics wrote only to stderr — invisible
   for a ConPTY helper / CREATE_NO_WINDOW master — and the appender's
   buffered tail was lost on a fatal panic. install_panic_hook() logs the
   panic (message + location + thread) AND synchronously appends to
   wta-panic.log (independent of the async appender, so it survives a fatal
   panic), then chains the previous hook. Does NOT call shutdown_flush (that
   would kill logging after a recoverable catch_unwind).

2. C++ master-death observability (SharedWta.cpp): the master is job-killed
   (KILL_ON_JOB_CLOSE) and can't log its own hard death. Log from the parent
   that observes it:
   - _OnProcessExited: the wait callback fires for unexpected master exit
     (crash/OOM/taskkill /F) — the external observer that makes silent
     master deaths diagnosable. Deliberate teardowns never reach it.
   - _CleanupLocked: record the deliberate KILL_ON_JOB_CLOSE teardown.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- logging.rs: treat SetConsoleCtrlHandler's ERROR_INVALID_HANDLE (no console,
  expected for the CREATE_NO_WINDOW master / detached CLI) as debug; keep warn
  only for unexpected failures, so the warn can't reintroduce startup spam.
- master/mod.rs: match the session-hook level decision on &event (borrow, not
  consume) — clearer intent; event is reused by the reducer below.
- Tab.cpp: only walk the tree for the agent pane when something is closed,
  avoiding a second traversal on the common all-healthy path (still re-arms
  the edge-trigger guard correctly when nothing is closed).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings June 16, 2026 23:39

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 11 out of 11 changed files in this pull request and generated no new comments.

Copilot AI review requested due to automatic review settings June 18, 2026 01:48

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 12 out of 12 changed files in this pull request and generated 1 comment.

Comment on lines 79 to 99
# Workaround for the archival of check-spelling/check-spelling (archived 2026-06-16).
# On archival the maintainer flipped the action's runtime security poll (secpoll)
# to return the advisory 'Sorry' for v0.0.26 -- the final release -- which fatally
# to return a security advisory for v0.0.26 -- the final release -- which fatally
# cancels the workflow (exit 200) on every repo pinned to it. There is no fixed
# version to upgrade to. The action exposes an `ignore_security_advisory` input,
# but v0.0.26 never maps it into the secpoll step's environment, so passing it via
# `with:` is silently dropped. secpoll reads the INPUT_IGNORE_SECURITY_ADVISORY env
# var directly, so we set it here (composite action steps inherit workflow-level
# env). The value must exactly match the advisory text 'Sorry'. This keeps the
# (still-executable) archived action running. Caveat: if check-spelling.dev is ever
# taken down, secpoll's lookup returns empty/NXDOMAIN and this var being set will
# instead make secpoll fail ("...but there is no security advisory") -- at which
# env). secpoll requires this value to EXACTLY equal the current advisory text
# (the DNS TXT record at 26.0.0.security-status.secpoll.check-spelling.dev, minus
# its leading "3 " status prefix); any mismatch re-triggers the fatal exit 200 with
# "(ignore_security_advisory did not match)". The maintainer has already changed the
# text once (from "Sorry" to the value below), so if spelling starts failing fast
# (~6s) again, re-read that TXT record and update this string to match. This keeps
# the (still-executable) archived action running. Caveat: if check-spelling.dev is
# ever taken down, secpoll's lookup returns empty/NXDOMAIN and this var being set
# will instead make secpoll fail ("...but there is no security advisory") -- at which
# point this workflow must be reworked (remove the checker or pin a maintained fork).
env:
INPUT_IGNORE_SECURITY_ADVISORY: Sorry
INPUT_IGNORE_SECURITY_ADVISORY: "Sorry. https://github.com/jsoref/2026-06-16-credential-leak/blob/main/README.md"

@vanzue vanzue force-pushed the worktree-wta-disconnect-logging branch from f80858e to 92df298 Compare June 18, 2026 02:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants