Description
Interactive gjc sessions do not tear down the subprocesses their tools spawn when the session ends. Specifically, the browser tool's headless Chrome (and, on signal kills, the Python eval kernel) is left running and reparented to init (PID 1). On a long-lived workstation this accumulates orphaned Chrome/Python processes that persist indefinitely and add to system load.
Three independent gaps combine:
1. The browser tool closes Chrome only on an explicit tab/browser close action. There is no idle teardown and no session-end teardown.
2. AgentSession.dispose() has no browser teardown at all. It disposes async jobs, MCP, LSP, the Python kernel, and provider sessions, but never releases browser handles. So even a graceful quit leaks any open headless Chrome.
3. The interactive entry installs no SIGTERM/SIGINT/exit handler. An external kill, a cleanup sweep, OOM, or terminal close kills the agent instantly and bypasses dispose() entirely, orphaning Chrome and the Python kernel.
Observed in the wild: a 26h session (bun .../gjc, RSS 1.28 GB, wchan=ep_poll) holding a 22h-old headless Chrome, an 18h-old Python runner, 2 unreaped node zombies, plus 30 leaked puppeteer_dev_chrome_profile-* dirs (361 MB) in /tmp. kill -TERM <pid> killed the agent instantly and left Chrome running as a ppid=1 orphan.
Steps to Reproduce
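1. Start an interactive gjc session; note its bun PID G.
2. Use the browser tool to open a headless tab; do not close it.
3. ps --ppid G → confirm a headless Chrome child of G.
4. End the session two ways:
   a. Graceful quit (/quit): Chrome stays alive and is reparented to ppid=1. → proves gap #2 (dispose leaks browsers).
   b. Signal (kill -TERM G): the agent dies instantly; Chrome (and an open Python eval kernel) are reparented to ppid=1 and keep running. → proves gap #3 (no signal teardown).
5. Repeat 4b with an eval Python cell open to see the runner orphaned too.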
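To confirm the orphans after ending the session without reading ps output by eye, the check can be scripted. The following is a minimal sketch: findOrphans, parsePs, and the match patterns are hypothetical helpers (not part of gjc), and the patterns are assumptions taken from the command lines shown in this report.

```typescript
// Hypothetical helper, not part of gjc: lists suspect processes reparented to init.
interface ProcRow { pid: number; ppid: number; args: string; }

// Parse the text produced by `ps -eo pid,ppid,args`.
// The header line does not start with digits, so it is skipped automatically.
function parsePs(output: string): ProcRow[] {
  const rows: ProcRow[] = [];
  for (const line of output.split("\n")) {
    const m = /^\s*(\d+)\s+(\d+)\s+(.*)$/.exec(line);
    if (m) rows.push({ pid: Number(m[1]), ppid: Number(m[2]), args: m[3] });
  }
  return rows;
}

// Assumed patterns: headless Chrome and the Python runner path seen in this report.
const SUSPECT: RegExp[] = [/chrome.*--headless/, /gjc-python-runner/];

// Keep only rows whose parent is PID 1 and whose command line matches a pattern.
function findOrphans(psOutput: string): ProcRow[] {
  return parsePs(psOutput).filter(
    (r) => r.ppid === 1 && SUSPECT.some((p) => p.test(r.args)),
  );
}
```

Feed it the output of ps -eo pid,ppid,args captured after step 4. On hosts where a subreaper adopts orphans, the new parent may not be PID 1, so the ppid check would need adjusting.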
Zombie sub-symptom (mechanism confirmed under bun 1.3.14): bun reaps children on the event loop, so a wedged/loop-blocked session stops reaping and exited children pile up as Z:
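// bun /tmp/zblock.js (watch from another shell)
const { spawn } = require("node:child_process");
const { writeFileSync } = require("node:fs");
// Child exits after 300 ms, while the parent's event loop is still blocked.
const c = spawn("node", ["-e", "setTimeout(()=>process.exit(0),300)"], { stdio: "ignore" });
// Record both PIDs so another shell can inspect them with ps.
writeFileSync("/tmp/zblock-pid.txt", JSON.stringify({ bun: process.pid, child: c.pid }));
// Busy-wait ~4.5 s so the event loop cannot reap the exited child.
const t = Date.now();
while (Date.now() - t < 4500) Math.sqrt(Math.random());
process.exit(0);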
At t≈2s the child shows STAT=Z (parent = bun); after the loop unblocks it is reaped. Negative controls (Bun.spawn/child_process with no-await / unref / detached / dropped-ref+gc) all reap cleanly with no zombie. So the zombies are a symptom of a wedged session, not a spawn-pattern bug.
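On Linux, the zombie state can also be read programmatically from /proc/<pid>/stat while the loop is blocked. This is a minimal sketch; parseProcStat and isZombie are hypothetical helpers for observing the repro, not part of gjc or bun.

```typescript
// Hypothetical helper for observing the repro; not part of gjc or bun.
interface ProcStat { pid: number; comm: string; state: string; ppid: number; }

// A /proc/<pid>/stat line looks like: "1234 (node) Z 1200 ...".
// The comm field may itself contain spaces or parentheses, so the fields
// after it are located by splitting on the LAST ')' in the line.
function parseProcStat(line: string): ProcStat {
  const open = line.indexOf("(");
  const close = line.lastIndexOf(")");
  if (open < 0 || close < open) throw new Error("unrecognized stat line");
  const rest = line.slice(close + 2).split(" ");
  return {
    pid: Number(line.slice(0, open).trim()),
    comm: line.slice(open + 1, close),
    state: rest[0],
    ppid: Number(rest[1]),
  };
}

// "Z" is the state letter the kernel reports for an exited, unreaped child.
const isZombie = (line: string): boolean => parseProcStat(line).state === "Z";
```

From another shell, read the child PID from /tmp/zblock-pid.txt and pass the contents of /proc/<child>/stat to isZombie at around t≈2s.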
Expected Behavior
When a session ends — gracefully or via signal — every subprocess the session spawned (headless/spawned Chrome from the browser tool, the Python eval kernel) should be torn down, not reparented to init. Killing or closing a session should not leave Chrome/Python running.
Root cause (code, v0.5.1)
Paths relative to packages/coding-agent/src.
- Browser closed only on explicit release. tools/browser/registry.ts:165-194 disposeBrowserHandle() → headless does await handle.browser.close() (:169). Callers: registry.ts:53 (only when replacing a different-kind browser), tools/browser/tab-supervisor.ts:243,379 (explicit releaseTab at refCount 0), tab-supervisor.ts:133,144 (worker-init failure). No closeAllBrowsers export and no process-exit hook in registry.ts.
- AgentSession.dispose() omits browsers. session/agent-session.ts (≈3163-3207): tears down post-prompt tasks, async jobs, MCPManager.disconnectAll() (:3182), shutdownAllLspClients() (:3187), Python kernel (:3188, :3194 disposeKernelSessionsByOwner), power assertion, sessionManager.close(), provider sessions, hindsight; there is no releaseBrowser/disposeBrowserHandle/registry call anywhere.
- No interactive signal handler. process.on/once('SIGTERM'|'SIGINT'|'SIGHUP'|'exit'|'beforeExit') exists only in non-interactive subcommands: cli/auth-broker-cli.ts:146-147, cli/auth-gateway-cli.ts:214-215, cli/shell-cli.ts:97 (SIGINT only), cli/stats-cli.ts:158 (SIGINT only), modes/rpc/rpc-mode.ts:672-673. The interactive session that runs browser/eval has none. session.dispose() is only reached on the normal quit path (main.ts:1035, then postmortem.quit(0)).
- Python kernel is covered on graceful dispose (disposeKernelSessionsByOwner, agent-session.ts:3194) but not on signal (no handler) → orphaned on external kill, matching the observed 18h runner.
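The missing session-end teardown could look roughly like a registry that tracks every browser a session opens and closes them all in one call. This is a sketch under stated assumptions: SessionBrowserRegistry, BrowserLike, and closeAll are hypothetical names, not the actual gjc registry API, and the sketch deliberately ignores shared ownership (refCount/holdBrowser) between subagent sessions.

```typescript
// Hypothetical sketch; names and shapes are assumptions, not the gjc registry API.
interface BrowserLike { close(): Promise<void>; }

class SessionBrowserRegistry {
  private handles = new Map<string, BrowserLike>();

  // Track a browser so it can be closed when the session ends.
  register(id: string, browser: BrowserLike): void {
    this.handles.set(id, browser);
  }

  // Close every tracked browser and return how many closed cleanly.
  // A failing close() is swallowed so it cannot block the remaining handles.
  async closeAll(): Promise<number> {
    const entries = Array.from(this.handles.values());
    this.handles.clear();
    let closed = 0;
    for (const browser of entries) {
      try {
        await browser.close();
        closed++;
      } catch (_err) {
        // Keep going: teardown is best-effort.
      }
    }
    return closed;
  }
}
```

A dispose path would await closeAll() alongside the existing MCP, LSP, and kernel teardown. Whether a session may close a browser it shares with a subagent is the ownership question this sketch leaves open.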
Error Output
# stale session 798818: bun gjc, 26h, RSS 1.28GB, wchan=ep_poll
$ ps --ppid 798818 -o pid,stat,etime,args
PID STAT ELAPSED COMMAND
1989217 Ssl 22:39:24 /opt/google/chrome/chrome --headless ... --user-data-dir=/tmp/puppeteer_dev_chrome_profile-m7dOHm
3308437 Sl+ 18:35:41 python3 /tmp/gjc-python-runner/runner-*.py
99545 Zs - [node] <defunct>
3203093 Zs - [node] <defunct>
$ kill -TERM 798818 # agent dies instantly
$ ps -o pid,ppid,etime,args -p 1989217
PID PPID ELAPSED COMMAND
1989217 1 22:39:47 /opt/google/chrome/chrome --headless ... # reparented to init, still running
Additional context
- Suggested fix direction (not prescriptive): add browser-registry teardown to AgentSession.dispose() (release/close all session-owned handles, disposeBrowserHandle(..., { kill: true })); install a top-level, time-boxed SIGTERM/SIGINT/exit handler in the interactive entry that runs the same dispose. Consider launching Chrome in its own process group and killTree on teardown; the helper already exists at autoresearch/helpers.ts:104. Mind browser refCount/holdBrowser ownership when subagent sessions share a browser, and the existing RPC-mode signal handling + postmortem.quit path.
- Open question (unproven): what wedged session 798818 to RSS 1.28 GB / ep_poll for 26h (slow leak vs stuck subsystem vs legitimate growth)? Needs a heap snapshot / long-session repro; not asserting a leak.
- Happy to send a focused PR (start with browser teardown in dispose(), then the interactive signal handler) targeting the dev branch once you confirm the teardown ownership model.
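For discussion, a time-boxed signal handler could be shaped roughly as follows. This is a minimal sketch, not the gjc implementation: withTimeout and installTeardown are hypothetical names, the 3000 ms default is an arbitrary assumption, and the dispose callback stands in for whatever the interactive entry would actually run.

```typescript
// Hypothetical sketch: run an async dispose on SIGTERM/SIGINT without hanging forever.

// Resolve with the work's result, or with "timeout" if it takes longer than ms.
function withTimeout<T>(work: Promise<T>, ms: number): Promise<T | "timeout"> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const deadline = new Promise<"timeout">((resolve) => {
    timer = setTimeout(() => resolve("timeout"), ms);
  });
  return Promise.race([work, deadline]).finally(() => {
    if (timer !== undefined) clearTimeout(timer);
  });
}

// Install handlers that run dispose once, then exit with 128 + signal number.
function installTeardown(dispose: () => Promise<void>, ms = 3000): void {
  let running = false;
  const onSignal = (signal: string): void => {
    if (running) return; // ignore repeated signals while teardown is in flight
    running = true;
    const work = Promise.resolve().then(dispose).catch(() => undefined);
    withTimeout(work, ms).then(() => {
      process.exit(signal === "SIGINT" ? 130 : 143);
    });
  };
  process.once("SIGTERM", onSignal);
  process.once("SIGINT", onSignal);
}
```

A handler like this only covers catchable signals. SIGKILL, which the kernel OOM killer sends, cannot be intercepted, which is one reason the process-group approach above is worth considering alongside it.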
Environment
- Platform: Linux (Ubuntu 22.04, x64)
- gjc version: 0.5.1 (= current npm latest)
- Bun version: 1.3.14
- Provider: not provider-specific