Commit 09ff4d2

tmux: rewrite live detection on Claude's session registry (#43)
1 parent 8cbefd8 commit 09ff4d2

11 files changed

Lines changed: 687 additions & 668 deletions

Lines changed: 154 additions & 0 deletions
@@ -0,0 +1,154 @@
---
last_verified: v2.1.150
---

# Claude Live-Session Registry

Location: `$CLAUDE_CONFIG_DIR/sessions/<pid>.json` (defaults to `~/.claude/sessions/`).

CCX reads this directory in [`internal/clauderegistry`](../../internal/clauderegistry/registry.go)
to detect which Claude Code sessions are alive and whether each one is
actively producing a turn. The Go code decodes only the fields CCX
needs; this doc records the full schema for reference.

## What this is (and isn't)

The directory is a **live-process registry**, not durable session
history. A top-level `claude` process writes its PID file at startup and
rewrites it in place via `updatePidFile` as state changes. Durable
transcripts live in `~/.claude/projects/<cwd-hash>/<sessionId>.jsonl`,
which is a separate mechanism with a different lifecycle.

Subagents and agent-team teammates are launched with `--agent-id` and
**skip** registration (`registerSession` returns early when
`getAgentId()` is non-null). The registry therefore only ever contains
top-level processes; it is not a per-conversation manifest.

Cleanup: no background timer touches this directory. Stale files from
crashed processes are unlinked only at the next claude startup, by
`countConcurrentSessions`. Between a crash and that next startup, dead
PID files sit on disk, so readers must liveness-check with
`kill(pid, 0)`. WSL skips even the next-startup unlink, so stale entries
accumulate there indefinitely.

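
Because stale files can linger, a reader may want to know which entries are dead before trusting them. The following is a minimal sketch, not CCX's implementation: `pidFromName` and `staleNames` are hypothetical helpers, the liveness check is injected as a function so the example runs anywhere, and it assumes the PID in the file name matches the process that wrote the file.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// pidFromName extracts the PID from a registry file name such as
// "4242.json". It rejects non-JSON names, non-numeric names, and
// PIDs <= 1, which the doc says are treated as dead.
func pidFromName(name string) (int, bool) {
	base, ok := strings.CutSuffix(name, ".json")
	if !ok {
		return 0, false
	}
	pid, err := strconv.Atoi(base)
	if err != nil || pid <= 1 {
		return 0, false
	}
	return pid, true
}

// staleNames returns the file names whose PID fails the supplied
// liveness check. Names that do not parse as "<pid>.json" are ignored.
func staleNames(names []string, alive func(pid int) bool) []string {
	var stale []string
	for _, n := range names {
		if pid, ok := pidFromName(n); ok && !alive(pid) {
			stale = append(stale, n)
		}
	}
	return stale
}

func main() {
	names := []string{"100.json", "200.json", "README.md"}
	// Stand-in liveness check; a real reader would probe with kill(pid, 0).
	alive := func(pid int) bool { return pid == 100 }
	fmt.Println(staleNames(names, alive)) // prints [200.json]
}
```

Injecting `alive` keeps the directory scan separate from the platform-specific probe, which is also what makes the sketch testable without real processes.
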
## File schema

### Registration-time fields

Written by `registerSession` when the PID file is first created.

| Field          | Type   | Notes |
| -------------- | ------ | ----- |
| `pid`          | int    | `process.pid` |
| `sessionId`    | string | Current session UUID. Rewritten in place on session-ID rotation. |
| `cwd`          | string | Process cwd at registration. Updated on chdir. |
| `startedAt`    | int    | `Date.now()` in ms. |
| `procStart`    | string | `ps -o lstart= -p <pid>` output (UTC, `LC_ALL=C`). Stored for PID-reuse detection but unused on read. |
| `version`      | string | CLI version at registration. |
| `peerProtocol` | int    | Hard-coded `1` in v2.1.150. |
| `kind`         | string | `"interactive"` \| `"bg"` \| `"daemon"` \| `"daemon-worker"`. Defaults to `"interactive"` when `CLAUDE_CODE_SESSION_KIND` is unset. |
| `entrypoint`   | string | `"cli"` \| `"vscode"` \| ...; the value of `CLAUDE_CODE_ENTRYPOINT`. |

Env-gated optional fields, also written at registration:

| Field     | Condition |
| --------- | --------- |
| `name`    | `CLAUDE_CODE_SESSION_NAME` is set |
| `logPath` | `CLAUDE_CODE_SESSION_LOG` is set |
| `agent`   | `CLAUDE_CODE_AGENT` is set |
| `jobId`   | `kind === "bg"` and `CLAUDE_JOB_DIR` is set (stored as its basename) |

### Runtime-mutated fields

Added or rewritten after registration via `updatePidFile`. **The fields
that first appear here (`bridgeSessionId`, `status`, `waitingFor`,
`updatedAt`) are absent on files captured between registration and the
first REPL state update. That is normal, not a missing-field bug.**

| Field             | Trigger |
| ----------------- | ------- |
| `sessionId`       | Session-ID rotation. |
| `cwd`             | Working-directory change. |
| `name`            | `updateSessionName`. Also bumps `updatedAt`. |
| `bridgeSessionId` | IDE / VSCode bridge attachment. |
| `status`          | REPL ribbon state. One of `"idle"`, `"busy"`, `"waiting"`, `"shell"`. Written on every transition. |
| `waitingFor`      | Reason string, set only when the raw REPL state is `"waiting"`. Omitted from JSON otherwise. |
| `updatedAt`       | `Date.now()` in ms, written alongside `status`/`waitingFor`/`name` updates. |

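
For reference, the tables above can be expressed as one Go struct. This is a sketch only: `RegistryEntry` and `decodeEntry` are hypothetical names that do not exist in CCX (which decodes a much smaller subset), and the doc does not state the JSON types of `name`, `logPath`, `agent`, `jobId`, or `bridgeSessionId`, so treating them as strings is an assumption.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// RegistryEntry mirrors the documented schema. The type name and Go
// field names are illustrative, not taken from CCX.
type RegistryEntry struct {
	// Registration-time fields.
	PID          int    `json:"pid"`
	SessionID    string `json:"sessionId"`
	CWD          string `json:"cwd"`
	StartedAt    int64  `json:"startedAt"` // Date.now() in ms
	ProcStart    string `json:"procStart"`
	Version      string `json:"version"`
	PeerProtocol int    `json:"peerProtocol"`
	Kind         string `json:"kind"`
	Entrypoint   string `json:"entrypoint"`

	// Env-gated fields: empty when the variable was unset.
	// String types are assumed; the doc does not specify them.
	Name    string `json:"name,omitempty"`
	LogPath string `json:"logPath,omitempty"`
	Agent   string `json:"agent,omitempty"`
	JobID   string `json:"jobId,omitempty"`

	// Runtime-mutated fields: absent until the first REPL state update.
	BridgeSessionID string `json:"bridgeSessionId,omitempty"`
	Status          string `json:"status,omitempty"`
	WaitingFor      string `json:"waitingFor,omitempty"`
	UpdatedAt       int64  `json:"updatedAt,omitempty"` // Date.now() in ms
}

// decodeEntry parses the raw contents of one registry file.
func decodeEntry(raw string) (RegistryEntry, error) {
	var e RegistryEntry
	err := json.Unmarshal([]byte(raw), &e)
	return e, err
}

func main() {
	// Made-up file contents, as captured right after registration.
	raw := `{"pid":4242,"sessionId":"example-session","cwd":"/tmp/project",` +
		`"startedAt":1700000000000,"kind":"interactive","entrypoint":"cli"}`
	e, err := decodeEntry(raw)
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	// Status is empty here because no REPL state update has happened yet.
	fmt.Printf("pid=%d kind=%s hasStatus=%t\n", e.PID, e.Kind, e.Status != "")
}
```

The millisecond timestamps are held in `int64` so they do not overflow on 32-bit platforms.
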
#### `status` values

| Value       | Meaning |
| ----------- | ------- |
| `"idle"`    | REPL idle, no background work. |
| `"busy"`    | Actively processing a turn. |
| `"waiting"` | Blocked on user input. See `waitingFor` for which kind. |
| `"shell"`   | REPL is idle but a local `Bash` tool call is still running in the background. |

`"shell"` is reported when the raw REPL state is `"idle"` but a Bash
tool hasn't returned yet: the model isn't generating, but work is
still happening. CCX treats only `"busy"` as "responding" for badge
purposes; counting `"shell"` as well would pin a permanent badge on any
session that left a long-running background task.

#### `waitingFor` values

Only set when `status == "waiting"`. Dropped from the JSON otherwise.

| Value                 | Trigger |
| --------------------- | ------- |
| `"permission prompt"` | A tool-approval modal is mounted (`useIsPermissionPromptOpen()` true). |
| `"worker request"`    | A background worker raised a request to the main thread. |
| `"sandbox request"`   | Sandbox (Bash / computer-use) is asking for permission escalation. |
| `"dialog open"`       | A local-JSX slash command has mounted a dialog. |
| `"input needed"`      | Fallback for generic user-input wait when none of the above apply. |

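
If a reader ever surfaces `waitingFor`, the two fields combine naturally into one label. The sketch below is illustrative only: `describe` and `knownWaitReasons` are hypothetical names, and the label wording (including `"unknown"` for an absent status) is an assumption rather than anything CCX or Claude Code prints.

```go
package main

import "fmt"

// knownWaitReasons lists the waitingFor values documented above.
var knownWaitReasons = map[string]bool{
	"permission prompt": true,
	"worker request":    true,
	"sandbox request":   true,
	"dialog open":       true,
	"input needed":      true,
}

// describe renders a short label from the status and waitingFor fields.
// An empty status means the file was captured before the first REPL
// state update, so it is reported as "unknown" (an assumed label).
func describe(status, waitingFor string) string {
	switch {
	case status == "":
		return "unknown"
	case status != "waiting":
		// waitingFor is only meaningful alongside "waiting".
		return status
	case waitingFor == "":
		return "waiting"
	case !knownWaitReasons[waitingFor]:
		// Tolerate reasons added by newer versions instead of failing.
		return "waiting: " + waitingFor + " (unrecognized)"
	default:
		return "waiting: " + waitingFor
	}
}

func main() {
	fmt.Println(describe("busy", ""))
	fmt.Println(describe("waiting", "permission prompt"))
	fmt.Println(describe("", ""))
}
```

Unrecognized reasons are passed through rather than rejected, since the list is pinned to one verified version and may grow.
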
### `name` is **not** the `/resume` title

Three independent display strings exist; conflating them produces
confusing UI:

| Field          | Storage                  | Writer | Purpose |
| -------------- | ------------------------ | ------ | ------- |
| `name`         | `sessions/<pid>.json`    | `registerSession` (from env) + `updateSessionName` | Agent / spawn-seed display label for FleetView and `claude agents --json`. |
| `custom-title` | session JSONL transcript | `setCustomTitle`, which appends `{type:"custom-title",customTitle,sessionId}` | Human-set title in `/resume` (`/rename`). |
| `ai-title`     | session JSONL transcript | `setAiTitle`, which appends `{type:"ai-title",aiTitle,sessionId}` | Auto-generated title in `/resume`. |

`/rename` does not modify `name`. Conversely, a session launched with
`CLAUDE_CODE_SESSION_NAME` has a `name` that `/resume` does not
display. CCX surfaces `name` (when present) as the agent label and
leaves `/resume` titles alone.

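
To make the distinction concrete, here is a sketch of how a reader could recover the `/resume` titles from a transcript instead of from `name`. It relies only on the two record shapes listed in the table. `resumeTitle` and `titleRecord` are hypothetical names, and the precedence rule (the latest human-set title wins over the latest auto-generated one) is an assumption, not documented behavior.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"strings"
)

// titleRecord picks out only the fields of the two title record types.
type titleRecord struct {
	Type        string `json:"type"`
	CustomTitle string `json:"customTitle"`
	AITitle     string `json:"aiTitle"`
}

// resumeTitle scans JSONL transcript text and returns the most recent
// custom title, falling back to the most recent AI title. Lines that
// are not valid JSON objects are skipped.
func resumeTitle(transcript string) string {
	var custom, ai string
	sc := bufio.NewScanner(strings.NewReader(transcript))
	// Transcript lines can be long; raise the scanner's per-line limit.
	sc.Buffer(make([]byte, 0, 64*1024), 16*1024*1024)
	for sc.Scan() {
		var r titleRecord
		if json.Unmarshal(sc.Bytes(), &r) != nil {
			continue
		}
		switch r.Type {
		case "custom-title":
			custom = r.CustomTitle
		case "ai-title":
			ai = r.AITitle
		}
	}
	// Assumed precedence: a human-set title beats an auto-generated one.
	if custom != "" {
		return custom
	}
	return ai
}

func main() {
	// Made-up transcript with both kinds of title record.
	transcript := `{"type":"ai-title","aiTitle":"Fix flaky test","sessionId":"s1"}
{"type":"custom-title","customTitle":"Registry rewrite","sessionId":"s1"}
`
	fmt.Println(resumeTitle(transcript)) // prints Registry rewrite
}
```
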
## Reading the registry safely

The writer (`updatePidFile`) does **not** use a temp+rename pattern, so
concurrent readers can land mid-write and see truncated JSON. The write
window is microseconds.

`internal/clauderegistry` handles this by retrying: it makes up to three
read-and-parse attempts, sleeping 5 ms between them, whenever the read
fails, `json.Unmarshal` fails, or the parsed `sessionId` is empty. A
file that fails all three attempts is skipped for this pass, and the
next refresh tick picks it up. This also covers the truncate-then-write
race where the file size briefly drops to zero.

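
The retry idea can be isolated from the filesystem, which makes it easy to exercise. This is a sketch under stated assumptions rather than the package's code: `parseWithRetry` and `entry` are hypothetical names, the read is injected as a function, and the sketch sleeps only between attempts.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// entry holds the two fields the sketch needs to validate a parse.
type entry struct {
	PID       int    `json:"pid"`
	SessionID string `json:"sessionId"`
}

// parseWithRetry calls read up to attempts times, pausing for delay
// between attempts. It succeeds on the first read that parses as JSON
// and carries a non-empty sessionId.
func parseWithRetry(read func() ([]byte, error), attempts int, delay time.Duration) (entry, bool) {
	for i := 0; i < attempts; i++ {
		if i > 0 {
			time.Sleep(delay) // pause only between attempts
		}
		data, err := read()
		if err != nil {
			continue
		}
		var e entry
		if json.Unmarshal(data, &e) == nil && e.SessionID != "" {
			return e, true
		}
	}
	return entry{}, false
}

func main() {
	// Simulated reads: a truncated file first, then the complete one.
	reads := [][]byte{
		[]byte(`{"pid":42,"sess`),
		[]byte(`{"pid":42,"sessionId":"abc"}`),
	}
	calls := 0
	read := func() ([]byte, error) {
		data := reads[calls]
		calls++
		return data, nil
	}
	e, ok := parseWithRetry(read, 3, 5*time.Millisecond)
	fmt.Println(e.PID, e.SessionID, ok, calls) // prints 42 abc true 2
}
```
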
## Liveness probing

Use `kill(pid, 0)`, which sends no signal and performs only the
existence and permission check. The Node implementation in
`countConcurrentSessions` returns false for `pid <= 1`, so PID 0 and
PID 1 (init) entries are treated as dead. CCX matches that behavior.

`procStart` is written but never read back by claude itself, so nothing
detects PID reuse: if the OS reuses an orphaned PID for an unrelated
process, the registry will silently treat the impostor as the original
session. This is an acceptable risk in practice; revisit only if CCX
gets reports of phantom-live sessions.

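
A probe along these lines can be sketched in a few lines of Go on Unix-like systems. `alive` and `self` are hypothetical names. Note one deliberate difference, which is this sketch's choice and not documented upstream behavior: it counts `EPERM` (the process exists but belongs to another user) as alive, whereas a plain `err == nil` check counts it as dead.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"syscall"
)

// self is this process's PID, used to demonstrate a live probe.
var self = os.Getpid()

// alive reports whether pid names an existing process (Unix only).
// PIDs <= 1 are rejected up front, mirroring the rule described above.
func alive(pid int) bool {
	if pid <= 1 {
		return false
	}
	// Signal 0 delivers nothing; the kernel only runs its checks.
	err := syscall.Kill(pid, 0)
	// EPERM means the process exists but we may not signal it.
	return err == nil || errors.Is(err, syscall.EPERM)
}

func main() {
	fmt.Println(alive(self), alive(1), alive(0)) // prints true false false
}
```
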
## What `claude agents --json` does differently

The CLI's `agents --json` aggregates these files and:

- normalizes `status` to `idle` / `waiting` / `busy`, collapsing `shell` into `idle`;
- **drops `waitingFor`** from its output.

That is why CCX reads the files directly: it keeps the full status set
and has access to `waitingFor` if we ever want to surface it.
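
The information lost by that normalization is easy to see in a small sketch. `normalizeStatus` is a hypothetical helper that imitates the collapse described above; it is not the CLI's code, and passing unknown or empty values through unchanged is an assumption.

```go
package main

import "fmt"

// normalizeStatus imitates the three-value view described above:
// "shell" is folded into "idle", everything else passes through.
func normalizeStatus(status string) string {
	if status == "shell" {
		return "idle"
	}
	return status
}

func main() {
	for _, s := range []string{"idle", "busy", "waiting", "shell"} {
		fmt.Printf("%s -> %s\n", s, normalizeStatus(s))
	}
}
```

After normalization a session with a background Bash call still running is indistinguishable from a truly idle one, which is the distinction direct file reads preserve.
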
Lines changed: 208 additions & 0 deletions
@@ -0,0 +1,208 @@
// Package clauderegistry reads Claude Code's on-disk live-session registry
// at $CLAUDE_CONFIG_DIR/sessions/<pid>.json: one file per top-level
// claude process, written at startup and mutated in place as state
// changes. CCX uses it to detect which sessions are alive and which are
// actively producing a turn.
//
// Only the fields ccx actually consumes are decoded below. The full
// schema, lifecycle, and quirks (status values, name vs custom-title,
// PID reuse, WSL leak) are documented in
// docs/claude-code/live-session-registry.md.
//
// Diagnostic logging: set CCX_DEBUG=1 to surface registry errors to
// /tmp/ccx-debug.log (falls back to stderr if that path isn't writable).
// Without it, errors are swallowed silently so a transient registry
// glitch never crashes ccx. The trade-off is that users have no way to
// see why "live" suddenly went empty.
package clauderegistry

import (
	"encoding/json"
	"errors"
	"io"
	"io/fs"
	"log"
	"os"
	"path/filepath"
	"strings"
	"syscall"
	"time"
)

// Field values we actually branch on.
const (
	statusBusy      = "busy"        // actively processing a turn
	kindInteractive = "interactive" // normal user session
)

// debugLog is wired in init below. Silent (io.Discard) unless CCX_DEBUG
// is set, matching the convention in internal/tui/conversation.go.
var debugLog *log.Logger

func init() {
	if os.Getenv("CCX_DEBUG") == "" {
		debugLog = log.New(io.Discard, "", 0)
		return
	}
	f, err := os.OpenFile("/tmp/ccx-debug.log", os.O_CREATE|os.O_WRONLY|os.O_TRUNC, 0o644)
	if err != nil {
		debugLog = log.New(os.Stderr, "clauderegistry: ", log.Ltime|log.Lmicroseconds)
		return
	}
	debugLog = log.New(f, "clauderegistry: ", log.Ltime|log.Lmicroseconds)
}

// LiveSession is the subset of a registry entry that ccx consumes. The
// on-disk file has many more fields; see the docs.
//
// Status, in particular, may be absent on a file captured between
// registration and the first REPL state update. Empty Status is treated
// as "not responding".
type LiveSession struct {
	PID       int    `json:"pid"`
	SessionID string `json:"sessionId"`
	CWD       string `json:"cwd"`
	Status    string `json:"status,omitempty"`
	Kind      string `json:"kind,omitempty"`
}

// IsBusy reports whether the model is actively generating right now,
// i.e. the registry's status is "busy". This is the "responding" signal
// CCX surfaces via session.Session.IsResponding.
//
// The "shell" status (REPL idle, background Bash still running) and the
// "waiting" status (blocked on user input) deliberately don't count: a
// session that left a long-running tool in the background would
// otherwise show a permanent responding badge.
func (s LiveSession) IsBusy() bool {
	return s.Status == statusBusy
}

// Dir returns the registry directory honoring $CLAUDE_CONFIG_DIR, falling
// back to ~/.claude/sessions.
func Dir() string {
	if d := os.Getenv("CLAUDE_CONFIG_DIR"); d != "" {
		return filepath.Join(d, "sessions")
	}
	home, err := os.UserHomeDir()
	if err != nil {
		return ""
	}
	return filepath.Join(home, ".claude", "sessions")
}

// Read returns every live interactive session known to Claude Code.
// Ghost entries (process gone) are filtered out. A missing directory
// returns (nil, nil): older Claude Code versions don't write this
// directory and we treat that as "registry unavailable".
func Read() ([]LiveSession, error) {
	dir := Dir()
	if dir == "" {
		return nil, nil
	}
	entries, err := os.ReadDir(dir)
	if err != nil {
		if errors.Is(err, fs.ErrNotExist) {
			return nil, nil
		}
		debugLog.Printf("ReadDir(%s): %v", dir, err)
		return nil, err
	}
	out := make([]LiveSession, 0, len(entries))
	for _, e := range entries {
		if e.IsDir() || !strings.HasSuffix(e.Name(), ".json") {
			continue
		}
		s, ok := readOne(filepath.Join(dir, e.Name()))
		if !ok {
			continue
		}
		// Kind defaults to "interactive" when unset. Skip bg/daemon
		// variants; those aren't user sessions.
		if s.Kind != "" && s.Kind != kindInteractive {
			continue
		}
		if !processAlive(s.PID) {
			continue
		}
		out = append(out, s)
	}
	return out, nil
}

// readOne parses a single registry file. Claude Code writes these files
// without an atomic rename, so a concurrent read can land mid-write and
// see truncated JSON. Retry a few times; the write window is microseconds.
//
// A file that fails every attempt is skipped, not propagated as an error:
// a single broken entry shouldn't blank out the whole live list. The
// failure is logged when CCX_DEBUG is on.
func readOne(path string) (LiveSession, bool) {
	const attempts = 3
	var lastErr error
	for i := range attempts {
		// Sleep only between attempts, never after the last one.
		if i > 0 {
			time.Sleep(5 * time.Millisecond)
		}
		data, err := os.ReadFile(path)
		if err != nil {
			if errors.Is(err, fs.ErrNotExist) {
				return LiveSession{}, false
			}
			lastErr = err
			continue
		}
		var s LiveSession
		if err := json.Unmarshal(data, &s); err != nil {
			lastErr = err
			continue
		}
		// A parse that yields no sessionId is treated as incomplete.
		if s.SessionID != "" {
			return s, true
		}
	}
	if lastErr != nil {
		debugLog.Printf("readOne(%s) gave up after %d attempts: %v", path, attempts, lastErr)
	}
	return LiveSession{}, false
}

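// processAlive reports whether a process with this PID exists and can be
// signaled by the current user. kill(pid, 0) sends no signal but performs
// the existence + permission check; any error, including EPERM for a
// process owned by another user, counts as not alive. PIDs <= 1 are
// rejected up front, matching upstream's countConcurrentSessions.
func processAlive(pid int) bool {
	if pid <= 1 {
		return false
	}
	return syscall.Kill(pid, 0) == nil
}
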
// Cwds returns absolute project paths of every live registry entry,
// deduplicated and preserving the registry's enumeration order. Used by
// callers that only need "which project paths have a claude running"
// and don't care about pane attribution.
func Cwds() []string {
	live, err := Read()
	if err != nil || len(live) == 0 {
		return nil
	}
	seen := make(map[string]bool, len(live))
	paths := make([]string, 0, len(live))
	for _, l := range live {
		// Skip empty cwds explicitly: filepath.Abs("") resolves to ccx's
		// own working directory, which is not a registry path.
		if l.CWD == "" {
			continue
		}
		abs, err := filepath.Abs(l.CWD)
		if err != nil {
			abs = l.CWD
		}
		if seen[abs] {
			continue
		}
		seen[abs] = true
		paths = append(paths, abs)
	}
	return paths
}

// CwdSet is the set form of Cwds.
func CwdSet() map[string]bool {
	paths := Cwds()
	out := make(map[string]bool, len(paths))
	for _, p := range paths {
		out[p] = true
	}
	return out
}
