fix(pool): prune stale worktree registrations before add in Acquire #31

Open
e-jung wants to merge 2 commits into kunchenguid:main from e-jung:fix/get-prunes-stale-worktree-registrations

e-jung commented Jun 20, 2026

Intent

treehouse get fails hard when git holds a stale (prunable) worktree registration at the pool path where get is about to create the new worktree. get does not run git worktree prune, does not pass --force, and does not catch the "already registered" error to recover. A manual git worktree prune is required before get works again.

This blocks every spawn until a human prunes by hand. Reported in the wild on v1.4.0 (the fleet hit it); reproduced deterministically on current main (v1.7.0, 7e3fd54). See #30 for the issue with the full repro.

#28 (v1.7.0) hardened the prune subcommand to classify orphaned worktrees. The get acquisition path was untouched and still has no defense against a stale registration, so the bug is live on main.

Refs #30.

Approach

Run git worktree prune immediately before git worktree add in pool.Acquire's create-new-worktree branch. git worktree prune is safe by design: it removes only bookkeeping entries whose target directories are already gone, so it cannot destroy live worktrees or data. Running it unconditionally each time Acquire creates a new worktree is therefore correct and cannot cause data loss.

Rejected alternative: catching the specific "already registered"/"prunable" error and retrying after prune. That requires string-matching git's error output across versions, which is brittle. The unconditional prune is simpler and idempotent.
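
To make the brittleness concrete, here is a rough sketch of what the rejected approach would have to do. The function name and the reworded message are hypothetical; the only grounded string is the fragment that appears in the reproduction log for this PR.

```go
package main

import (
	"fmt"
	"strings"
)

// staleRegistrationMarker is the fragment seen in this PR's reproduction log.
// Assumption: other git versions or locales may word the message differently.
const staleRegistrationMarker = "is a missing but already registered worktree"

// looksLikeStaleRegistration is the check a catch-and-retry design would hinge on.
func looksLikeStaleRegistration(gitOutput string) bool {
	return strings.Contains(gitOutput, staleRegistrationMarker)
}

func main() {
	seen := "fatal: '.../1/myrepo' is a missing but already registered worktree;"
	// Hypothetical rewording, invented here only to show the failure mode.
	reworded := "fatal: '.../1/myrepo' is registered but its directory is missing"

	fmt.Println(looksLikeStaleRegistration(seen))     // true
	fmt.Println(looksLikeStaleRegistration(reworded)) // false
}
```

If the wording ever changed, the retry would silently stop firing and get would wedge again, which is the failure the unconditional prune avoids.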

The prune is placed only in the create-new-worktree branch (not in the reusable-worktree loop) because healState already filters missing-dir entries from treehouse's own state file, so the create-new path is the sole exposure.

Scope

Three files, +72 / -0 (functional):

  • internal/git/git.go: new PruneWorktrees(repoRoot) error wrapper next to AddWorktree, documented as safe-by-design.
  • internal/pool/pool.go: Acquire calls git.PruneWorktrees(repoRoot) before git.AddWorktree(...) in the create-new-worktree branch; fails closed with fmt.Errorf("failed to prune stale worktrees: %w", err).
  • cmd/e2e_test.go: TestGetRecoversFromStaleWorktreeRegistration plants a prunable worktree at the slot path get will target, asserts git lists it as prunable, then asserts treehouse get succeeds and recreates the worktree directory.

AGENTS.md and README.md updated to document the self-healing behavior (added by the no-mistakes document step).

Risk

Low. The change adds one safe, idempotent git worktree prune call before the only git worktree add site in Acquire. Prune cannot delete live worktrees or data — git only removes registrations for directories that are already missing. The lock placement is unchanged (the prune runs inside the existing WithStateLock critical section, same as the surrounding AddWorktree). Error wrapping matches the existing AddWorktree error handling.

Verification

  • New e2e test TestGetRecoversFromStaleWorktreeRegistration passes.
  • Full suite green: go test -race ./... (race detector clean across cmd, internal/config, internal/git, internal/hooks, internal/pool, internal/process, internal/updater).
  • gofmt -l clean; go vet ./... clean.
  • Deterministic before/after reproduction in throwaway /tmp git repos (no contact with any real pool):
    • Base (unfixed v1.7.0): planted prunable worktree registration → treehouse get → exit 1, failed to create worktree: ... fatal: '...' is a missing but already registered worktree.
    • Fixed: same prunable state → treehouse get → exit 0, 🌳 Entered worktree at ... — worktree directory recreated.

Reproduction log (base vs fixed)

########## BASE (unfixed v1.7.0) ##########
[BASE-v1.7.0] pool dir: /tmp/tmp.V9w2KhBOUr/home/.treehouse/myrepo-0b523e
[BASE-v1.7.0] git worktree list --porcelain BEFORE get (filtered):
    worktree /tmp/tmp.V9w2KhBOUr/myrepo
    worktree /tmp/tmp.V9w2KhBOUr/home/.treehouse/myrepo-0b523e/1/myrepo
    prunable gitdir file points to non-existent location
[BASE-v1.7.0] running: treehouse get
[BASE-v1.7.0] exit code: 1
[BASE-v1.7.0] RESULT: WEDGED - worktree not created
[BASE-v1.7.0] relevant get output:
    failed to create worktree: git worktree add --detach .../1/myrepo refs/remotes/origin/main: fatal: '.../1/myrepo' is a missing but already registered worktree;

########## TARGET (fixed) ##########
[TARGET-FIXED] pool dir: /tmp/tmp.bGoIKj1MD6/home/.treehouse/myrepo-45755e
[TARGET-FIXED] git worktree list --porcelain BEFORE get (filtered):
    worktree /tmp/tmp.bGoIKj1MD6/myrepo
    worktree /tmp/tmp.bGoIKj1MD6/home/.treehouse/myrepo-45755e/1/myrepo
    prunable gitdir file points to non-existent location
[TARGET-FIXED] running: treehouse get
[TARGET-FIXED] exit code: 0
[TARGET-FIXED] RESULT: RECOVERED - worktree dir exists at .../1/myrepo
[TARGET-FIXED] relevant get output:
    🌳 Entered worktree at ~/.treehouse/myrepo-45755e/1/myrepo. Type 'exit' to return.

Regression test transcript

=== RUN   TestGetRecoversFromStaleWorktreeRegistration
--- PASS: TestGetRecoversFromStaleWorktreeRegistration (0.15s)
PASS
ok  	github.com/kunchenguid/treehouse/cmd	1.099s

Validated through the no-mistakes pipeline: intent, rebase, review, test, document, lint, push all passed with zero findings.

AI disclosure

Human-reviewed. The investigation (code-path analysis, deterministic reproduction), the fix design (Option A: prune-before-add, with the rejected Option B documented), and the regression test were produced autonomously and then reviewed by a human operator before opening. The fix is small and the safety argument (prune is safe-by-design) is stated explicitly above for review.

e-jung added 2 commits June 20, 2026 18:52
A crashed or forcibly removed worktree leaves prunable bookkeeping in
.git/worktrees/. The next "treehouse get" then fails hard with
"missing but already registered worktree" because git refuses the add,
wedging every subsequent spawn until a human runs "git worktree prune"
by hand.

Run "git worktree prune" immediately before "git worktree add" in
Acquire's create-new-worktree branch. Prune is safe by design: it only
removes registrations whose target directories are already gone, so it
cannot destroy live worktrees or data.

Adds TestGetRecoversFromStaleWorktreeRegistration covering the recovery.

Refs kunchenguid#30