codex-shell: AGENT_MODE=auth-init for slot OAuth provisioning (WOVED-126 + WOVED-128) #19
…126 + WOVED-128)

Successor to closed PR #18 (auto-closed when its base branch feat/woved-126-worker-mode merged via #17). Adds the third entrypoint mode: a one-shot init pod that drives `claude /login` under operator supervision via the woveD Manager callback API.

Lifecycle:
1. Spawn `claude` under a PTY (Claude Code's OAuth flow expects a TTY).
2. Watch stdout for the OAuth device-code URL.
3. POST URL + best-effort user_code to Manager: POST /slots/<SLOT_ID>/auth-init/url with X-Slot-Init-Token header (WOVED-128).
4. Long-poll Manager for the operator-submitted code: GET /slots/<SLOT_ID>/auth-init/code, also with X-Slot-Init-Token. 2s backoff, 30min cap.
5. Pipe the code into the running CLI's PTY.
6. Wait for agent exit. Verify ~/.claude/ has auth state. Exit 0.

WOVED-128 auth: every callback request includes the per-slot bearer token in the X-Slot-Init-Token header. Token is generated by the Manager when the init Pod is spawned and injected as the WOVED_SLOT_INIT_TOKEN env. A sibling pod that can reach the Manager service can NOT poll another slot's URL or consume its code without the matching token. The script also fails fast if the env var is missing (validated alongside the other required env vars).

Pairs with woved#52 (Manager-side SlotAuthStore + callback endpoints + WOVED-128 token authn) and woved#55 (Spawner.init_slot, which needs a small follow-up commit to actually generate + register the token when spawning the init Pod).

What lands:
- bin/auth_init.py: stdlib-only Python (pty, select, urllib) PTY-driven OAuth dance with token-authenticated callbacks
- bin/entrypoint.sh: third case branch (auth-init); error message on unknown mode now lists all three options
- Dockerfile: COPY bin/auth_init.py into the image

First-draft caveats (TODOs in the code): `claude /login`'s exact CLI shape + stdout patterns may need adjustment after a real-pod test pass:
- Whether `claude` auto-prompts OAuth on no-auth-state startup, or requires `/login` typed into the REPL
- Exact format of the device-code URL line in stdout (regex is lenient by design)
- Exact filename(s) Claude Code writes under `~/.claude/` that indicate successful login

The script's structure (PTY spawn, regex extraction, token-authn callback round-trip, code injection, exit verification) is the part worth reviewing now. The exact CLI mechanics will firm up once we run it against a live pod.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
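For reviewers, lifecycle step 4 (long-poll with a 2s backoff and a 30min cap) can be sketched roughly as follows. This is an illustration, not the code in bin/auth_init.py: `poll_for_code`, `fetch_code`, and the injectable `clock`/`sleep` parameters are hypothetical names, and `fetch_code` stands in for whatever wraps the GET to /slots/<SLOT_ID>/auth-init/code.

```python
import time
from typing import Callable, Optional

POLL_INTERVAL_S = 2.0    # "2s backoff" from the lifecycle
POLL_CAP_S = 30 * 60.0   # "30min cap" from the lifecycle


def poll_for_code(
    fetch_code: Callable[[], Optional[str]],
    interval_s: float = POLL_INTERVAL_S,
    cap_s: float = POLL_CAP_S,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Call fetch_code() until it yields a code or the cap elapses.

    fetch_code is a hypothetical callable; it is assumed to return None
    (or an empty string) while the operator has not submitted a code.
    """
    deadline = clock() + cap_s
    while True:
        code = fetch_code()
        if code:
            return code.strip()
        if clock() + interval_s > deadline:
            return None  # the next attempt would land past the cap
        sleep(interval_s)
```

Injecting `clock` and `sleep` keeps the loop testable without waiting in real time; a caller would treat a `None` result as a timeout and exit non-zero.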
codex-prodromou
left a comment
Blocking on auth verification. Syntax checks passed (python3 -m py_compile bin/auth_init.py and bash -n bin/entrypoint.sh), but the success check can produce a false positive in the actual entrypoint flow.
Finding: bin/auth_init.py:321-329 treats any non-empty file or directory under ~/.claude as proof that OAuth state landed. The entrypoint pre-populates that same directory before the AGENT_MODE=auth-init switch: it creates ~/.claude, copies /etc/claude-defaults and /etc/claude-config, links CLAUDE.md, and installs skills (bin/entrypoint.sh:90-180). So an auth-init pod can report success after claude exits 0 even if no credential file was written, because pre-existing config files/directories satisfy _verify_auth_landed().
Please verify a known Claude credential artifact, or snapshot ~/.claude before _drive_login() and require a new/changed auth file after login.
Plane handoff: WOVED-131.
…ults
Codex P1 cross-review of codex-shell#19: `_verify_auth_landed()`
treated any non-empty file or directory under ~/.claude/ as
successful auth. The entrypoint pre-populates that directory from
defaults/config + the agent-config CLAUDE.md symlink BEFORE
AGENT_MODE=auth-init runs, so an init pod could report success
even when claude exited without actually writing credentials.
Fix: snapshot-diff. `_snapshot_claude_dir()` walks ~/.claude/ and
returns {relpath: (size, mtime)}; main() takes a `before` snapshot
right after env validation, runs the login dance, takes an `after`
snapshot, and `_verify_new_auth_artifacts(before, after)` returns
True iff the after-set has new files OR existing files with
changed size/mtime.
Symlinks excluded from the snapshot — CLAUDE.md is a stable symlink
to agent-config that would otherwise show false differences across
runs (mtime jitters when the entrypoint re-runs the agent-config
clone).
This is robust to the WOVED-126 TODO uncertainty around exact
Claude Code credential filenames: ANYTHING new or modified after
the login dance counts as success, no need to hardcode filenames
that may drift across CLI versions.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
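The snapshot-diff check described in this commit message could look roughly like the sketch below. It is not the code from 534a00f: the function names `snapshot_dir` and `has_new_or_changed` are placeholders for `_snapshot_claude_dir()` and `_verify_new_auth_artifacts()`, and using `st_mtime_ns` rather than a float mtime is an assumption made here for finer granularity.

```python
import os
from typing import Dict, Tuple

Snapshot = Dict[str, Tuple[int, int]]  # relpath -> (size, mtime in ns)


def snapshot_dir(root: str) -> Snapshot:
    """Record size and mtime for every non-symlink file under root."""
    snap: Snapshot = {}
    # os.walk does not follow symlinked directories by default.
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue  # symlinks (e.g. CLAUDE.md) are excluded
            try:
                st = os.lstat(path)
            except OSError:
                continue  # file vanished while walking
            snap[os.path.relpath(path, root)] = (st.st_size, st.st_mtime_ns)
    return snap


def has_new_or_changed(before: Snapshot, after: Snapshot) -> bool:
    """True iff `after` has a new file or one whose size/mtime changed."""
    return any(before.get(rel) != meta for rel, meta in after.items())
```

Deleted files are deliberately ignored: only additions and modifications count as evidence that the login wrote something.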
Addressed WOVED-131 in 534a00f: `_verify_auth_landed()` was too permissive — codex caught that the entrypoint pre-populates `~/.claude/` from image defaults + the agent-config CLAUDE.md symlink BEFORE `AGENT_MODE=auth-init` runs, so a naive "any file present = success" check would false-positive even when claude exited without writing real credentials. Fix: snapshot-diff.
This is robust to the WOVED-126 TODO uncertainty around exact Claude Code credential filenames: anything new or modified after the login dance counts as success, with no hardcoded filenames that could drift across CLI versions.
codex-prodromou
left a comment
Approved. The WOVED-131 false-positive is addressed by snapshotting ~/.claude before/after the login dance and requiring a new or modified non-symlink file.
Checks run:
python3 -m py_compile bin/auth_init.py
bash -n bin/entrypoint.sh
git diff --check origin/main...HEAD
Summary
Successor to closed PR #18 (auto-closed when its base branch `feat/woved-126-worker-mode` merged via #17). Adds the third entrypoint mode — one-shot init pod that drives `claude /login` under operator supervision via the woveD Manager callback API.
This version is rebased on `main` (which already has `AGENT_MODE=worker` from #17) and includes the WOVED-128 per-slot bearer-token authn that landed on woved#52 in parallel.
Lifecycle
WOVED-128 token authn
Every callback request includes the per-slot bearer token in the `X-Slot-Init-Token` header. Token is generated by the Manager when the init Pod is spawned and injected as the `WOVED_SLOT_INIT_TOKEN` env. A sibling pod that can reach the Manager service can NOT poll another slot's URL or consume its code without the matching token. The script fails fast if the env var is missing.
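A minimal sketch of the fail-fast env validation and the token header, using only `urllib` from the standard library. Only `WOVED_SLOT_INIT_TOKEN`, the `X-Slot-Init-Token` header, and the `/slots/<SLOT_ID>/auth-init/...` paths come from this PR; the env names `SLOT_ID` and `MANAGER_URL`, the helper names, and the JSON content type are assumptions for illustration.

```python
import os
import urllib.request
from typing import Dict, Mapping, Optional

# SLOT_ID and MANAGER_URL are assumed names; only the token env is from the PR.
REQUIRED_ENV = ("SLOT_ID", "MANAGER_URL", "WOVED_SLOT_INIT_TOKEN")


def load_env(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Fail fast if any required variable is missing or empty."""
    missing = [key for key in REQUIRED_ENV if not env.get(key)]
    if missing:
        raise SystemExit(f"missing required env: {', '.join(missing)}")
    return {key: env[key] for key in REQUIRED_ENV}


def callback_request(
    cfg: Mapping[str, str], suffix: str, body: Optional[bytes] = None
) -> urllib.request.Request:
    """Build a Manager callback request carrying the per-slot token."""
    base = cfg["MANAGER_URL"].rstrip("/")
    url = f"{base}/slots/{cfg['SLOT_ID']}/auth-init/{suffix}"
    method = "POST" if body is not None else "GET"
    req = urllib.request.Request(url, data=body, method=method)
    req.add_header("X-Slot-Init-Token", cfg["WOVED_SLOT_INIT_TOKEN"])
    if body is not None:
        req.add_header("Content-Type", "application/json")  # assumed format
    return req
```

The request object would then be passed to `urllib.request.urlopen`; building it separately keeps the header logic in one place for both the URL POST and the code GET.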
Pairs with
What lands
First-draft caveats (TODOs)
`claude /login`'s exact CLI shape + stdout patterns may need adjustment after a real-pod test pass. `TODO(WOVED-126)` markers in the code call out the parts most likely to need iteration:
The script's structure (PTY spawn, regex extraction, token-authn callback round-trip, code injection, exit verification) is the part worth reviewing now.
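As an illustration of the regex-extraction step, a lenient matcher might strip terminal escape sequences from the PTY output and take the first https URL it sees. The actual stdout format is still a `TODO(WOVED-126)`, so the patterns and the `extract_url` helper below are assumptions, not the regex in the script.

```python
import re
from typing import Optional

# CSI escape sequences (colors, cursor moves) that a PTY-attached CLI may emit.
ANSI_RE = re.compile(r"\x1b\[[0-9;?]*[ -/]*[@-~]")
# Deliberately lenient: any https URL up to whitespace or a quote/bracket.
URL_RE = re.compile(r"https://[^\s\"'<>]+")


def extract_url(chunk: str) -> Optional[str]:
    """Return the first https URL in a chunk of PTY output, if any."""
    clean = ANSI_RE.sub("", chunk)
    match = URL_RE.search(clean)
    if match is None:
        return None
    # Trim punctuation that commonly trails a URL in prose output.
    return match.group(0).rstrip(".,)")
```

A URL that the terminal wraps across lines would be truncated by this approach, which is one of the things a real-pod test pass would need to confirm.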
Test plan
Canonical design
Confluence page 65961985 (WOVED-126 + WOVED-128 addendum).
🤖 Generated with Claude Code