Pin agent CLI versions; harden managed-config sync (OPS-409, OPS-406) by nprodromou · Pull Request #10 · nprodromou/codex-shell

nprodromou · 2026-05-07T19:17:24Z

Summary

OPS-409: Pin @openai/codex (0.129.0) and @anthropic-ai/claude-code (2.1.132) via Dockerfile ARGs, record in image LABELs + ENV, so rebuilds of the same commit produce the same agent CLI behavior.
OPS-406: Replace cp -fL ... 2>/dev/null || true with cp -afL and exit FATAL on failure. Adds a smoke check that detects the non-empty-source / empty-destination case and fails the pod start instead of running with stale config.

Both originated as Codex P2 findings on PRs #2-#8.
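
As a rough illustration of the OPS-406 change, the hardened copy plus smoke check could look like the sketch below. The function name `sync_managed_config` and its two-argument shape are assumptions made for this example; the actual entrypoint may be structured differently. Only `cp -afL`, the FATAL exit, and the non-empty-source / empty-destination check come from the description above.

```shell
#!/bin/sh
# Sketch only: sync_managed_config is a hypothetical name, not the
# entrypoint's actual function.
sync_managed_config() {
    src=$1
    dst=$2

    # No mount present: nothing to sync, not an error.
    [ -d "$src" ] || return 0

    mkdir -p "$dst" || { echo "FATAL: cannot create $dst" >&2; return 1; }

    # -a recurses and preserves attributes; -L dereferences ConfigMap symlinks.
    # "$src/." copies the directory's contents, dotfiles included.
    if ! cp -afL "$src/." "$dst/"; then
        echo "FATAL: failed to copy managed config from $src to $dst" >&2
        return 1
    fi

    # Smoke check: a non-empty source must never leave an empty destination.
    if [ -n "$(ls -A "$src")" ] && [ -z "$(ls -A "$dst")" ]; then
        echo "FATAL: $src is non-empty but $dst is empty after copy" >&2
        return 1
    fi
}
```

An entrypoint would presumably call it as `sync_managed_config "/etc/${AGENT}-config" "${AGENT_CONFIG_DIR}" || exit 1`, so a failed sync stops the pod instead of letting it start with stale config.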

Test plan

Image builds for both AGENT=codex and AGENT=claude
docker inspect <image> --format '{{ index .Config.Labels "com.prodromou.codex-shell.codex-cli-version" }}' returns the pinned version
Pod startup banner shows correct CLI version
Negative test: simulate empty AGENT_CONFIG_DIR after copy → entrypoint exits 1 with FATAL message
Positive test: deploy to apk8s agents namespace, confirm ConfigMap content (including subdirs) lands in /home/<agent>/.<agent>/
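
The banner check in the test plan could be automated along these lines. This is a sketch under assumptions: `check_pinned_version` is a hypothetical helper, and it takes the reported version as a plain string so it does not rely on any particular CLI flag or output format. It uses a substring match, so it is a coarse check rather than a strict version comparison.

```shell
#!/bin/sh
# Sketch only: check_pinned_version is a hypothetical helper.
# $1 = CLI name, $2 = version pinned at build time, $3 = string the CLI reports.
check_pinned_version() {
    name=$1
    pinned=$2
    reported=$3
    case "$reported" in
        *"$pinned"*)
            # The pinned version appears somewhere in the reported string.
            echo "OK: $name reports pinned version $pinned"
            ;;
        *)
            echo "FATAL: $name pinned to $pinned but reports '$reported'" >&2
            return 1
            ;;
    esac
}
```

Assuming the image exports CODEX_CLI_VERSION as described and the CLI prints its version with a `--version` flag, a call might look like `check_pinned_version codex "$CODEX_CLI_VERSION" "$(codex --version)"`.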

OPS-409: Pin @openai/codex and @anthropic-ai/claude-code to explicit versions via ARGs (CODEX_CLI_VERSION, CLAUDE_CLI_VERSION). Versions also recorded in image LABELs and exported as ENV so the runtime banner can confirm what shipped. Rebuilding the same commit no longer silently picks up a newer agent CLI.

OPS-406: Replace `cp -fL ... 2>/dev/null || true` with `cp -afL` (-a recurses + preserves attrs, -L dereferences ConfigMap symlinks). Failures now exit with a clear FATAL message instead of being masked. Adds a smoke check: if the ConfigMap mount is non-empty but the destination ends up empty, fail loudly so stale managed config can no longer ride a successful pod start.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

codex-prodromou

No blocking findings. This addresses the two Codex findings from OPS-406 and OPS-409: agent CLI npm packages are pinned, image metadata records the versions, the ConfigMap copy is recursive/dereferencing, and failures no longer get hidden. Build matrix passes for both codex and claude images.

* OPS-405: bake codex runtime defaults into image, layered with ConfigMap

  Adds defaults/codex-config.toml carrying a sensible baseline for the codex variant:

      sandbox_mode = "danger-full-access"   # pod is the security boundary
      approval_policy = "on-failure"        # no per-command prompts

      [projects."/home/codex/workspace"]
      trust_level = "trusted"

  Why these values: the apk8s pod itself is the security sandbox (non-root user, restricted RBAC, PVC isolation). Codex's internal bubblewrap layer is redundant in this deployment AND was failing on `bwrap: No permissions to create new namespace` because most hardened k8s clusters block unprivileged user-namespace cloning. Disabling Codex's inner sandbox eliminates the per-command escalation that OPS-405 calls out as noisy.

  Dockerfile copies defaults/<agent>-config.toml into /etc/<agent>-defaults/ during build (only the file matching the AGENT arg gets installed). Entrypoint now layers two sources into ${AGENT_CONFIG_DIR}:

      Layer 1: /etc/<agent>-defaults/ - image baseline (this commit)
      Layer 2: /etc/<agent>-config/   - apk8s ConfigMap (existing, wins)

  Per-deployment tweaks still go in the apk8s ConfigMap; this baseline just means a fresh pod without a ConfigMap is still functional.

  Note on full fix scope: OPS-405's apk8s ConfigMap update remains a separate Nate-action. The live pods today have a ConfigMap mounted, which means this image-default doesn't reach them until either the pods are rebuilt without their ConfigMap or the ConfigMap content is updated to match this baseline. The values here can be copied into the apk8s ConfigMap directly.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* OPS-405: harden config-copy on both layers (cp -afL + smoke check)

  Codex review (CHANGES_REQUESTED): cp -fL without -R skipped subdirectories, so the baked defaults wouldn't actually land in the runtime config dir. Same root cause as OPS-406 (codex-shell#10).

  Applies the hardened pattern from #10 to BOTH config-copy layers:

      Layer 1 - image defaults (/etc/<agent>-defaults/): cp -afL with FATAL exit on failure + smoke check that catches silent permission/path failures.
      Layer 2 - ConfigMap overlay (/etc/<agent>-config/): Same pattern.

  Will rebase cleanly on top of #10 (or vice versa) since the changes are textually identical. Both layers now fail loudly instead of silently masking missing config, the same defense-in-depth as the OPS-406 fix.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
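
The two-layer ordering described in these commits can be sketched as follows. The helper names `apply_layer` and `layer_agent_config`, and the optional third argument that relocates `/etc` for local testing, are assumptions made for this illustration and are not taken from the repository. The point is only the ordering: image defaults are copied first, then the ConfigMap overlay, so the ConfigMap wins wherever both provide the same file.

```shell
#!/bin/sh
# Sketch only: apply_layer and layer_agent_config are hypothetical helpers.
apply_layer() {
    label=$1
    src=$2
    dst=$3
    # A layer that is not present is skipped rather than treated as an error.
    [ -d "$src" ] || return 0
    if ! cp -afL "$src/." "$dst/"; then
        echo "FATAL: $label copy from $src to $dst failed" >&2
        return 1
    fi
}

# $1 = agent name, $2 = destination config dir,
# $3 = root holding the layer dirs (hypothetical parameter, defaults to /etc).
layer_agent_config() {
    agent=$1
    dst=$2
    root=${3:-/etc}
    mkdir -p "$dst" || { echo "FATAL: cannot create $dst" >&2; return 1; }
    # Layer 1: image baseline. Layer 2: ConfigMap overlay, applied last so it wins.
    apply_layer "image defaults" "$root/$agent-defaults" "$dst" || return 1
    apply_layer "ConfigMap overlay" "$root/$agent-config" "$dst" || return 1
}
```

Because the overlay is a plain file-level copy, a ConfigMap that ships its own config file replaces the baseline file wholesale rather than merging individual keys.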

codex-prodromou approved these changes May 7, 2026

claude-prodromou mentioned this pull request May 7, 2026

OPS-405: bake codex runtime defaults into image (sandbox + approval) #9

Merged

nprodromou wants to merge 1 commit into main from ops/pin-cli-versions-and-fix-config-copy
