Pin agent CLI versions; harden managed-config sync (OPS-409, OPS-406) #10
Open
nprodromou wants to merge 1 commit into
OPS-409: Pin @openai/codex and @anthropic-ai/claude-code to explicit versions via ARGs (CODEX_CLI_VERSION, CLAUDE_CLI_VERSION). Versions are also recorded in image LABELs and exported as ENV so the runtime banner can confirm what shipped. Rebuilding the same commit no longer silently picks up a newer agent CLI.

OPS-406: Replace `cp -fL ... 2>/dev/null || true` with `cp -afL` (-a recurses and preserves attributes, -L dereferences ConfigMap symlinks). Failures now exit with a clear FATAL message instead of being masked. Adds a smoke check: if the ConfigMap mount is non-empty but the destination ends up empty, fail loudly so stale managed config can no longer ride a successful pod start.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
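For readers without the diff open, the hardened copy described for OPS-406 might look roughly like the sketch below. This is an illustration under assumptions, not the PR's actual entrypoint code: the function name `sync_config`, its arguments, and the `"$src"/.` source form are hypothetical, and the real script presumably calls `exit` directly rather than returning a status.

```shell
#!/bin/sh
# Hypothetical helper (not taken from the PR diff): copy a mounted config
# directory into the agent's runtime config dir and fail loudly on errors.
sync_config() {
    src="$1"
    dest="$2"

    # Nothing mounted, or an empty mount: there is nothing to sync.
    if [ ! -d "$src" ] || [ -z "$(ls -A "$src" 2>/dev/null)" ]; then
        return 0
    fi

    mkdir -p "$dest" || { echo "FATAL: cannot create $dest" >&2; return 1; }

    # -a recurses and preserves attributes, -f overwrites existing files,
    # -L dereferences symlinks (ConfigMap mounts are symlink-based).
    # No "2>/dev/null || true": a failed copy must not be masked.
    if ! cp -afL "$src"/. "$dest"/; then
        echo "FATAL: failed to copy $src to $dest" >&2
        return 1
    fi

    # Smoke check: non-empty source but empty destination means the copy
    # silently did nothing, so refuse to continue.
    if [ -z "$(ls -A "$dest" 2>/dev/null)" ]; then
        echo "FATAL: $src is non-empty but $dest is empty after copy" >&2
        return 1
    fi
}
```

An entrypoint could then call something like `sync_config "/etc/${AGENT}-config" "${AGENT_CONFIG_DIR}" || exit 1`, so the pod fails to start instead of running with stale config.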
codex-prodromou
approved these changes
May 7, 2026
codex-prodromou
left a comment
Collaborator
No blocking findings. This addresses the two Codex findings from OPS-406 and OPS-409: agent CLI npm packages are pinned, image metadata records the versions, the ConfigMap copy is recursive/dereferencing, and failures no longer get hidden. Build matrix passes for both codex and claude images.
claude-prodromou
added a commit
to claude-prodromou/codex-shell
that referenced
this pull request
May 7, 2026
Codex review (CHANGES_REQUESTED): cp -fL without -R skipped subdirectories, so the baked defaults wouldn't actually land in the runtime config dir. Same root cause as OPS-406 (codex-shell#10).

Applies the hardened pattern from nprodromou#10 to BOTH config-copy layers:

  Layer 1 - image defaults (/etc/<agent>-defaults/): cp -afL with FATAL exit on failure + smoke check that catches silent permission/path failures.
  Layer 2 - ConfigMap overlay (/etc/<agent>-config/): Same pattern. Will rebase cleanly on top of nprodromou#10 (or vice versa) since the changes are textually identical.

Both layers now fail loudly instead of silently masking missing config, the same defense-in-depth as the OPS-406 fix.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
claude-prodromou
added a commit
that referenced
this pull request
May 8, 2026
)

* OPS-405: bake codex runtime defaults into image, layered with ConfigMap

Adds defaults/codex-config.toml carrying a sensible baseline for the codex variant:

    sandbox_mode = "danger-full-access"   # pod is the security boundary
    approval_policy = "on-failure"        # no per-command prompts

    [projects."/home/codex/workspace"]
    trust_level = "trusted"

Why these values: the apk8s pod itself is the security sandbox (non-root user, restricted RBAC, PVC isolation). Codex's internal bubblewrap layer is redundant in this deployment AND was failing on `bwrap: No permissions to create new namespace` because most hardened k8s clusters block unprivileged user-namespace cloning. Disabling Codex's inner sandbox eliminates the per-command escalation that OPS-405 calls out as noisy.

Dockerfile copies defaults/<agent>-config.toml into /etc/<agent>-defaults/ during build (only the file matching the AGENT arg gets installed).

Entrypoint now layers two sources into ${AGENT_CONFIG_DIR}:

    Layer 1: /etc/<agent>-defaults/ - image baseline (this commit)
    Layer 2: /etc/<agent>-config/ - apk8s ConfigMap (existing, wins)

Per-deployment tweaks still go in the apk8s ConfigMap; this baseline just means a fresh pod without a ConfigMap is still functional.

Note on full fix scope: OPS-405's apk8s ConfigMap update remains a separate Nate-action: the live pods today have a ConfigMap mounted, which means this image-default doesn't reach them until either the pods are rebuilt without their ConfigMap or the ConfigMap content is updated to match this baseline. The values here can be copied into the apk8s ConfigMap directly.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* OPS-405: harden config-copy on both layers (cp -afL + smoke check)

Codex review (CHANGES_REQUESTED): cp -fL without -R skipped subdirectories, so the baked defaults wouldn't actually land in the runtime config dir. Same root cause as OPS-406 (codex-shell#10).

Applies the hardened pattern from #10 to BOTH config-copy layers:

    Layer 1 - image defaults (/etc/<agent>-defaults/): cp -afL with FATAL exit on failure + smoke check that catches silent permission/path failures.
    Layer 2 - ConfigMap overlay (/etc/<agent>-config/): Same pattern. Will rebase cleanly on top of #10 (or vice versa) since the changes are textually identical.

Both layers now fail loudly instead of silently masking missing config, the same defense-in-depth as the OPS-406 fix.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
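The two-layer ordering described in the commit message above (image defaults first, ConfigMap second so it wins) can be sketched as follows. The function name `apply_layers` and its calling convention are hypothetical and not taken from the entrypoint. Note that with `cp`, "wins" means a later layer replaces whole files of the same name; nothing merges individual TOML keys.

```shell
#!/bin/sh
# Hypothetical sketch: apply config layers in order into one destination.
# Later layers overwrite same-named files from earlier layers.
apply_layers() {
    dest="$1"
    shift

    mkdir -p "$dest" || { echo "FATAL: cannot create $dest" >&2; return 1; }

    for layer in "$@"; do
        # A layer that is not present (e.g. no ConfigMap mounted) is skipped.
        [ -d "$layer" ] || continue

        # Same hardened copy for every layer: recursive, dereferencing,
        # and never silenced.
        if ! cp -afL "$layer"/. "$dest"/; then
            echo "FATAL: failed to copy layer $layer to $dest" >&2
            return 1
        fi
    done
}
```

A call such as `apply_layers "${AGENT_CONFIG_DIR}" "/etc/${AGENT}-defaults" "/etc/${AGENT}-config" || exit 1` would give a pod without a ConfigMap the baked baseline, while a mounted ConfigMap still overrides it.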
Summary
- OPS-409: Pin `@openai/codex` (0.129.0) and `@anthropic-ai/claude-code` (2.1.132) via Dockerfile ARGs, and record them in image LABELs + ENV, so rebuilds of the same commit produce the same agent CLI behavior.
- OPS-406: Replace `cp -fL ... 2>/dev/null || true` with `cp -afL` and exit FATAL on failure. Adds a smoke check that detects the non-empty-source / empty-destination case and fails the pod start instead of running with stale config.

Both originated as Codex P2 findings on PRs #2-#8.
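Since the pinned versions are exported as ENV, a runtime banner can report them without invoking either CLI. The sketch below is a guess at what that could look like: it assumes the ENV variables reuse the ARG names `CODEX_CLI_VERSION` and `CLAUDE_CLI_VERSION`, and `print_version_banner` is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical banner helper. Assumes the image exports the pinned versions
# under the same names as the build ARGs; "unset" flags a missing variable.
print_version_banner() {
    echo "codex CLI (pinned):  ${CODEX_CLI_VERSION:-unset}"
    echo "claude CLI (pinned): ${CLAUDE_CLI_VERSION:-unset}"
}
```

Printing `unset` rather than an empty string makes an image built without the ARGs easy to spot in pod logs.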
Test plan
- Build matrix passes for `AGENT=codex` and `AGENT=claude`
- `docker inspect <image> --format '{{ index .Config.Labels "com.prodromou.codex-shell.codex-cli-version" }}'` returns the pinned version
- In the `agents` namespace, confirm ConfigMap content (including subdirs) lands in `/home/<agent>/.<agent>/`