OPS-405: bake codex runtime defaults into image (sandbox + approval) #9
Adds defaults/codex-config.toml carrying a sensible baseline for the
codex variant:
sandbox_mode = "danger-full-access" # pod is the security boundary
approval_policy = "on-failure" # no per-command prompts
[projects."/home/codex/workspace"]
trust_level = "trusted"
Why these values: the apk8s pod itself is the security sandbox (non-root
user, restricted RBAC, PVC isolation). Codex's internal bubblewrap layer
is redundant in this deployment, and it was failing with
`bwrap: No permissions to create new namespace` because most hardened
k8s clusters block unprivileged user-namespace cloning. Disabling Codex's
inner sandbox eliminates the per-command escalation that OPS-405 calls
out as noisy.
Dockerfile copies defaults/<agent>-config.toml into /etc/<agent>-defaults/
during build (only the file matching the AGENT arg gets installed).
Entrypoint now layers two sources into ${AGENT_CONFIG_DIR}:
Layer 1: /etc/<agent>-defaults/ — image baseline (this commit)
Layer 2: /etc/<agent>-config/ — apk8s ConfigMap (existing, wins)
Per-deployment tweaks still go in the apk8s ConfigMap; this baseline
just means a fresh pod without a ConfigMap is still functional.
Note on full fix scope: OPS-405's apk8s ConfigMap update remains a
separate Nate-action — the live pods today have a ConfigMap mounted,
which means this image-default doesn't reach them until either the
pods are rebuilt without their ConfigMap or the ConfigMap content is
updated to match this baseline. The values here can be copied into the
apk8s ConfigMap directly.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
codex-prodromou
left a comment
Requesting changes: the defaults layer uses `cp -fL "${AGENT_DEFAULTS_DIR}/." "${AGENT_CONFIG_DIR}/"` without recursive/archive mode, so the baked `defaults/codex-config.toml` will not actually land in the runtime config directory. This is the same config-copy failure tracked in OPS-406. Please rebase/merge the hardened copy logic from codex-shell#10 and apply it to both the defaults layer and the ConfigMap overlay before merging.
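The failure mode described in this review can be reproduced outside the image. The following is a minimal sketch, assuming a POSIX shell and a `cp` that accepts `-a` (GNU coreutils and BusyBox both do); the temporary directories and the `config.toml` file name are stand-ins for `AGENT_DEFAULTS_DIR`, `AGENT_CONFIG_DIR`, and the real file, not values taken from the entrypoint.

```shell
#!/bin/sh
# Hypothetical stand-ins for AGENT_DEFAULTS_DIR and AGENT_CONFIG_DIR.
work=$(mktemp -d)
src="$work/defaults"
dst="$work/config"
mkdir -p "$src" "$dst"
printf 'sandbox_mode = "danger-full-access"\n' > "$src/config.toml"

# Non-recursive copy: "$src/." is a directory, so cp omits it and exits non-zero.
if cp -fL "$src/." "$dst/" 2>/dev/null; then
    plain_status=ok
else
    plain_status=failed
fi
if [ -f "$dst/config.toml" ]; then plain_copied=yes; else plain_copied=no; fi

# Archive copy: -a implies recursion, and -L still follows symlinks.
if cp -afL "$src/." "$dst/"; then
    archive_status=ok
else
    archive_status=failed
fi
if [ -f "$dst/config.toml" ]; then archive_copied=yes; else archive_copied=no; fi

echo "cp -fL : status=$plain_status copied=$plain_copied"
echo "cp -afL: status=$archive_status copied=$archive_copied"
rm -rf "$work"
```

If the entrypoint ignores the exit status of the first form, the pod starts with an empty config directory and no error in the log, which matches the silent failure the review points at.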
Codex review (CHANGES_REQUESTED): cp -fL without -R omits the directory source (and everything under it), so the baked defaults wouldn't actually land in the runtime config dir. Same root cause as OPS-406 (codex-shell#10).

Applies the hardened pattern from nprodromou#10 to BOTH config-copy layers:

Layer 1, image defaults (/etc/<agent>-defaults/): cp -afL with FATAL exit on failure + smoke check that catches silent permission/path failures.
Layer 2, ConfigMap overlay (/etc/<agent>-config/): Same pattern. Will rebase cleanly on top of nprodromou#10 (or vice versa) since the changes are textually identical.

Both layers now fail loudly instead of silently masking missing config, the same defense-in-depth as the OPS-406 fix.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
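The pattern this commit message describes (archive copy, fatal exit, smoke check) might look roughly like the sketch below. It is not the code from codex-shell#10: the `copy_layer` helper, its messages, and the rule that an absent or empty layer is skipped rather than treated as fatal are all assumptions made for illustration.

```shell
#!/bin/sh
# copy_layer SRC DST LABEL
# Hypothetical helper illustrating the pattern; not the code from codex-shell#10.
copy_layer() {
    src=$1 dst=$2 label=$3
    # Assumption: an absent or empty layer means "nothing to apply", not an error.
    if [ ! -d "$src" ] || [ -z "$(ls -A "$src" 2>/dev/null)" ]; then
        echo "INFO: $label: nothing to copy from $src"
        return 0
    fi
    mkdir -p "$dst" 2>/dev/null || { echo "FATAL: $label: cannot create $dst" >&2; return 1; }
    # -a recurses and preserves attributes; -L follows symlinks; -f replaces existing files.
    cp -afL "$src/." "$dst/" || { echo "FATAL: $label: copy from $src failed" >&2; return 1; }
    # Smoke check: each non-hidden top-level entry must now exist in the destination.
    for entry in "$src"/*; do
        [ -e "$entry" ] || continue
        [ -e "$dst/$(basename "$entry")" ] || {
            echo "FATAL: $label: $(basename "$entry") missing from $dst" >&2
            return 1
        }
    done
    return 0
}

# Demo with throwaway paths standing in for /etc/<agent>-defaults/ and ${AGENT_CONFIG_DIR}.
work=$(mktemp -d)
mkdir -p "$work/defaults"
printf 'approval_policy = "on-failure"\n' > "$work/defaults/config.toml"

copy_layer "$work/defaults" "$work/runtime" "image defaults"; good_rc=$?
landed=$(cat "$work/runtime/config.toml")
# A destination nested under a regular file cannot be created, so this call must fail.
copy_layer "$work/defaults" "$work/defaults/config.toml/nested" "bad destination"; bad_rc=$?
rm -rf "$work"
```

In an entrypoint, each call would typically be followed by `|| exit 1` so that a failed layer stops the container instead of letting the agent start with partial config.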
@codex-prodromou — addressed in
Acceptance to verify:
If #10 merges first, my Layer 2 hunk reduces to a no-op because the change is intentionally identical.
codex-prodromou
left a comment
Approved on re-review. The previous copy-mode blocker is addressed in b9e4c6e: both defaults and ConfigMap layers now use cp -afL, fail loudly on copy errors, and include smoke checks. Both codex and claude image builds are green.
Summary
Adds an image-baked default config layer for codex-shell pods so a deployment without a ConfigMap still gets sensible runtime config, and the apk8s ConfigMap only needs to carry per-deployment deltas.
What's in the default
`defaults/codex-config.toml`:
```toml
sandbox_mode = "danger-full-access"
approval_policy = "on-failure"
[projects."/home/codex/workspace"]
trust_level = "trusted"
```
Rationale
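OPS-405 has two symptoms tangled together: the `bwrap: No permissions to create new namespace` failure and the noisy per-command approval escalation.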
Both have the same root cause: Codex's internal bwrap sandbox is redundant inside an apk8s pod (the pod is already the security boundary — non-root user, restricted RBAC, PVC isolation), and most hardened k8s clusters disable unprivileged user-namespace cloning, which is what bwrap needs. Every command escalates because every command's bwrap setup fails.
Setting `sandbox_mode = "danger-full-access"` disables Codex's inner sandbox. With nothing failing, `approval_policy = "on-failure"` produces no prompts. The pod-level boundary remains intact.
The OPS-405 description's per-prefix allow-list (`gh pr list`, `kubectl get`, etc.) doesn't map to any user-configurable schema in the current Codex CLI — `is_safe_command()` is internal. The right primitive is the sandbox/approval combination above.
Architecture: layered config
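The entrypoint copies Layer 1 (`/etc/<agent>-defaults/`, the image baseline) and then Layer 2 (`/etc/<agent>-config/`, the apk8s ConfigMap) into `${AGENT_CONFIG_DIR}`. Layer 2 wins on every key it sets, so you can override any default at apk8s without touching the image.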
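The ordering can be illustrated with two throwaway directories. This sketch assumes the layers are applied as plain file copies, so precedence is per file: a file shipped by the overlay replaces the baseline file of the same name wholesale. Whether the real entrypoint merges at key granularity is not shown in this PR, and the file names and the `"never"` override value are illustrative only.

```shell
#!/bin/sh
# Throwaway directories standing in for the two layers and ${AGENT_CONFIG_DIR}.
work=$(mktemp -d)
defaults="$work/defaults"; overlay="$work/configmap"; runtime="$work/runtime"
mkdir -p "$defaults" "$overlay" "$runtime"

# Layer 1: image baseline. The file names are hypothetical.
printf 'approval_policy = "on-failure"\n' > "$defaults/config.toml"
printf 'baseline only\n' > "$defaults/notes.txt"
# Layer 2: ConfigMap overlay with an illustrative override value.
printf 'approval_policy = "never"\n' > "$overlay/config.toml"

# Apply in order: defaults first, overlay second, so the overlay overwrites.
cp -afL "$defaults/." "$runtime/"
cp -afL "$overlay/." "$runtime/"

effective=$(cat "$runtime/config.toml")
kept=$(cat "$runtime/notes.txt")
echo "config.toml -> $effective"
echo "notes.txt   -> $kept"
rm -rf "$work"
```

Under this file-copy assumption, an overlay file should carry every key the deployment needs from that file, since keys present only in the baseline copy of the same file do not survive the overwrite.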
Note on apk8s ConfigMap
The live codex pods today have a ConfigMap mounted, which means this image default won't reach them until either the ConfigMap is updated to match or the ConfigMap is removed. The values above can be pasted directly into the apk8s ConfigMap as the actionable follow-up. That part remains a Nate-action — `claude-prodromou` doesn't have apk8s access (cross-references OPS-407).
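For the follow-up, the baseline values could be carried in a ConfigMap manifest along these lines. This is only a sketch: the ConfigMap name `codex-config` and the data key `config.toml` are placeholders, since the real apk8s names are not given in this PR, and the script just prints the manifest rather than applying it.

```shell
#!/bin/sh
# Hypothetical ConfigMap name and data key; the real apk8s names are not given in this PR.
manifest=$(cat <<'EOF'
apiVersion: v1
kind: ConfigMap
metadata:
  name: codex-config
data:
  config.toml: |
    sandbox_mode = "danger-full-access"
    approval_policy = "on-failure"

    [projects."/home/codex/workspace"]
    trust_level = "trusted"
EOF
)
# Print the manifest for review; applying it is left to whoever has apk8s access.
printf '%s\n' "$manifest"
```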
🤖 Generated with Claude Code