Skip to content

fix(docker): resolve docker by absolute path on macOS spawn; honest docker_available (#696)#697

Merged
Dumbris merged 2 commits into
mainfrom
fix/mcp-2715-docker-spawn-path
Jun 16, 2026
Merged

fix(docker): resolve docker by absolute path on macOS spawn; honest docker_available (#696)#697
Dumbris merged 2 commits into
mainfrom
fix/mcp-2715-docker-spawn-path

Conversation

@Dumbris

@Dumbris Dumbris commented Jun 16, 2026

Copy link
Copy Markdown
Member

Problem (GitHub #696)

On macOS with Docker Desktop installed the default way (without the optional, admin-gated "install CLI tools" step), the docker CLI lives only at /Applications/Docker.app/Contents/Resources/bin/dockernot on any standard PATH dir. mcpproxy spawned Docker-isolated upstream servers with bare docker on a sanitized PATH that omits the bundle dir, so docker was unresolvable at spawn:

zsh:1: command not found: docker   (repeated ~20x, retry_count: 21)

…while upstream_servers list misleadingly reported docker_status.available: true / per-server docker_available: true.

Fixes (TDD — failing test first for each)

  1. Spawn by absolute path (root cause)internal/upstream/core/connection_docker.go: setupDockerIsolation resolves docker via shellwrap.ResolveDockerPath (mirroring newDockerCmd) and spawns the absolute path before shell-wrapping; falls back to bare docker only when resolution fails. Indirected via a package var so tests stub resolution with no real Docker install.
  2. Enhanced spawn PATH (defense in depth)internal/secureenv/manager.go: add /Applications/Docker.app/Contents/Resources/bin on darwin (existence-filtered). Candidate list split into a pure, unit-testable unixCandidatePaths().
  3. Honest docker_availableinternal/server/mcp.go (resolveDockerStatus) + internal/runtime/runtime.go (IsDockerAvailable) + internal/management/diagnostics.go: report available only when the CLI is resolvable and docker info succeeds — no bare-docker fallback that misreports. New docker_status.docker_path exposes the resolved binary; available:false with docker_path:"" means unresolvable.
  4. Actionable error + stderr dedupinternal/upstream/core/monitoring.go: formatRecentStderr leads a single actionable "Docker CLI not found" hint on command not found (zsh + bash forms) and collapses identical consecutive lines into … (repeated N×), replacing the 20-line wall.

Cross-platform

macOS-specific paths guarded behind runtime.GOOS=="darwin"; Linux/Windows unaffected. Shell-script fake-docker tests skip on Windows.

Docs (ENG-9)

docs/docker-isolation.md: new docker_path field, corrected telemetry-counter semantics, and a command not found: docker troubleshooting entry.

Verification

  • go build ./cmd/mcpproxy (personal) and go build -tags server ./cmd/mcpproxy — both pass.
  • New tests pass: internal/secureenv (bundle-path inclusion), internal/upstream/core (absolute-path spawn, cidfile insertion, bare fallback, hint extraction, dedup, wall→hint), internal/server (unavailable/working/daemon-down).
  • golangci-lint (CI v2 config) — 0 issues.
  • Full -race over touched packages running in CI.

Out of scope

  • Windows/Linux Docker Desktop bundle-path discovery (TODO).
  • MaxConnectionRetries unchanged.

Related #696

…ocker_available (#696)

On macOS, Docker Desktop installed the default way (without the optional,
admin-gated "install CLI tools" step) leaves the docker CLI only inside the
app bundle at /Applications/Docker.app/Contents/Resources/bin/docker, which is
not on any standard PATH dir. mcpproxy spawned isolated servers with bare
`docker` on a sanitized PATH that omits the bundle dir, so docker was
unresolvable at spawn (~20x `command not found: docker`), while
`upstream_servers list` misreported `docker_available: true`.

- Spawn (root cause): setupDockerIsolation resolves docker to an absolute path
  via shellwrap.ResolveDockerPath (mirroring newDockerCmd) before shell-wrapping;
  falls back to bare "docker" only when resolution fails.
- Enhanced spawn PATH (defense in depth): add the Docker Desktop bundle bin dir
  on darwin; split the candidate list into a pure, unit-testable function.
- Honest availability: MCPProxyServer.resolveDockerStatus and
  Runtime.IsDockerAvailable report available only when the CLI is resolvable AND
  `docker info` succeeds — no bare-docker fallback that would misreport. Surface
  the resolved binary as docker_status.docker_path; diagnostics.checkDockerDaemon
  reports unavailable with an actionable error when unresolvable.
- Diagnostics: formatRecentStderr leads a single actionable "Docker CLI not
  found" hint on `command not found` and collapses identical consecutive lines
  into "… (repeated N×)", replacing the 20-line wall.

macOS-specific paths guarded behind runtime.GOOS=="darwin"; Linux/Windows
unaffected. Docs updated for docker_path, telemetry counter semantics, and a
new troubleshooting entry.

Related #696
@cloudflare-workers-and-pages

cloudflare-workers-and-pages Bot commented Jun 16, 2026

Copy link
Copy Markdown

Deploying mcpproxy-docs with  Cloudflare Pages  Cloudflare Pages

Latest commit: 6fa8ffa
Status: ✅  Deploy successful!
Preview URL: https://ba4c1d5a.mcpproxy-docs.pages.dev
Branch Preview URL: https://fix-mcp-2715-docker-spawn-pa.mcpproxy-docs.pages.dev

View logs

@codecov-commenter

codecov-commenter commented Jun 16, 2026

Copy link
Copy Markdown

⚠️ Please install the 'codecov app svg image' to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

❌ Patch coverage is 78.26087% with 20 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
internal/upstream/core/connection_docker.go 56.25% 5 Missing and 2 partials ⚠️
internal/management/diagnostics.go 0.00% 4 Missing ⚠️
internal/server/mcp.go 80.95% 4 Missing ⚠️
internal/runtime/runtime.go 62.50% 2 Missing and 1 partial ⚠️
internal/secureenv/manager.go 80.00% 1 Missing and 1 partial ⚠️

📢 Thoughts on this report? Let us know!

…n test

insertCidfileIntoShellDockerCommand checked shellArgs[n-2]=="-c" to
distinguish a shell-wrapped command string from an unrecognised format.
On Windows with cmd.exe (no bash), WrapWithUserShell returns ["/c", cmdStr]
so the flag is "/c", not "-c"; the guard rejected it and silently skipped
cidfile insertion, breaking TestSetupDockerIsolationCidfileInsertionWithAbsolutePath
on windows-latest CI.

Accept both flags ("-c" and "/c") so cidfile insertion works on every
platform. Add TestInsertCidfileWindowsCmdFormat (pure unit, no real Docker)
to pin the Windows cmd format.
@github-actions

Copy link
Copy Markdown

📦 Build Artifacts

Workflow Run: View Run
Branch: fix/mcp-2715-docker-spawn-path

Available Artifacts

  • archive-darwin-amd64 (28 MB)
  • archive-darwin-arm64 (25 MB)
  • archive-linux-amd64 (16 MB)
  • archive-linux-arm64 (14 MB)
  • archive-windows-amd64 (28 MB)
  • archive-windows-arm64 (25 MB)
  • frontend-dist-pr (0 MB)
  • installer-dmg-darwin-amd64 (21 MB)
  • installer-dmg-darwin-arm64 (19 MB)

How to Download

Option 1: GitHub Web UI (easiest)

  1. Go to the workflow run page linked above
  2. Scroll to the bottom "Artifacts" section
  3. Click on the artifact you want to download

Option 2: GitHub CLI

gh run download 27625094138 --repo smart-mcp-proxy/mcpproxy-go

Note: Artifacts expire in 14 days.

@mcpproxy-gatekeeper mcpproxy-gatekeeper Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved via Claude Code review (Codex quota exhausted).

@Dumbris Dumbris merged commit 79849e0 into main Jun 16, 2026
39 checks passed
Dumbris added a commit that referenced this pull request Jun 17, 2026
…rPath (#696)

Minimal #696 fix per board decision (MCP-2744): drop the direct-exec spawn
migration and all DOCKER_* env experiments; keep only the negative-cache fix.

Root cause of 'status shows docker_path but spawn used bare docker': under a
LaunchAgent/sandbox the first resolution can fail because only the restricted
login-shell leg failed, caching a negative for dockerPathNegativeTTL=60s. Early
upstream spawns hit that window and fall back to bare 'docker'. The well-known
-path probe is a pure os.Stat (never sandbox- or login-shell-restricted), so a
live cached negative must not shadow a docker binary present at a well-known
path right now. ResolveDockerPath now re-runs the cheap os.Stat probe before
honoring a still-live negative and, on success, upgrades the cache to a
permanent success.

With the cache fixed, the existing post-#697 shell wrap runs the resolved
ABSOLUTE docker path as the command token, so login-shell PATH is irrelevant
and #696 is fixed with no Colima regression and no env handling change. The
direct-exec migration + login-shell env hydration are deferred to a separate
systematic task.

TDD: TestResolveDockerPath_StatProbeOverridesLiveNegative asserts a successful
os.Stat well-known probe wins over a still-live cached negative (no TTL expiry).

Related #696

Co-Authored-By: Paperclip <noreply@paperclip.ing>
Dumbris added a commit that referenced this pull request Jun 17, 2026
…rPath (#696) (#699)

Minimal #696 fix per board decision (MCP-2744): drop the direct-exec spawn
migration and all DOCKER_* env experiments; keep only the negative-cache fix.

Root cause of 'status shows docker_path but spawn used bare docker': under a
LaunchAgent/sandbox the first resolution can fail because only the restricted
login-shell leg failed, caching a negative for dockerPathNegativeTTL=60s. Early
upstream spawns hit that window and fall back to bare 'docker'. The well-known
-path probe is a pure os.Stat (never sandbox- or login-shell-restricted), so a
live cached negative must not shadow a docker binary present at a well-known
path right now. ResolveDockerPath now re-runs the cheap os.Stat probe before
honoring a still-live negative and, on success, upgrades the cache to a
permanent success.

With the cache fixed, the existing post-#697 shell wrap runs the resolved
ABSOLUTE docker path as the command token, so login-shell PATH is irrelevant
and #696 is fixed with no Colima regression and no env handling change. The
direct-exec migration + login-shell env hydration are deferred to a separate
systematic task.

TDD: TestResolveDockerPath_StatProbeOverridesLiveNegative asserts a successful
os.Stat well-known probe wins over a still-live cached negative (no TTL expiry).

Related #696

Co-authored-by: Paperclip <noreply@paperclip.ing>
Dumbris added a commit that referenced this pull request Jun 17, 2026
…aunchd) — MCP-2751 (#701)

* feat(shellwrap): hydrate login-shell env once at startup (macOS GUI/launchd)

When mcpproxy is launched from a macOS GUI/launchd context (Launchpad, the
SMAppService login item, or the tray spawning the core), the Go core inherits a
launchd-minimal environment: it never sources the user's login shell, so it
lacks Homebrew/Docker PATH entries and exported vars like DOCKER_HOST. This is
the root-cause class behind the recent docker/PATH point-fixes (#697, #699).

Add a one-time, allow-listed login-shell environment hydration at startup so
every spawn path (docker lifecycle, stdio servers, uvx/npx, ResolveDockerPath,
secureenv.BuildSecureEnvironment) inherits a correct environment with no
call-site changes:

- internal/shellwrap/hydrate.go: captureLoginShellEnv() sources `$SHELL -l -c
  'env -0'` once (NUL-delimited, marker-bracketed, 5s timeout, sync.Once) and
  caches the parsed env. LoginShellPATH now reads PATH from this shared capture
  — one shell fork, not two.
- HydrateFromLoginShell() gates on macOS + launchd-minimal PATH (no-op on
  terminal launches and non-macOS), merges PATH login-first, and applies a
  curated allow-list (DOCKER_*, HTTP(S)/NO_PROXY, NVM_DIR/ASDF_DIR/PYENV_ROOT/
  VOLTA_HOME/HOMEBREW_PREFIX/COLIMA_HOME) set-if-empty via os.Setenv. Secrets
  are never imported; HOME/USER/SHELL are never touched; values are never
  logged (key names + lengths only).
- secureenv: extend the default allow-list with the curated DOCKER_*/proxy/
  tool-home keys so the hydrated vars survive the filter into upstream spawns.
  Scanner MinimalEnv stays narrow (credential-free).
- cmd/mcpproxy: call HydrateFromLoginShell after logger setup, before any
  manager reads os.Environ().

Supersedes the per-spawn DOCKER_* env capture; the absolute-path docker probe
(#697) and negative-cache fix (#699) remain as complementary fallbacks.

Related #699

* fix(shellwrap): preserve intentionally-empty operator vars during hydration

Codex review (PR #701, P2): the curated-key set-if-empty guard used
`os.Getenv(key) != ""`, which treats an explicitly set-empty value the same
as unset. An operator who sets `HTTPS_PROXY=` or `DOCKER_HOST=` to DISABLE an
inherited value would have it overwritten from the login shell — violating the
never-clobber-operator-values contract.

Use os.LookupEnv to distinguish unset from intentionally-empty: any present
key (even empty) is operator intent and is preserved. launchd never sets these
to empty, so present-but-empty is always deliberate. Adds
TestHydrate_PreservesIntentionallyEmptyVar.

Co-Authored-By: Paperclip <noreply@paperclip.ing>

* fix(shellwrap): alias-aware set-if-unset for proxy var spellings

Codex round-2 (PR #701, P2): proxy vars (HTTP_PROXY/HTTPS_PROXY/NO_PROXY) are
case-insensitive to HTTP clients, so the per-key LookupEnv guard could still
clobber operator intent — e.g. operator sets `https_proxy=` (lowercase, to
disable) but the login shell exports `HTTPS_PROXY=...`; the exact-key check on
the uppercase spelling missed the lowercase presence and imported it anyway.

Split the curated allow-list into single-spelling keys (DOCKER_*, tool-home)
and an alias-aware proxy trio: for each logical proxy var, if EITHER spelling
is already present in the process env (including intentionally empty), skip
hydrating BOTH spellings. Adds TestHydrate_ProxyCaseAliasingPreservesOperatorIntent
(daemon https_proxy='' + login HTTPS_PROXY=http://x → neither imported).

Co-Authored-By: Paperclip <noreply@paperclip.ing>

* fix(shellwrap): drop proxy vars, fix launch gate for pre-seeded PATH

Codex round-3 scope reduction:
- Remove HTTP_PROXY/HTTPS_PROXY/NO_PROXY from hydration and from
  secureenv DefaultEnvConfig allow-list. Proxy forwarding to every
  stdio upstream is out of scope; filed as a separate opt-in follow-up.
- Fix the launch gate: hydration now triggers when PATH looks launchd-
  minimal OR any DOCKER_* curated var is absent. A GUI launcher can
  pre-seed PATH via /etc/paths yet still not export DOCKER_HOST from
  rc files, so PATH-comprehensiveness alone is insufficient.
- PATH-merge stays gated on pathIsMinimal only — a comprehensive PATH
  is not widened even when the DOCKER_* gate fires.

Tests:
- Rename GateNoOpOnComprehensivePath → GateNoOpOnComprehensivePathAndAllDockerPresent
  (now requires both conditions, so all 5 DOCKER_* vars are pre-set)
- Add ComprehensivePathStillHydratesDockerIfMissing — verifies the new
  secondary gate: comprehensive PATH, missing DOCKER_HOST → hydrates
  curated vars but leaves PATH unmodified
- Drop PreservesIntentionallyEmptyVar and ProxyCaseAliasingPreservesOperatorIntent
  (proxy vars are no longer hydrated)
- Update MinimalPathHydratesCuratedSet to assert proxy vars are NOT imported

Co-Authored-By: Paperclip <noreply@paperclip.ing>

---------

Co-authored-by: Paperclip <noreply@paperclip.ing>
Dumbris added a commit that referenced this pull request Jun 17, 2026
…CP-2753) (#703)

* fix(docker): direct-exec resolved docker binary for isolated spawn (MCP-2753)

PR #697 resolved the docker CLI to an absolute path but still routed the
isolated spawn through `$SHELL -l -c "<docker> run …"`, where the login shell
re-derives $PATH from rc files and can drop the Docker Desktop bundle dir. The
absolute path was only a token in the -c string, so docker-isolated servers
still failed with `command not found: docker` when the CLI was not on the
LaunchAgent/tray spawn PATH — even though `upstream_servers list` reported a
good docker_path.

On successful resolution to a VERIFIED absolute executable, setupDockerIsolation
now returns the resolved binary as the exec command with raw docker argv (no
shell wrap), mirroring the management path's newDockerCmd. It returns a
shellWrapped flag so callers pick the matching cidfile helper: the new
args-based insertCidfileIntoDockerArgs inserts `--cidfile` after the `run`
token (platform-agnostic, sidestepping the -c vs /c quirk), while the
login-shell fallback keeps the string-based insertCidfileIntoShellDockerCommand.

The direct-exec branch is gated by isDirectExecutable (filepath.IsAbs + os.Stat
+ exec-bit). ResolveDockerPath's last resort runs `command -v docker` in the
login shell, which can emit a shell function/alias/bare name; such a value is
rejected and shell-wrapped instead of failing a direct exec. Login-shell wrap of
bare `docker` is also kept on the resolution-failure fallback.

The DOCKER_* daemon-env half of MCP-2753 is already satisfied by the startup
login-shell hydration shipped in MCP-2751 (#701): DOCKER_* are hydrated into
os.Environ() and carried through BuildSecureEnvironment's default allow-list, so
the direct-exec'd docker process still reaches Colima/rootless/remote daemons
without a per-spawn shell capture.

Related #699
Related #696

* fix(docker): gate direct-exec on guaranteed daemon env (non-Darwin rootless regression)

Codex round-4 (PR #703 REQUEST_CHANGES, P2): the direct-exec branch regressed
non-Darwin rootless/remote Docker. Dropping the `$SHELL -l` wrap also drops
rc-file env inheritance, and HydrateFromLoginShell (MCP-2751) is Darwin-only
while BuildSecureEnvironment only forwards DOCKER_* already in os.Environ(). So
on Linux a daemon whose DOCKER_HOST/DOCKER_CONTEXT lives only in .profile would
be lost by direct-exec — the same DOCKER_* loss that made #699 keep the shell
wrap.

Add a second precondition (dockerDaemonEnvGuaranteed) alongside the verified-
absolute-executable check: direct-exec only when the daemon env is guaranteed
without the login shell — macOS (startup hydration captured DOCKER_* into
os.Environ()) OR DOCKER_HOST/DOCKER_CONTEXT already present in the process env on
any platform. Otherwise keep the `$SHELL -l` wrap so `docker run` inherits the
rc-file daemon config; the wrap now prefers the resolved absolute path (still
sidestepping the login shell's PATH re-derivation) and drops to bare "docker"
only when resolution failed.

dockerDaemonEnvGOOS is a package var so the Darwin branch is testable on a
non-Darwin CI host. New tests: non-Darwin + DOCKER_HOST-only-in-rc stays
shell-wrapped with the absolute path; non-Darwin + DOCKER_HOST in env direct-
execs; and a table test for the gate.

Related #703
Related #699

Co-Authored-By: Paperclip <noreply@paperclip.ing>

---------

Co-authored-by: Paperclip <noreply@paperclip.ing>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants