feat(registries): harden registry fetches against SSRF (CWE-918, MCP-1076)#745
Conversation
…1076) Registry server-list fetches use a user-supplied URL (`registry add-source`), which CodeQL flags as go/request-forgery (#97/#98): a malicious or typo'd registry source could point the daemon at internal/metadata endpoints (169.254.169.254, RFC1918 ranges). Add an SSRF guard centered on a single `isBlockedIP` predicate (loopback, RFC1918, RFC6598 CGNAT, link-local incl. the cloud-metadata endpoint, IPv6 unique-local, unspecified, multicast; fails closed), applied at three layers: - dial time: a net.Dialer Control hook on the shared registry HTTP client rejects the actual resolved IP before connect — covers hostnames that resolve into blocked ranges and closes the DNS-rebinding TOCTOU window a parse-time check alone cannot. - fetch pre-flight: validateRegistryURL rejects a literal-IP host (defense in depth + an explicit barrier at the http.NewRequest sink). - add-source/edit-source: literal-IP fail-fast (no DNS, stays pure/offline) so a bad source is refused up front with the stable invalid_registry_url code. Opt out with a new top-level `allow_private_registry_fetch` config flag (default false) for operators who run a trusted registry mirror on an internal/private address. TDD: table tests for every blocked/allowed range (incl. CGNAT and RFC1918 boundaries), an end-to-end loopback dial block, add-source literal-IP rejection, and the two registry-add E2E daemon tests opt in via the new flag.
Deploying mcpproxy-docs with
|
| Latest commit: |
1ae8c20
|
| Status: | ✅ Deploy successful! |
| Preview URL: | https://d4c1ef35.mcpproxy-docs.pages.dev |
| Branch Preview URL: | https://security-mcp-1076-registry-s.mcpproxy-docs.pages.dev |
The in-process server tests that fetch a registry served on a loopback httptest server (consistency_official, mcp_add_from_registry's startTestRegistry helper) now hit the new SSRF dial guard, because the registries-package test bypass does not apply to the server package's test binary. Set AllowPrivateRegistryFetch on their SetRegistriesFromConfig configs — exactly how an operator opts in for a trusted internal mirror — so the fetch is permitted while the production guard stays active for the add-source rejection test. Fixes the linux/arm64 Build Binaries leg (TestCrossSurfaceConsistency_*, TestHandleUpstreamServers_AddFromRegistry_*).
|
Changes requested — SSRF guard blocks loopback, breaking |
|
Codecov Report❌ Patch coverage is 📢 Thoughts on this report? Let us know! |
📦 Build ArtifactsWorkflow Run: View Run Available Artifacts
How to DownloadOption 1: GitHub Web UI (easiest)
Option 2: GitHub CLI gh run download 27937618138 --repo smart-mcp-proxy/mcpproxy-go
|
CodexReviewer (round 2) found a real gap: the registry client clones http.DefaultTransport, which keeps Proxy: ProxyFromEnvironment. With HTTP_PROXY/HTTPS_PROXY set, the dialer connects to the PROXY as the first hop, so registryDialControl only ever validates the proxy's IP — never the real target. A hostname resolving to loopback/private/link-local (incl. 169.254.169.254) could still be fetched through a proxy. Add an application-layer guard (guardRegistryTargetHost) that resolves the target host and rejects the fetch if ANY resolved IP is in a blocked range. It runs before the request in registryGet, independent of the transport/proxy, so it holds in the proxy case; the dial-time Control stays as defense-in-depth for the direct path and DNS-rebind TOCTOU. Fail-open on resolver error (a flaky resolver must not break every fetch); literal IPs already covered pre-flight; relaxed by the allow_private_registry_fetch opt-in. Regression coverage: a public-looking hostname resolving to 169.254.169.254 is refused with HTTP(S)_PROXY set; plus direct predicate tests (mixed public+private resolution blocked, all-public allowed, resolver-error fail-open, public literal allowed). Resolver is injectable via a package var for deterministic, network-free tests.
There was a problem hiding this comment.
✅ Gatekeeper approval — MCP-1076 SSRF hardening. KimiReviewer verdict ACCEPT on head 1ae8c20 (model-diverse): three-layer defense (literal-IP pre-flight + app-layer DNS resolution that holds under HTTP_PROXY + dial-time control for DNS-rebinding), with a proxy-bypass regression test covering exactly the path CodexReviewer flagged in round 1. CI fully green (arm64 + Unit + E2E + CodeQL). Deny-by-default; opt-in via allow_private_registry_fetch. Author (BackendEngineer) ≠ approver.
The allow_private_registry_fetch flag is all-or-nothing: enabling it lifts the registry SSRF guard for every non-routable range at once, including the 169.254.169.254 cloud-metadata endpoint. Enabling it for a localhost dev registry therefore also re-opens the cloud-metadata SSRF vector. Document this sharp edge in the config doc comment, docs/configuration.md, and docs/registries.md, plus note the flag only takes effect on daemon (re)start / config reload. Related #745 Related MCP-3206
What
Hardens registry URL fetches against SSRF / CodeQL
go/request-forgery(alerts #97/#98, MCP-1076). Registry server-list fetches use a user-supplied URL (registry add-source), so a malicious or typo'd source could point the daemon at internal/metadata endpoints (169.254.169.254, RFC1918 ranges).How
A single
isBlockedIPpredicate (newinternal/registries/ssrf.go) is the source of truth for blocked ranges: loopback, RFC1918 private, RFC6598 CGNAT (100.64/10), link-local incl. the cloud-metadata endpoint, IPv6 unique-local, unspecified, multicast — fails closed on an unparseable IP. It is enforced at three layers:net.Dialer.Controlhook on the shared registry HTTP client checks the actual resolved IP before connect. This is the authoritative guard: it catches hostnames that resolve into blocked ranges and closes the DNS-rebinding TOCTOU window a parse-time check alone cannot.validateRegistryURLrejects a literal-IP host (defense in depth + an explicit barrier at thehttp.NewRequestsink, alongside the existing scheme/host pin).invalid_registry_urlcode.Opt-out
New top-level
allow_private_registry_fetchconfig flag (default false = secure). Settrueonly to run a trusted registry mirror on an internal/private address. Wired throughSetRegistriesFromConfigso config hot-reload keeps the dial guard in sync. This is the "optional allow/deny policy" called for in the issue.Tests (TDD)
TestIsBlockedIP— every blocked range + public allow cases incl. RFC1918/CGNAT boundaries + IPv6 + fail-closed nil.TestHostLiteralBlocked,TestValidateRegistrySourceURL— literal-IP rejection, hostname pass-through, opt-in bypass.TestRegistryGet_BlocksLoopbackWhenGuardActive— end-to-end: guarded fetch refused at dial time, endpoint never reached.TestBuildRegistrySourceEntry_RejectsSSRFLiteralIP— add-source fail-fast.Verified locally:
go test ./internal/registries/... ./internal/config/... -race, registry-add E2E daemon tests, add/edit-source unit tests, golangci-lint v2 (.github/.golangci.yml) 0 issues, gofmt,make swaggerregenerated.Notes
CodeQL #97/#98 are pre-existing alerts on
main(created by #572, surfaced — not introduced — by #594). This PR adds the mitigating guard; the alerts can be dismissed as mitigated once merged.Closes MCP-1076.