feat(proxy): per-request upstream override via X-Headroom-Upstream header #1199
Closed
JavaGT wants to merge 2 commits into
…ader A single proxy instance can now fan out to many upstreams (one per provider) instead of one proxy per upstream. The caller tags each request with its real upstream base in the X-Headroom-Upstream header and the proxy forwards there, overriding the startup default for that provider. The value is normalized like *_TARGET_API_URL (trailing slash and trailing /v1 stripped) and the proxy appends the incoming request path, so both https://api.deepseek.com and https://api.deepseek.com/v1 resolve correctly. Honored on /v1/chat/completions, /v1/responses, /v1/messages, and every passthrough / catch-all route (handle_passthrough). The header is an x-headroom-* control flag and is stripped before the upstream call, so it never leaks to the provider. Enables single-proxy multi-provider setups such as OpenCode (each provider configured with baseURL = the proxy + X-Headroom-Upstream = its real upstream), avoiding the one-proxy-per-upstream fan-out.
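The stripping rule described in this commit message can be illustrated with a minimal sketch. It assumes only what the text states (any `x-headroom-*` header is a control flag that must not reach the provider); `strip_control_headers` and `CONTROL_PREFIX` are hypothetical names for illustration, not the proxy's actual helpers.

```python
from typing import Dict, Mapping, Tuple

# Assumption: control headers share this prefix, matched case-insensitively.
CONTROL_PREFIX = "x-headroom-"


def strip_control_headers(headers: Mapping[str, str]) -> Tuple[Dict[str, str], int]:
    """Return (headers safe to forward upstream, number of control headers removed)."""
    forwarded = {
        name: value
        for name, value in headers.items()
        if not name.lower().startswith(CONTROL_PREFIX)
    }
    return forwarded, len(headers) - len(forwarded)


# Client auth headers pass through untouched; only the control flag is dropped.
incoming = {
    "Authorization": "Bearer sk-placeholder",
    "Content-Type": "application/json",
    "X-Headroom-Upstream": "https://api.deepseek.com/v1",
}
outbound, stripped_count = strip_control_headers(incoming)
```

Matching on the lowercased name keeps the check independent of how the client capitalized the header.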
Contributor
PR governance
This PR does not yet satisfy the required template fields:
Please update the PR body, or move the PR back to draft while it is still in progress.
…normalization

Reviewing chopratejas#1089 (the launcher-side opencode wrap) surfaced two gaps in the initial override implementation:

1. Gemini's version segment is /v1beta (not /v1). The proxy's gemini handlers append /v1beta/models/... themselves, so a caller passing the versioned URL (e.g. matching headroom's _KNOWN_UPSTREAMS 'generativelanguage.googleapis.com/v1beta') would have produced a doubled /v1beta/v1beta path. The resolver now strips a trailing /v1beta as well as /v1.
2. The three /v1beta gemini routes did not thread the override through. handle_gemini_generate_content and handle_gemini_count_tokens already accepted upstream_base_url; handle_gemini_stream_generate_content did not. All three routes now pass request_upstream_override(request), and the stream handler gained the parameter.

Coverage is now OpenAI (/v1/chat/completions, /v1/responses), Anthropic (/v1/messages), Gemini (/v1beta generateContent/streamGenerateContent/countTokens), and every passthrough / catch-all route.
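The normalization described above can be sketched as follows. This is an illustration under stated assumptions, not the PR's `request_upstream_override()`: the function names `normalize_upstream` and `build_upstream_url` are hypothetical, and the sketch takes the raw header value as a string rather than a request object.

```python
from typing import Optional

# Version segments that handlers append themselves, so they are stripped here.
VERSION_SUFFIXES = ("/v1beta", "/v1")


def normalize_upstream(raw: Optional[str]) -> Optional[str]:
    """Strip a trailing slash and one trailing /v1 or /v1beta segment."""
    if raw is None:
        return None
    base = raw.strip().rstrip("/")
    if not base:
        return None  # empty header: caller falls back to the startup default
    for suffix in VERSION_SUFFIXES:
        if base.endswith(suffix):
            base = base[: -len(suffix)]
            break
    return base.rstrip("/")


def build_upstream_url(base: str, request_path: str) -> str:
    """Append the incoming request path (which already carries its version segment)."""
    return base + request_path


gemini_base = normalize_upstream("https://generativelanguage.googleapis.com/v1beta")
gemini_url = build_upstream_url(gemini_base, "/v1beta/models/example:generateContent")
```

Because the version segment is removed before the path is appended, both the bare host and the versioned URL produce a single `/v1beta` in the final URL. The model name `example` in the path is a placeholder.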
This was referenced Jun 20, 2026
Author
Closing: withdrawing this PR. Thanks for the review bot feedback.
Description
A single proxy instance can now fan out to many upstreams (one per provider) instead of one proxy per upstream. The caller tags each request with its real upstream base in the `X-Headroom-Upstream` header and the proxy forwards there, overriding the startup default for that provider.

This is a proxy-side capability, orthogonal to the existing `headroom wrap opencode` PRs (#1089, #1173), which approach multi-provider from the launcher side (one proxy per upstream). This PR makes the one-proxy, many-upstreams shape possible: no per-provider proxy instances, one port, one process.

Today every provider that should be compressed needs either its own `OPENAI_TARGET_API_URL` / `ANTHROPIC_TARGET_API_URL` env var (one upstream per provider family), or its own proxy instance on its own port (the approach in #1089: 6 proxies for 6 upstreams). For clients that can attach custom per-provider headers, such as OpenCode (`options.headers`) and any AI-SDK-based client, a single proxy can serve all of them: point every provider's base URL at the proxy and tag each request with `X-Headroom-Upstream: <real upstream>`. Auth stays on the client (`Authorization` / `x-api-key` pass through), so no API keys live in the proxy config.

Closes # (none: feature, no tracking issue; related: #1193, #1089, #1173)
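From the client's side, the one-proxy, many-upstreams shape might look like the sketch below. It only builds the URL and headers and sends nothing; the helper `tag_request`, the provider table, and the placeholder key are assumptions for illustration, and the proxy address reuses the local port from this PR's verification run.

```python
from typing import Dict, Tuple

# Assumption: the proxy listens locally on the port used in the PR's verification.
PROXY_BASE = "http://127.0.0.1:8787"


def tag_request(path: str, upstream: str, api_key: str) -> Tuple[str, Dict[str, str]]:
    """Build the URL and headers for one request routed through the shared proxy."""
    url = PROXY_BASE + path  # every provider talks to the same proxy
    headers = {
        "Authorization": f"Bearer {api_key}",  # auth stays on the client
        "X-Headroom-Upstream": upstream,       # real upstream for this request
    }
    return url, headers


# Hypothetical provider table: same proxy, different real upstreams.
providers = {
    "deepseek": "https://api.deepseek.com",
    "openai": "https://api.openai.com",
}
tagged = {
    name: tag_request("/v1/chat/completions", upstream, api_key="sk-placeholder")
    for name, upstream in providers.items()
}
```

Every entry in `tagged` shares the proxy URL and differs only in the `X-Headroom-Upstream` value, which is what lets one process on one port serve all providers.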
Type of Change
Changes Made
- `headroom/proxy/helpers.py`: `request_upstream_override()` + `_UPSTREAM_OVERRIDE_HEADER`. Reads `X-Headroom-Upstream` and normalizes it the same way as `*_TARGET_API_URL` (trailing slash and a trailing `/v1` or `/v1beta` version segment stripped), so `https://api.deepseek.com`, `…/v1`, and `https://generativelanguage.googleapis.com/v1beta` all resolve to their bare host. The proxy then appends the incoming request path.
- `headroom/proxy/handlers/openai.py`: read the override in `handle_openai_chat` (`/v1/chat/completions`), `handle_openai_responses` (`/v1/responses`, non-ChatGPT-auth branch only), and at the top of `handle_passthrough` (covers all passthrough routes + the verbatim catch-all in one place).
- `headroom/proxy/handlers/gemini.py`: `handle_gemini_stream_generate_content` gains the `upstream_base_url` parameter; all three `/v1beta` gemini routes thread the override.
- `headroom/providers/proxy_routes.py`: thread the override through `handle_anthropic_messages`'s existing `upstream_base_url` param (`/v1/messages`), and through the three `/v1beta` gemini routes.
- `docs/content/docs/proxy.mdx`: new Per-request upstream override section with the OpenCode config example.
- `CHANGELOG.md`: Unreleased Features entry.
- `tests/test_proxy_upstream_override.py`: 18 new tests (resolver normalization incl. `/v1beta`, `/v1/messages` threading, `/v1/chat/completions` URL build, gemini `/v1beta` route threading, catch-all/passthrough URL build, header-stripping).
- `tests/test_provider_proxy_routes.py`: the generic delegate-swallowing fake now accepts `**kwargs` (it already swallowed `*args`), so the new `upstream_base_url=` kwarg on `/v1/messages` flows through.

Scope notes / non-goals
- … `/v1/responses` path (`_ws_http_fallback`) and the ChatGPT-OAuth branch of `/v1/responses` are intentionally not overridden: those are subscription-auth endpoints with a hardcoded `chatgpt.com` upstream, not the configured provider target. BYOK clients (OpenCode) use the API-key branch, which is covered.
- … `/v1/messages/batches/*`, `/v1/batches/*`, `/v1beta/batches/*`) read `self.*_API_URL` directly and are not overridden in this PR; they can be threaded in a follow-up if there's demand. The catch-all covers arbitrary unknown paths verbatim regardless.

Testing
- … `pytest`)
- … `ruff check` on all changed files)
- … `mypy` on all changed source files)

Test Output
Real Behavior Proof
- … `feat/proxy-upstream-override-header` (commit `eb6ab307`), opencode v1.17.8. Live proxy started via `python -m headroom.cli proxy --port 8787 --no-optimize --no-cache --no-rate-limit` (the `--no-optimize` flag only avoids the model-loading pipeline delay in the test harness; the override code path is identical with optimization on, as proven by the unit/route tests).
- … `127.0.0.1:9999`, started the real headroom proxy on `127.0.0.1:8787`, then curl'd `/v1/chat/completions` + `/v1/messages` + a catch-all `GET /v1/models` with the `X-Headroom-Upstream` header, and finally ran a real `opencode run --model openai/mock-model "say hi"` through the proxy. Full numbered detail below.
  1. … `127.0.0.1:9999` (logs every request: method, path, presence of `x-headroom-upstream`).
  2. … `127.0.0.1:8787`.
  3. `curl -X POST http://127.0.0.1:8787/v1/chat/completions -H 'X-Headroom-Upstream: http://127.0.0.1:9999' …`, and the same for `/v1/messages` and a catch-all `GET /v1/models`.
  4. … `options.baseURL = http://127.0.0.1:8787/v1` and `options.headers = { "X-Headroom-Upstream": "http://127.0.0.1:9999" }`, then `opencode run --model openai/mock-model "say hi"`.
- … `x-headroom-upstream` header before the upstream call (mock saw it absent); opencode rendered the response. Full detail below.
- … `event=outbound_request forwarder=server method=POST path=http://127.0.0.1:9999/v1/chat/completions` and `event=outbound_headers forwarder=openai_chat_completions stripped_count=1`. Mock received the request at `/v1/chat/completions` with `x-headroom-upstream` absent (stripped). Same verified for `/v1/messages` and the catch-all `/v1/models`.
- … `POST /v1/responses` (opencode uses the Responses API) with the header stripped; opencode rendered the response (`hi`). Full chain: opencode → (sends header + baseURL=proxy) → headroom proxy → (override → mock) → mock upstream.
- … `/v1beta` path is covered by unit + route tests and the normalization is unit-tested; a live Gemini key was not available in this environment). Codex WebSocket fallback path (intentionally not overridden; see non-goals).

Review Readiness
Checklist
- … `ruff` clean)
- … `handle_passthrough` chokepoint are documented)
- … `docs/content/docs/proxy.mdx` + `CHANGELOG.md`)

Attribution
See the Automated contribution banner at the very top of this PR. In short: this PR (code, tests, docs, this description, and the verification commands/results) was produced entirely by an AI coding agent. @JavaGT requested the improvement and authorized the submission, but did not write, review, run, or verify the code. The checklist boxes above reflect what the agent did and verified — not what any human did. Maintainers should treat all of it as unverified until independently checked.