feat(proxy): per-request upstream override via X-Headroom-Upstream header by JavaGT · Pull Request #1199 · chopratejas/headroom

JavaGT · 2026-06-20T15:29:57Z

⚠️ Automated contribution — authored by an AI agent

This pull request was developed and submitted automatically by an AI coding agent (opencode, powered by glm-5.2), not by a human. The work was produced end-to-end by the agent: it explored the codebase, wrote the implementation, the tests, the docs, and this PR description, and ran the verification commands shown below. @JavaGT did not write, review, run, or verify any of this code; @JavaGT only requested the improvement (a proxy-side way to support a multi-provider OpenCode setup through one Headroom instance) and authorized the PR submission. Please review this PR as you would any external, unverified, AI-generated contribution: the design, the code, the test results, and the "real behaviour proof" are all the agent's claims and should be checked, not trusted.

[Human operator note: I have just tested the opencode harness with the opencode-go provider through this proxy setup, writing a summary/navigation document of a codebase I am working on. The proxied run used ~32.4k total tokens where the unproxied run of the same task used ~42.6k, a saving of approx. 10k tokens (working environment reset between runs).]

The agent chose the design (a per-request X-Headroom-Upstream header consumed by the proxy, normalized like the existing *_TARGET_API_URL env vars, stripped by the existing _strip_internal_headers) after reviewing the two related launcher-side PRs (#1089's one-proxy-per-upstream fan-out and #1173's env-var-injection wrap) and the existing x-headroom-base-url precedent in the catch-all. It reuses the proxy's own conventions rather than introducing a new mechanism — but that reuse decision and the non-goal boundaries (Codex WS / ChatGPT-OAuth branch / batch endpoints) are the agent's judgment and warrant maintainer scrutiny.

Description

A single proxy instance can now fan out to many upstreams (one per provider) instead of one proxy per upstream. The caller tags each request with its real upstream base in the X-Headroom-Upstream header and the proxy forwards there, overriding the startup default for that provider.

This is a proxy-side capability, orthogonal to the existing headroom wrap opencode PRs (#1089, #1173), which approach multi-provider from the launcher side (one proxy per upstream). This PR makes the one-proxy, many-upstreams shape possible — no per-provider proxy instances, one port, one process.

Today every provider that should be compressed needs either its own OPENAI_TARGET_API_URL / ANTHROPIC_TARGET_API_URL env var (one upstream per provider family), or its own proxy instance on its own port (the approach in #1089 — 6 proxies for 6 upstreams). For clients that can attach custom per-provider headers — OpenCode (options.headers), and any AI-SDK-based client — a single proxy can serve all of them: point every provider's base URL at the proxy and tag each request with X-Headroom-Upstream: <real upstream>. Auth stays on the client (Authorization / x-api-key pass through), so no API keys live in the proxy config.
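As a rough illustration of the client side described above, the following Python sketch builds (without sending) requests that all target one proxy while naming a different real upstream per provider. The proxy address, the provider table, and the `build_request` helper are assumptions made for this example; they are not part of the PR.

```python
import json
import urllib.request

# Assumed local proxy address (the port matches the one used in the PR's proof section).
PROXY_BASE = "http://127.0.0.1:8787"

# Hypothetical provider table: provider name -> real upstream base URL.
UPSTREAMS = {
    "deepseek": "https://api.deepseek.com",
    "gemini": "https://generativelanguage.googleapis.com/v1beta",
}


def build_request(provider: str, path: str, payload: dict, api_key: str) -> urllib.request.Request:
    """Build a request aimed at the proxy that tags the provider's real upstream."""
    return urllib.request.Request(
        url=PROXY_BASE + path,
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Auth stays on the client and passes through the proxy.
            "Authorization": f"Bearer {api_key}",
            # Tells the proxy where this particular request should really go.
            "X-Headroom-Upstream": UPSTREAMS[provider],
        },
    )
```

Every provider shares the same proxy base URL; only the header value differs, which is what lets one process on one port serve all of them.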

Closes # (none — feature, no tracking issue; related: #1193, #1089, #1173)

Type of Change

Bug fix (non-breaking change that fixes an issue)
New feature (non-breaking change that adds functionality)
Breaking change (fix or feature that would cause existing functionality to change)
Documentation update
Performance improvement
Code refactoring (no functional changes)

Changes Made

headroom/proxy/helpers.py — request_upstream_override() + _UPSTREAM_OVERRIDE_HEADER. Reads X-Headroom-Upstream and normalizes it the same way as *_TARGET_API_URL (trailing slash and a trailing /v1 or /v1beta version segment stripped), so https://api.deepseek.com, …/v1, and https://generativelanguage.googleapis.com/v1beta all resolve to their bare host. The proxy then appends the incoming request path.
headroom/proxy/handlers/openai.py — read the override in handle_openai_chat (/v1/chat/completions), handle_openai_responses (/v1/responses, non-ChatGPT-auth branch only), and at the top of handle_passthrough (covers all passthrough routes + the verbatim catch-all in one place).
headroom/proxy/handlers/gemini.py — handle_gemini_stream_generate_content gains the upstream_base_url parameter; all three /v1beta gemini routes thread the override.
headroom/providers/proxy_routes.py — thread the override through handle_anthropic_messages's existing upstream_base_url param (/v1/messages), and through the three /v1beta gemini routes.
docs/content/docs/proxy.mdx — new Per-request upstream override section with the OpenCode config example.
CHANGELOG.md — Unreleased Features entry.
tests/test_proxy_upstream_override.py — 18 new tests (resolver normalization incl. /v1beta, /v1/messages threading, /v1/chat/completions URL build, gemini /v1beta route threading, catch-all/passthrough URL build, header-stripping).
tests/test_provider_proxy_routes.py — the generic delegate-swallowing fake now accepts **kwargs (it already swallowed *args), so the new upstream_base_url= kwarg on /v1/messages flows through.
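The normalization described for `request_upstream_override()` could look roughly like the sketch below. It is a minimal stand-in written from the description only, not the PR's actual code: the function names `normalize_upstream` and `upstream_override` are hypothetical, and it takes a plain header mapping rather than the proxy's request object.

```python
from typing import Mapping, Optional

UPSTREAM_OVERRIDE_HEADER = "x-headroom-upstream"

# Longest suffix first, so "/v1beta" is checked before "/v1".
_VERSION_SUFFIXES = ("/v1beta", "/v1")


def normalize_upstream(value: str) -> str:
    """Strip a trailing slash and a trailing /v1 or /v1beta version segment."""
    base = value.strip().rstrip("/")
    for suffix in _VERSION_SUFFIXES:
        if base.endswith(suffix):
            base = base[: -len(suffix)]
            break
    return base.rstrip("/")


def upstream_override(headers: Mapping[str, str]) -> Optional[str]:
    """Return the normalized override base, or None when the header is absent or empty."""
    for name, value in headers.items():
        if name.lower() == UPSTREAM_OVERRIDE_HEADER and value.strip():
            return normalize_upstream(value)
    return None
```

Because the version segment is removed up front, the handler can append the incoming request path unchanged regardless of which form the caller supplied.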

Scope notes / non-goals

The Codex WebSocket /v1/responses path (_ws_http_fallback) and the ChatGPT-OAuth branch of /v1/responses are intentionally not overridden — those are subscription-auth endpoints with a hardcoded chatgpt.com upstream, not the configured provider target. BYOK clients (OpenCode) use the API-key branch, which is covered.
Batch endpoints (/v1/messages/batches/*, /v1/batches/*, /v1beta/batches/*) read self.*_API_URL directly and are not overridden in this PR; they can be threaded in a follow-up if there's demand. The catch-all covers arbitrary unknown paths verbatim regardless.

Testing

Unit tests pass (pytest)
Linting passes (ruff check on all changed files)
Type checking passes (mypy on all changed source files)
New tests added for new functionality
Manual testing performed

Test Output

$ uv run --no-sync pytest tests/test_proxy_upstream_override.py -q
18 passed in 3.08s

$ uv run --no-sync pytest tests/test_proxy/ tests/test_provider_proxy_routes.py tests/test_banner_upstream_targets.py -q
110 passed, 1 warning in 5.39s

$ uv run --no-sync pytest tests/ -k gemini -q
65 passed, 59 skipped, 6885 deselected in 31.38s   (skips are env-gated, not coverage gaps)

$ uv run --no-sync ruff check headroom/proxy/helpers.py headroom/proxy/handlers/openai.py headroom/proxy/handlers/gemini.py headroom/providers/proxy_routes.py tests/test_proxy_upstream_override.py
All checks passed!

$ uv run --no-sync mypy headroom/proxy/helpers.py headroom/proxy/handlers/openai.py headroom/proxy/handlers/gemini.py headroom/providers/proxy_routes.py
Success: no issues found in 4 source files

Real Behavior Proof

Environment: macOS arm64 (darwin), Python 3.13.13, headroom @ branch feat/proxy-upstream-override-header (commit eb6ab307), opencode v1.17.8. Live proxy started via python -m headroom.cli proxy --port 8787 --no-optimize --no-cache --no-rate-limit (the --no-optimize flag only avoids the model-loading pipeline delay in the test harness; the override code path is identical with optimization on, as proven by the unit/route tests).
Exact command / steps: started a mock OpenAI/Anthropic/Gemini upstream on 127.0.0.1:9999, started the real headroom proxy on 127.0.0.1:8787, then curl'd /v1/chat/completions + /v1/messages + a catch-all GET /v1/models with the X-Headroom-Upstream header, and finally ran a real opencode run --model openai/mock-model "say hi" through the proxy. Full numbered detail below.
1. Started a mock OpenAI/Anthropic/Gemini upstream on 127.0.0.1:9999 (logs every request: method, path, presence of x-headroom-upstream).
2. Started the real headroom proxy on 127.0.0.1:8787.
3. curl -X POST http://127.0.0.1:8787/v1/chat/completions -H 'X-Headroom-Upstream: http://127.0.0.1:9999' … — and the same for /v1/messages and a catch-all GET /v1/models.
4. Pointed a real opencode client at the proxy: a provider with options.baseURL = http://127.0.0.1:8787/v1 and options.headers = { "X-Headroom-Upstream": "http://127.0.0.1:9999" }, then opencode run --model openai/mock-model "say hi".
Observed result: the proxy forwarded to the override base + request path and stripped the x-headroom-upstream header before the upstream call (mock saw it absent); opencode rendered the response. Full detail below.
- Proxy log: event=outbound_request forwarder=server method=POST path=http://127.0.0.1:9999/v1/chat/completions and event=outbound_headers forwarder=openai_chat_completions stripped_count=1. Mock received the request at /v1/chat/completions with x-headroom-upstream absent (stripped). Same verified for /v1/messages and the catch-all /v1/models.
- opencode: mock received POST /v1/responses (opencode uses the Responses API) with the header stripped; opencode rendered the response (hi). Full chain: opencode → (sends header + baseURL=proxy) → headroom proxy → (override → mock) → mock upstream.
Not tested: a real Gemini upstream (the /v1beta path is covered by unit + route tests and the normalization is unit-tested; a live Gemini key was not available in this environment). Codex WebSocket fallback path (intentionally not overridden — see non-goals).
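For anyone wanting to repeat step 1 independently, a mock upstream of the kind described can be sketched with the Python standard library alone. This is an assumed reconstruction, not the script the agent ran: the class name, the `SEEN` list, and the JSON reply body are all hypothetical.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

SEEN = []  # one record per request the mock receives


class MockUpstream(BaseHTTPRequestHandler):
    def _record_and_reply(self):
        # Drain the request body, if any, so the connection is left clean.
        length = int(self.headers.get("Content-Length") or 0)
        if length:
            self.rfile.read(length)
        # Record method, path, and whether the control header reached the upstream.
        SEEN.append({
            "method": self.command,
            "path": self.path,
            "has_override_header": self.headers.get("x-headroom-upstream") is not None,
        })
        body = json.dumps({"ok": True}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    do_GET = _record_and_reply
    do_POST = _record_and_reply

    def log_message(self, format, *args):
        pass  # silence the default per-request stderr logging


def start_mock(port: int = 0) -> ThreadingHTTPServer:
    """Start the mock on 127.0.0.1 in a background thread; port 0 picks a free port."""
    server = ThreadingHTTPServer(("127.0.0.1", port), MockUpstream)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

Calling `start_mock(9999)` would reproduce the address used in the steps above. If the proxy strips the header as claimed, every record in `SEEN` for proxied traffic should have `has_override_header` set to `False`.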

Review Readiness

I have performed a self-review
This PR is ready for human review

Checklist

My code follows the project's style guidelines (ruff clean)
I have performed a self-review of my own code
I have commented my code, particularly in hard-to-understand areas (the normalization rationale and the handle_passthrough chokepoint are documented)
I have made corresponding changes to the documentation (docs/content/docs/proxy.mdx + CHANGELOG.md)
My changes generate no new warnings
I have added tests that prove my fix is effective / that the feature works
New and existing unit tests pass locally with my changes
I have updated the CHANGELOG.md (Unreleased > Features)

Attribution

See the Automated contribution banner at the very top of this PR. In short: this PR (code, tests, docs, this description, and the verification commands/results) was produced entirely by an AI coding agent. @JavaGT requested the improvement and authorized the submission, but did not write, review, run, or verify the code. The checklist boxes above reflect what the agent did and verified — not what any human did. Maintainers should treat all of it as unverified until independently checked.

…ader

A single proxy instance can now fan out to many upstreams (one per provider) instead of one proxy per upstream. The caller tags each request with its real upstream base in the X-Headroom-Upstream header and the proxy forwards there, overriding the startup default for that provider.

The value is normalized like *_TARGET_API_URL (trailing slash and trailing /v1 stripped) and the proxy appends the incoming request path, so both https://api.deepseek.com and https://api.deepseek.com/v1 resolve correctly.

Honored on /v1/chat/completions, /v1/responses, /v1/messages, and every passthrough / catch-all route (handle_passthrough). The header is an x-headroom-* control flag and is stripped before the upstream call, so it never leaks to the provider.

Enables single-proxy multi-provider setups such as OpenCode (each provider configured with baseURL = the proxy + X-Headroom-Upstream = its real upstream), avoiding the one-proxy-per-upstream fan-out.
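The stripping behaviour mentioned in this commit message can be pictured with the sketch below. The PR relies on the proxy's existing `_strip_internal_headers`; the function here is a hypothetical stand-in that assumes control headers are identified purely by an `x-headroom-` prefix, which the real helper may or may not do.

```python
from typing import Dict, Mapping, Tuple

INTERNAL_PREFIX = "x-headroom-"  # assumed prefix, inferred from the description


def strip_internal_headers(headers: Mapping[str, str]) -> Tuple[Dict[str, str], int]:
    """Return the headers to forward upstream and the number of headers removed."""
    kept = {
        name: value
        for name, value in headers.items()
        if not name.lower().startswith(INTERNAL_PREFIX)
    }
    return kept, len(headers) - len(kept)
```

With a single override header on the request, the removed count would be 1, which is consistent with the `stripped_count=1` value quoted in the proxy log above.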

github-actions · 2026-06-20T15:30:07Z

PR governance

This PR does not yet satisfy the required template fields:

Paste real command output or artifact links in Testing → Test Output.

Please update the PR body, or move the PR back to draft while it is still in progress.

…normalization

Reviewing chopratejas#1089 (the launcher-side opencode wrap) surfaced two gaps in the initial override implementation:

1. Gemini's version segment is /v1beta (not /v1). The proxy's gemini handlers append /v1beta/models/... themselves, so a caller passing the versioned URL (e.g. matching headroom's _KNOWN_UPSTREAMS 'generativelanguage.googleapis.com/v1beta') would have produced a doubled /v1beta/v1beta path. The resolver now strips a trailing /v1beta as well as /v1.
2. The three /v1beta gemini routes did not thread the override through. handle_gemini_generate_content and handle_gemini_count_tokens already accepted upstream_base_url; handle_gemini_stream_generate_content did not. All three routes now pass request_upstream_override(request), and the stream handler gained the parameter.

Coverage is now OpenAI (/v1/chat/completions, /v1/responses), Anthropic (/v1/messages), Gemini (/v1beta generateContent/streamGenerateContent/countTokens), and every passthrough / catch-all route.

JavaGT · 2026-06-20T16:23:28Z

Closing — withdrawing this PR. Thanks for the review bot feedback.

github-actions Bot added the "status: needs author action" label (Pull request body or readiness checklist still needs author updates) Jun 20, 2026

LonelyGemini1979 mentioned this pull request Jun 20, 2026

feat: configurable upstream path prefix for OpenAI chat completions forwarding #1203

Open

This was referenced Jun 20, 2026

feat: native OpenCode support — headroom wrap opencode #1089

Open

[FEATURE] OpenCode support? #1193

Open

JavaGT closed this Jun 20, 2026

JavaGT deleted the feat/proxy-upstream-override-header branch June 20, 2026 16:23

JavaGT wants to merge 2 commits into chopratejas:main from JavaGT:feat/proxy-upstream-override-header
