Skip to content

feat(proxy): per-request upstream override via X-Headroom-Upstream header#1199

Closed
JavaGT wants to merge 2 commits into
chopratejas:mainfrom
JavaGT:feat/proxy-upstream-override-header
Closed

feat(proxy): per-request upstream override via X-Headroom-Upstream header#1199
JavaGT wants to merge 2 commits into
chopratejas:mainfrom
JavaGT:feat/proxy-upstream-override-header

Conversation

@JavaGT

@JavaGT JavaGT commented Jun 20, 2026

Copy link
Copy Markdown

⚠️ Automated contribution — authored by an AI agent

This pull request was developed and submitted automatically by an AI coding agent [Human operator note, I have just tested the open code harness using opencode-go provider using this proxy setup to write a summary/navigation document of a codebase I am working on and it saved approx. 10k tokens, using only ~32.4k where the unproxied one for the same task used ~42.6k total tokens (working environment reset between)] (opencode, powered by glm-5.2), not by a human. The work was produced end-to-end by the agent: it explored the codebase, wrote the implementation, the tests, the docs, this PR description, and ran the verification commands shown below. @JavaGT did not write, review, run, or verify any of this code@JavaGT only requested the improvement (a proxy-side way to support a multi-provider OpenCode setup through one Headroom instance) and authorized the PR submission. Please review this PR as you would any external, unverified, AI-generated contribution — the design, the code, the test results, and the "real behaviour proof" are all the agent's claims and should be checked, not trusted.

The agent chose the design (a per-request X-Headroom-Upstream header consumed by the proxy, normalized like the existing *_TARGET_API_URL env vars, stripped by the existing _strip_internal_headers) after reviewing the two related launcher-side PRs (#1089's one-proxy-per-upstream fan-out and #1173's env-var-injection wrap) and the existing x-headroom-base-url precedent in the catch-all. It reuses the proxy's own conventions rather than introducing a new mechanism — but that reuse decision and the non-goal boundaries (Codex WS / ChatGPT-OAuth branch / batch endpoints) are the agent's judgment and warrant maintainer scrutiny.

Description

A single proxy instance can now fan out to many upstreams (one per provider) instead of one proxy per upstream. The caller tags each request with its real upstream base in the X-Headroom-Upstream header and the proxy forwards there, overriding the startup default for that provider.

This is a proxy-side capability, orthogonal to the existing headroom wrap opencode PRs (#1089, #1173), which approach multi-provider from the launcher side (one proxy per upstream). This PR makes the one-proxy, many-upstreams shape possible — no per-provider proxy instances, one port, one process.

Today every provider that should be compressed needs either its own OPENAI_TARGET_API_URL / ANTHROPIC_TARGET_API_URL env var (one upstream per provider family), or its own proxy instance on its own port (the approach in #1089 — 6 proxies for 6 upstreams). For clients that can attach custom per-provider headers — OpenCode (options.headers), and any AI-SDK-based client — a single proxy can serve all of them: point every provider's base URL at the proxy and tag each request with X-Headroom-Upstream: <real upstream>. Auth stays on the client (Authorization / x-api-key pass through), so no API keys live in the proxy config.

Closes # (none — feature, no tracking issue; related: #1193, #1089, #1173)

Type of Change

  • Bug fix (non-breaking change that fixes an issue)
  • New feature (non-breaking change that adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation update
  • Performance improvement
  • Code refactoring (no functional changes)

Changes Made

  • headroom/proxy/helpers.pyrequest_upstream_override() + _UPSTREAM_OVERRIDE_HEADER. Reads X-Headroom-Upstream and normalizes it the same way as *_TARGET_API_URL (trailing slash and a trailing /v1 or /v1beta version segment stripped), so https://api.deepseek.com, …/v1, and https://generativelanguage.googleapis.com/v1beta all resolve to their bare host. The proxy then appends the incoming request path.
  • headroom/proxy/handlers/openai.py — read the override in handle_openai_chat (/v1/chat/completions), handle_openai_responses (/v1/responses, non-ChatGPT-auth branch only), and at the top of handle_passthrough (covers all passthrough routes + the verbatim catch-all in one place).
  • headroom/proxy/handlers/gemini.pyhandle_gemini_stream_generate_content gains the upstream_base_url parameter; all three /v1beta gemini routes thread the override.
  • headroom/providers/proxy_routes.py — thread the override through handle_anthropic_messages's existing upstream_base_url param (/v1/messages), and through the three /v1beta gemini routes.
  • docs/content/docs/proxy.mdx — new Per-request upstream override section with the OpenCode config example.
  • CHANGELOG.md — Unreleased Features entry.
  • tests/test_proxy_upstream_override.py — 18 new tests (resolver normalization incl. /v1beta, /v1/messages threading, /v1/chat/completions URL build, gemini /v1beta route threading, catch-all/passthrough URL build, header-stripping).
  • tests/test_provider_proxy_routes.py — the generic delegate-swallowing fake now accepts **kwargs (it already swallowed *args), so the new upstream_base_url= kwarg on /v1/messages flows through.

Scope notes / non-goals

  • The Codex WebSocket /v1/responses path (_ws_http_fallback) and the ChatGPT-OAuth branch of /v1/responses are intentionally not overridden — those are subscription-auth endpoints with a hardcoded chatgpt.com upstream, not the configured provider target. BYOK clients (OpenCode) use the API-key branch, which is covered.
  • Batch endpoints (/v1/messages/batches/*, /v1/batches/*, /v1beta/batches/*) read self.*_API_URL directly and are not overridden in this PR; they can be threaded in a follow-up if there's demand. The catch-all covers arbitrary unknown paths verbatim regardless.

Testing

  • Unit tests pass (pytest)
  • Linting passes (ruff check on all changed files)
  • Type checking passes (mypy on all changed source files)
  • New tests added for new functionality
  • Manual testing performed

Test Output

$ uv run --no-sync pytest tests/test_proxy_upstream_override.py -q
18 passed in 3.08s

$ uv run --no-sync pytest tests/test_proxy/ tests/test_provider_proxy_routes.py tests/test_banner_upstream_targets.py -q
110 passed, 1 warning in 5.39s

$ uv run --no-sync pytest tests/ -k gemini -q
65 passed, 59 skipped, 6885 deselected in 31.38s   (skips are env-gated, not coverage gaps)

$ uv run --no-sync ruff check headroom/proxy/helpers.py headroom/proxy/handlers/openai.py headroom/proxy/handlers/gemini.py headroom/providers/proxy_routes.py tests/test_proxy_upstream_override.py
All checks passed!

$ uv run --no-sync mypy headroom/proxy/helpers.py headroom/proxy/handlers/openai.py headroom/proxy/handlers/gemini.py headroom/providers/proxy_routes.py
Success: no issues found in 4 source files

Real Behavior Proof

  • Environment: macOS arm64 (darwin), Python 3.13.13, headroom @ branch feat/proxy-upstream-override-header (commit eb6ab307), opencode v1.17.8. Live proxy started via python -m headroom.cli proxy --port 8787 --no-optimize --no-cache --no-rate-limit (the --no-optimize flag only avoids the model-loading pipeline delay in the test harness; the override code path is identical with optimization on, as proven by the unit/route tests).
  • Exact command / steps: started a mock OpenAI/Anthropic/Gemini upstream on 127.0.0.1:9999, started the real headroom proxy on 127.0.0.1:8787, then curl'd /v1/chat/completions + /v1/messages + a catch-all GET /v1/models with the X-Headroom-Upstream header, and finally ran a real opencode run --model openai/mock-model "say hi" through the proxy. Full numbered detail below.
    1. Started a mock OpenAI/Anthropic/Gemini upstream on 127.0.0.1:9999 (logs every request: method, path, presence of x-headroom-upstream).
    2. Started the real headroom proxy on 127.0.0.1:8787.
    3. curl -X POST http://127.0.0.1:8787/v1/chat/completions -H 'X-Headroom-Upstream: http://127.0.0.1:9999' … — and the same for /v1/messages and a catch-all GET /v1/models.
    4. Pointed a real opencode client at the proxy: a provider with options.baseURL = http://127.0.0.1:8787/v1 and options.headers = { "X-Headroom-Upstream": "http://127.0.0.1:9999" }, then opencode run --model openai/mock-model "say hi".
  • Observed result: the proxy forwarded to the override base + request path and stripped the x-headroom-upstream header before the upstream call (mock saw it absent); opencode rendered the response. Full detail below.
    • Proxy log: event=outbound_request forwarder=server method=POST path=http://127.0.0.1:9999/v1/chat/completions and event=outbound_headers forwarder=openai_chat_completions stripped_count=1. Mock received the request at /v1/chat/completions with x-headroom-upstream absent (stripped). Same verified for /v1/messages and the catch-all /v1/models.
    • opencode: mock received POST /v1/responses (opencode uses the Responses API) with the header stripped; opencode rendered the response (hi). Full chain: opencode → (sends header + baseURL=proxy) → headroom proxy → (override → mock) → mock upstream.
  • Not tested: a real Gemini upstream (the /v1beta path is covered by unit + route tests and the normalization is unit-tested; a live Gemini key was not available in this environment). Codex WebSocket fallback path (intentionally not overridden — see non-goals).

Review Readiness

  • I have performed a self-review
  • This PR is ready for human review

Checklist

  • My code follows the project's style guidelines (ruff clean)
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas (the normalization rationale and the handle_passthrough chokepoint are documented)
  • I have made corresponding changes to the documentation (docs/content/docs/proxy.mdx + CHANGELOG.md)
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective / that the feature works
  • New and existing unit tests pass locally with my changes
  • I have updated the CHANGELOG.md (Unreleased > Features)

Attribution

See the Automated contribution banner at the very top of this PR. In short: this PR (code, tests, docs, this description, and the verification commands/results) was produced entirely by an AI coding agent. @JavaGT requested the improvement and authorized the submission, but did not write, review, run, or verify the code. The checklist boxes above reflect what the agent did and verified — not what any human did. Maintainers should treat all of it as unverified until independently checked.

…ader

A single proxy instance can now fan out to many upstreams (one per
provider) instead of one proxy per upstream. The caller tags each request
with its real upstream base in the X-Headroom-Upstream header and the proxy
forwards there, overriding the startup default for that provider.

The value is normalized like *_TARGET_API_URL (trailing slash and trailing
/v1 stripped) and the proxy appends the incoming request path, so both
https://api.deepseek.com and https://api.deepseek.com/v1 resolve correctly.

Honored on /v1/chat/completions, /v1/responses, /v1/messages, and every
passthrough / catch-all route (handle_passthrough). The header is an
x-headroom-* control flag and is stripped before the upstream call, so it
never leaks to the provider.

Enables single-proxy multi-provider setups such as OpenCode (each provider
configured with baseURL = the proxy + X-Headroom-Upstream = its real
upstream), avoiding the one-proxy-per-upstream fan-out.
@github-actions

github-actions Bot commented Jun 20, 2026

Copy link
Copy Markdown
Contributor

PR governance

This PR does not yet satisfy the required template fields:

  • Paste real command output or artifact links in TestingTest Output.

Please update the PR body, or move the PR back to draft while it is still in progress.

@github-actions github-actions Bot added the status: needs author action Pull request body or readiness checklist still needs author updates label Jun 20, 2026
…normalization

Reviewing chopratejas#1089 (the launcher-side opencode wrap) surfaced two gaps in
the initial override implementation:

1. Gemini's version segment is /v1beta (not /v1). The proxy's gemini
   handlers append /v1beta/models/... themselves, so a caller passing the
   versioned URL (e.g. matching headroom's _KNOWN_UPSTREAMS
   'generativelanguage.googleapis.com/v1beta') would have produced a
   doubled /v1beta/v1beta path. The resolver now strips a trailing /v1beta
   as well as /v1.

2. The three /v1beta gemini routes did not thread the override through.
   handle_gemini_generate_content and handle_gemini_count_tokens already
   accepted upstream_base_url; handle_gemini_stream_generate_content did
   not. All three routes now pass request_upstream_override(request), and
   the stream handler gained the parameter.

Coverage is now OpenAI (/v1/chat/completions, /v1/responses), Anthropic
(/v1/messages), Gemini (/v1beta generateContent/streamGenerateContent/
countTokens), and every passthrough / catch-all route.
@github-actions github-actions Bot added status: ready for review Pull request body is complete and the author marked it ready for human review status: needs author action Pull request body or readiness checklist still needs author updates and removed status: needs author action Pull request body or readiness checklist still needs author updates status: ready for review Pull request body is complete and the author marked it ready for human review labels Jun 20, 2026
@JavaGT

JavaGT commented Jun 20, 2026

Copy link
Copy Markdown
Author

Closing — withdrawing this PR. Thanks for the review bot feedback.

@JavaGT JavaGT closed this Jun 20, 2026
@JavaGT JavaGT deleted the feat/proxy-upstream-override-header branch June 20, 2026 16:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

status: needs author action Pull request body or readiness checklist still needs author updates

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant