Skip to content

Commit eb6ab30

Browse files
committed
feat(proxy): extend X-Headroom-Upstream override to Gemini + /v1beta normalization
Reviewing headroomlabs-ai#1089 (the launcher-side opencode wrap) surfaced two gaps in the initial override implementation: 1. Gemini's version segment is /v1beta (not /v1). The proxy's gemini handlers append /v1beta/models/... themselves, so a caller passing the versioned URL (e.g. matching headroom's _KNOWN_UPSTREAMS 'generativelanguage.googleapis.com/v1beta') would have produced a doubled /v1beta/v1beta path. The resolver now strips a trailing /v1beta as well as /v1. 2. The three /v1beta gemini routes did not thread the override through. handle_gemini_generate_content and handle_gemini_count_tokens already accepted upstream_base_url; handle_gemini_stream_generate_content did not. All three routes now pass request_upstream_override(request), and the stream handler gained the parameter. Coverage is now OpenAI (/v1/chat/completions, /v1/responses), Anthropic (/v1/messages), Gemini (/v1beta generateContent/streamGenerateContent/ countTokens), and every passthrough / catch-all route.
1 parent 5611f6a commit eb6ab30

6 files changed

Lines changed: 120 additions & 15 deletions

File tree

CHANGELOG.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -10,7 +10,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
1010

1111
### Features
1212

13-
* **proxy:** per-request upstream override via the `X-Headroom-Upstream` header. A single proxy instance can now fan out to many upstreams (one per provider) instead of one proxy per upstream — the caller tags each request with its real upstream base and the proxy forwards there, overriding the startup default for that provider. The value is normalized like `*_TARGET_API_URL` (trailing slash and trailing `/v1` stripped) and the proxy appends the incoming request path. Honored on `/v1/chat/completions`, `/v1/responses`, `/v1/messages`, and every passthrough / catch-all route; stripped before the upstream call so it never leaks. Enables single-proxy multi-provider setups such as OpenCode's 75+ providers (each provider configured with `baseURL` = the proxy + an `X-Headroom-Upstream` header = its real upstream).
13+
* **proxy:** per-request upstream override via the `X-Headroom-Upstream` header. A single proxy instance can now fan out to many upstreams (one per provider) instead of one proxy per upstream — the caller tags each request with its real upstream base and the proxy forwards there, overriding the startup default for that provider. The value is normalized like `*_TARGET_API_URL` (trailing slash and a trailing `/v1` or `/v1beta` version segment stripped) and the proxy appends the incoming request path. Honored on `/v1/chat/completions`, `/v1/responses`, `/v1/messages`, `/v1beta/models/{model}:generateContent`/`:streamGenerateContent`/`:countTokens`, and every passthrough / catch-all route; stripped before the upstream call so it never leaks. Enables single-proxy multi-provider setups such as OpenCode's 75+ providers (each provider configured with `baseURL` = the proxy + an `X-Headroom-Upstream` header = its real upstream).
1414

1515
* **proxy:** measure and surface rolling and current token throughput metrics (active/wall-clock input, compression, effective forward, and streamed generation) in `headroom perf` CLI and the dashboard ([#959](https://github.com/chopratejas/headroom/issues/959)).
1616
* **vibe:** add Mistral Vibe CLI support with `headroom wrap vibe`.

docs/content/docs/proxy.mdx

Lines changed: 3 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -266,16 +266,17 @@ Rewriting the request body invalidates the caller's **SigV4** signature (it cove
266266

267267
A single Headroom proxy normally forwards to one configured upstream per provider (`OPENAI_TARGET_API_URL`, `ANTHROPIC_TARGET_API_URL`, …). The `X-Headroom-Upstream` request header overrides that upstream **per request**, so one proxy instance can fan out to many upstreams — no need to run one proxy per provider.
268268

269-
Set the header to the upstream base URL. It is normalized the same way as the `*_TARGET_API_URL` env vars (trailing slash and a trailing `/v1` segment are stripped), then the proxy appends the incoming request path:
269+
Set the header to the upstream base URL. It is normalized the same way as the `*_TARGET_API_URL` env vars (trailing slash and a trailing API-version segment — `/v1` or `/v1beta` are stripped), then the proxy appends the incoming request path:
270270

271271
| Header value | Request path | Forwarded to |
272272
| --- | --- | --- |
273273
| `https://api.deepseek.com` | `/v1/chat/completions` | `https://api.deepseek.com/v1/chat/completions` |
274274
| `https://api.deepseek.com/v1` | `/v1/chat/completions` | `https://api.deepseek.com/v1/chat/completions` |
275275
| `https://api.groq.com/openai/v1` | `/v1/chat/completions` | `https://api.groq.com/openai/v1/chat/completions` |
276276
| `https://api.anthropic.com` | `/v1/messages` | `https://api.anthropic.com/v1/messages` |
277+
| `https://generativelanguage.googleapis.com/v1beta` | `/v1beta/models/gemini-2.0-flash:generateContent` | `https://generativelanguage.googleapis.com/v1beta/models/gemini-2.0-flash:generateContent` |
277278

278-
The header is an internal `x-headroom-*` control flag: it is consumed by the proxy and stripped before the upstream call, so it never leaks to the provider. Honored on the OpenAI (`/v1/chat/completions`, `/v1/responses`), Anthropic (`/v1/messages`), and every passthrough / catch-all route.
279+
The header is an internal `x-headroom-*` control flag: it is consumed by the proxy and stripped before the upstream call, so it never leaks to the provider. Honored on the OpenAI (`/v1/chat/completions`, `/v1/responses`), Anthropic (`/v1/messages`), Gemini (`/v1beta/models/{model}:generateContent`, `:streamGenerateContent`, `:countTokens`), and every passthrough / catch-all route.
279280

280281
### Use case: many providers through one proxy
281282

headroom/providers/proxy_routes.py

Lines changed: 9 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -576,15 +576,21 @@ async def cancel_batch(request: Request, batch_id: str):
576576

577577
@app.post("/v1beta/models/{model}:generateContent")
578578
async def gemini_generate_content(request: Request, model: str):
579-
return await proxy.handle_gemini_generate_content(request, model)
579+
return await proxy.handle_gemini_generate_content(
580+
request, model, upstream_base_url=request_upstream_override(request)
581+
)
580582

581583
@app.post("/v1beta/models/{model}:streamGenerateContent")
582584
async def gemini_stream_generate_content(request: Request, model: str):
583-
return await proxy.handle_gemini_stream_generate_content(request, model)
585+
return await proxy.handle_gemini_stream_generate_content(
586+
request, model, upstream_base_url=request_upstream_override(request)
587+
)
584588

585589
@app.post("/v1beta/models/{model}:countTokens")
586590
async def gemini_count_tokens(request: Request, model: str):
587-
return await proxy.handle_gemini_count_tokens(request, model)
591+
return await proxy.handle_gemini_count_tokens(
592+
request, model, upstream_base_url=request_upstream_override(request)
593+
)
588594

589595
@app.post("/v1internal:streamGenerateContent")
590596
async def google_cloudcode_stream_generate_content(request: Request):

headroom/proxy/handlers/gemini.py

Lines changed: 5 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -901,11 +901,12 @@ async def handle_gemini_stream_generate_content(
901901
self,
902902
request: Request,
903903
model: str,
904+
upstream_base_url: str | None = None,
904905
) -> StreamingResponse | JSONResponse:
905906
"""Handle Gemini streaming endpoint /v1beta/models/{model}:streamGenerateContent."""
906907
from fastapi.responses import JSONResponse
907908

908-
from headroom.proxy.helpers import _read_request_json
909+
from headroom.proxy.helpers import _read_request_json, request_upstream_override
909910
from headroom.tokenizers import get_tokenizer
910911

911912
start_time = time.time()
@@ -957,9 +958,10 @@ async def handle_gemini_stream_generate_content(
957958

958959
# Build URL with SSE param
959960
query_params = dict(request.query_params)
960-
url = f"{self.GEMINI_API_URL}/v1beta/models/{model}:streamGenerateContent?alt=sse"
961+
_gemini_base = upstream_base_url or request_upstream_override(request) or self.GEMINI_API_URL
962+
url = f"{_gemini_base}/v1beta/models/{model}:streamGenerateContent?alt=sse"
961963
if "key" in query_params:
962-
url = f"{self.GEMINI_API_URL}/v1beta/models/{model}:streamGenerateContent?key={query_params['key']}&alt=sse"
964+
url = f"{_gemini_base}/v1beta/models/{model}:streamGenerateContent?key={query_params['key']}&alt=sse"
963965

964966
return await self._stream_response(
965967
url,

headroom/proxy/helpers.py

Lines changed: 13 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -1574,10 +1574,12 @@ def request_upstream_override(request: Request) -> str | None:
15741574
providers) and the proxy forwards there instead of its startup default.
15751575
15761576
The value is normalized to match the proxy's internal ``*_API_URL``
1577-
format (trailing slash and a trailing ``/v1`` segment are stripped), so
1578-
both ``https://api.deepseek.com`` and ``https://api.deepseek.com/v1``
1579-
resolve to ``https://api.deepseek.com``. The proxy then appends the
1580-
incoming request path (e.g. ``/v1/chat/completions``).
1577+
format (trailing slash and a trailing API-version segment are stripped),
1578+
so all of ``https://api.deepseek.com``, ``https://api.deepseek.com/v1``,
1579+
and ``https://generativelanguage.googleapis.com/v1beta`` resolve to their
1580+
bare host. The proxy then appends the incoming request path —
1581+
``/v1/chat/completions`` for OpenAI/Anthropic, ``/v1beta/models/...`` for
1582+
Gemini — so passing the version in the header does not double it up.
15811583
15821584
Returns ``None`` when the header is unset or empty. The header is an
15831585
``x-headroom-*`` control flag and is stripped from upstream-bound
@@ -1589,8 +1591,13 @@ def request_upstream_override(request: Request) -> str | None:
15891591
normalized = raw.strip().rstrip("/")
15901592
if not normalized:
15911593
return None
1592-
if normalized.endswith("/v1"):
1593-
normalized = normalized[:-3]
1594+
# Strip a trailing API-version segment the proxy's handlers add
1595+
# themselves (/v1 for OpenAI/Anthropic/Vertex, /v1beta for Gemini), so
1596+
# callers may pass either the bare host or the versioned URL.
1597+
for suffix in ("/v1", "/v1beta"):
1598+
if normalized.endswith(suffix):
1599+
normalized = normalized[: -len(suffix)]
1600+
break
15941601
return normalized
15951602

15961603

tests/test_proxy_upstream_override.py

Lines changed: 89 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -94,6 +94,28 @@ def test_override_strips_trailing_v1_with_slash() -> None:
9494
)
9595

9696

97+
def test_override_strips_trailing_v1beta() -> None:
98+
# Gemini's version segment is /v1beta (not /v1). The proxy's gemini
99+
# handlers append /v1beta/models/... themselves, so a caller passing the
100+
# versioned URL (e.g. matching headroom's _KNOWN_UPSTREAMS) must have it
101+
# stripped to avoid a doubled /v1beta/v1beta path.
102+
assert (
103+
request_upstream_override(
104+
_stub_request({"x-headroom-upstream": "https://generativelanguage.googleapis.com/v1beta"})
105+
)
106+
== "https://generativelanguage.googleapis.com"
107+
)
108+
109+
110+
def test_override_strips_trailing_v1beta_with_slash() -> None:
111+
assert (
112+
request_upstream_override(
113+
_stub_request({"x-headroom-upstream": "https://generativelanguage.googleapis.com/v1beta/"})
114+
)
115+
== "https://generativelanguage.googleapis.com"
116+
)
117+
118+
97119
def test_override_preserves_path_prefix_before_v1() -> None:
98120
# OpenRouter / Groq style: the /v1 is the API version, the prefix is real.
99121
assert (
@@ -256,6 +278,73 @@ def test_catchall_forwards_to_override_upstream() -> None:
256278
assert url == "https://api.deepseek.com/some/custom/path"
257279

258280

281+
# ── /v1beta gemini routes thread the override through upstream_base_url ─
282+
283+
284+
def test_v1beta_generate_content_threads_override(monkeypatch) -> None:
285+
captured: list[Any] = []
286+
287+
async def fake_gemini_generate(
288+
self, request, model, upstream_base_url=None, provider_name="gemini"
289+
): # type: ignore[no-untyped-def]
290+
captured.append(upstream_base_url)
291+
return JSONResponse({"upstream_base_url": upstream_base_url, "model": model})
292+
293+
monkeypatch.setattr(HeadroomProxy, "handle_gemini_generate_content", fake_gemini_generate)
294+
295+
with TestClient(_app()) as client:
296+
assert client.post(
297+
"/v1beta/models/gemini-mock:generateContent",
298+
json={"contents": [{"parts": [{"text": "hi"}]}]},
299+
headers={"X-Headroom-Upstream": "https://generativelanguage.googleapis.com/v1beta"},
300+
).json()["upstream_base_url"] == "https://generativelanguage.googleapis.com"
301+
302+
assert captured == ["https://generativelanguage.googleapis.com"]
303+
304+
305+
def test_v1beta_count_tokens_threads_override(monkeypatch) -> None:
306+
captured: list[Any] = []
307+
308+
async def fake_gemini_count(
309+
self, request, model, upstream_base_url=None, provider_name="gemini"
310+
): # type: ignore[no-untyped-def]
311+
captured.append(upstream_base_url)
312+
return JSONResponse({"upstream_base_url": upstream_base_url})
313+
314+
monkeypatch.setattr(HeadroomProxy, "handle_gemini_count_tokens", fake_gemini_count)
315+
316+
with TestClient(_app()) as client:
317+
assert client.post(
318+
"/v1beta/models/gemini-mock:countTokens",
319+
json={"contents": [{"parts": [{"text": "hi"}]}]},
320+
headers={"X-Headroom-Upstream": "https://generativelanguage.googleapis.com"},
321+
).json()["upstream_base_url"] == "https://generativelanguage.googleapis.com"
322+
323+
assert captured == ["https://generativelanguage.googleapis.com"]
324+
325+
326+
def test_v1beta_stream_generate_content_threads_override(monkeypatch) -> None:
327+
"""handle_gemini_stream_generate_content now accepts upstream_base_url."""
328+
captured: list[Any] = []
329+
330+
async def fake_gemini_stream(
331+
self, request, model, upstream_base_url=None
332+
): # type: ignore[no-untyped-def]
333+
captured.append(upstream_base_url)
334+
return JSONResponse({"upstream_base_url": upstream_base_url})
335+
336+
monkeypatch.setattr(HeadroomProxy, "handle_gemini_stream_generate_content", fake_gemini_stream)
337+
338+
with TestClient(_app()) as client:
339+
assert client.post(
340+
"/v1beta/models/gemini-mock:streamGenerateContent",
341+
json={"contents": [{"parts": [{"text": "hi"}]}]},
342+
headers={"X-Headroom-Upstream": "https://generativelanguage.googleapis.com/v1beta"},
343+
).json()["upstream_base_url"] == "https://generativelanguage.googleapis.com"
344+
345+
assert captured == ["https://generativelanguage.googleapis.com"]
346+
347+
259348
def test_override_header_stripped_before_upstream_call() -> None:
260349
"""The x-headroom-upstream control flag must not leak to the upstream."""
261350
response = httpx.Response(

0 commit comments

Comments
 (0)