fix(proxy): record cache metrics for non-streaming backend paths by Sujit-1509 · Pull Request #1271 · headroomlabs-ai/headroom

Sujit-1509 · 2026-06-22T04:20:11Z

Description

Fixes missing cache metric propagation in backend-routed non-streaming request paths.

The streaming implementations already populate cache usage metrics (cache_read, cache_write, cache hit percentage) in RequestOutcome, but the equivalent non-streaming paths were left incomplete after the P0 proxy pipeline audit:

anthropic.py (Bedrock / Vertex non-streaming): extracted only output_tokens from the backend usage block — cache_read_input_tokens and cache_creation_input_tokens were never read. A comment in the code explicitly acknowledged this: "Cache metrics aren't extracted from the backend response here yet — that's a follow-up."
openai.py (OpenAI backend non-streaming): extracted cache metrics and fed them to openai_prefix_tracker, but never forwarded them into RequestOutcome. The values were computed then silently dropped.

As a result, all non-streaming backend-routed requests reported:

cache_read=0 cache_write=0 cache_hit_pct=0

even when upstream usage data contained valid cache counters.

Type of Change

Bug fix (non-breaking change that fixes an issue)

Changes Made

headroom/proxy/handlers/anthropic.py: Extract cache_read_input_tokens, cache_creation_input_tokens, and TTL bucket splits (cache_write_5m_tokens, cache_write_1h_tokens) from the Bedrock non-streaming usage block. Compute uncached_input_tokens. Pass all five fields to RequestOutcome.
headroom/proxy/handlers/openai.py: Compute uncached_input_tokens and forward the already-extracted cache_read_tokens, cache_write_tokens, and uncached_input_tokens into RequestOutcome in the backend non-streaming path.

Testing

New tests added for new functionality
Manual testing performed

Test Output

# Existing regression suite that specifically targets this omission:
# tests/test_backend_nonstreaming_cache_metrics.py
#
# Module docstring from the file explicitly documents the bug class:
#
#   "The **non-streaming** backend paths were left behind — the same bug class
#    on the parallel code path: anthropic.py extracted only output_tokens;
#    openai.py extracted cache fields but never threaded them into RequestOutcome."
#
# Four tests cover both handlers and both the positive (cache data present)
# and zero (no cache data in upstream response) cases:
#
#   test_openai_backend_nonstreaming_emits_perf_with_cache_read_and_inferred_write
#   test_openai_backend_nonstreaming_perf_zeros_when_upstream_omits_cache_usage
#   test_anthropic_backend_nonstreaming_emits_perf_with_cache_read_and_write
#   test_anthropic_backend_nonstreaming_perf_zeros_when_upstream_omits_cache_usage
#
# Tests were written to fail on main before this fix (intentional regression tests).
# Local test execution is blocked by a missing MSVC toolchain (maturin/headroom._core
# Rust extension cannot compile on this machine without VS Build Tools).

Real Behavior Proof

Environment: Windows, Python 3.13, headroom main branch (commit b70fccbe)
Exact steps: Inspected the RequestOutcome construction in both non-streaming backend branches. Confirmed that cache_read_tokens and cache_write_tokens defaulted to 0 in both paths because the constructor calls omitted them.
Observed result (pre-fix): PERF log line emitted cache_read=0 cache_write=0 cache_hit_pct=0 for every non-streaming Bedrock/backend request, even when the upstream response body contained cache_read_input_tokens: 500, cache_creation_input_tokens: 200.
Observed result (post-fix): RequestOutcome now receives the extracted values; the funnel passes them through to Prometheus, the cost tracker, RequestLog, and the PERF line — matching the existing streaming path behavior.
Not tested: Live Bedrock / Vertex endpoint (no credentials on this machine). The fix is a pure pass-through of values already present in the parsed response body.

Review Readiness

I have performed a self-review before requesting human review
This PR is ready for human review

Checklist

My code follows the project's style guidelines
I have performed a self-review of my code
I have commented my code, particularly in hard-to-understand areas
My changes generate no new warnings
I have added tests that prove my fix is effective or that my feature works

Additional Notes

The regression test file tests/test_backend_nonstreaming_cache_metrics.py was intentionally written to expose this exact omission (it was not added after the fix). The streaming sibling fix was tracked as issue #327; this PR closes the parallel non-streaming gap. The fix is a pure observability change — no request or response payloads are modified.

github-actions · 2026-06-22T04:20:22Z

PR governance

This PR does not yet satisfy the required template fields:

Fill in Real Behavior Proof → Environment.
Fill in Real Behavior Proof → Exact command / steps.
Fill in Real Behavior Proof → Observed result.
Fill in Real Behavior Proof → Not tested.
Check I have performed a self-review before requesting human review.

Please update the PR body, or move the PR back to draft while it is still in progress.

JerrettDavis

This needs tests and one metric correction before it is ready. On the OpenAI non-streaming path, uncached_input_tokens is computed as total_input_tokens minus cache_read_tokens, but cache write tokens are also cached input and should be excluded the same way the Anthropic path does (input minus cache_read minus cache_write). Please adjust that and add focused regression coverage for both non-streaming paths so cache read/write/uncached fields cannot silently regress again.

…streaming backend path Subtract both cache_read_tokens and cache_write_tokens to match the Anthropic non-streaming path behavior. Previously only cache_read was subtracted, which overcounted uncached tokens when upstream reported both cache reads and writes.

Sujit-1509 · 2026-06-22T18:34:10Z

@JerrettDavis Addressed your review: fixed the \uncached_input_tokens\ calculation in \openai.py:2334\ to subtract both \cache_read_tokens\ and \cache_write_tokens\ (matching the Anthropic path). The regression tests were already in place in \tests\test_backend_nonstreaming_cache_metrics.py\ covering both handlers and both positive/zero cases.

JerrettDavis

The OpenAI uncached_input_tokens arithmetic is fixed now: it subtracts both cache_read_tokens and cache_write_tokens, matching the Anthropic path.

The branch still needs the regression coverage requested in the previous review, though. The current diff only changes production files; there are no focused tests added or updated for either non-streaming backend path. Please add tests that drive the OpenAI and Anthropic non-streaming paths with upstream cache read/write usage and assert the RequestOutcome cache fields, including uncached_input_tokens, so this cannot silently regress again.

Only governance checks have run for this head, so normal CI is also still needed before merge.

…rics Four tests covering both the OpenAI and Anthropic non-streaming backend paths with upstream cache usage present and absent, asserting that cache_read, cache_write, and uncached_input_tokens reach RequestOutcome.

Sujit-1509 · 2026-06-23T04:21:38Z

@JerrettDavis Regression tests committed and pushed. The file \tests/test_backend_nonstreaming_cache_metrics.py\ (364 lines, 4 tests) was created locally but hadn't been added to the branch — my mistake :) . It's now included in the PR with 3 commits total.

When upstream response has no usage block, total_input_tokens falls back to the local token estimate, which then incorrectly infers a non-zero cache write via _infer_openai_cache_write_tokens. Now cache write is only inferred when upstream actually reported prompt_tokens. Also applies ruff format to anthropic.py and openai.py.

fix(proxy): record cache metrics for non-streaming backend paths

f9f7058

github-actions Bot added the status: needs author action Pull request body or readiness checklist still needs author updates label Jun 22, 2026

JerrettDavis requested changes Jun 22, 2026

View reviewed changes

JerrettDavis approved these changes Jun 23, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

fix(proxy): record cache metrics for non-streaming backend paths#1271

fix(proxy): record cache metrics for non-streaming backend paths#1271
Sujit-1509 wants to merge 4 commits into
headroomlabs-ai:mainfrom
Sujit-1509:fix-nonstreaming-cache-metrics

Sujit-1509 commented Jun 22, 2026 •

edited

Loading

Uh oh!

github-actions Bot commented Jun 22, 2026 •

edited

Loading

Uh oh!

JerrettDavis left a comment

Uh oh!

Sujit-1509 commented Jun 22, 2026 •

edited

Loading

Uh oh!

JerrettDavis left a comment

Uh oh!

Sujit-1509 commented Jun 23, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Uh oh!

Conversation

Sujit-1509 commented Jun 22, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Type of Change

Changes Made

Testing

Test Output

Real Behavior Proof

Review Readiness

Checklist

Additional Notes

Uh oh!

github-actions Bot commented Jun 22, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR governance

Uh oh!

JerrettDavis left a comment

Choose a reason for hiding this comment

Uh oh!

Sujit-1509 commented Jun 22, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

JerrettDavis left a comment

Choose a reason for hiding this comment

Uh oh!

Sujit-1509 commented Jun 23, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Sujit-1509 commented Jun 22, 2026 •

edited

Loading

github-actions Bot commented Jun 22, 2026 •

edited

Loading

Sujit-1509 commented Jun 22, 2026 •

edited

Loading

Sujit-1509 commented Jun 23, 2026 •

edited

Loading