UPSTREAM PR #19572: server: add Anthropic-compatible cache_read_input_tokens to usage metrics (#1172)
Status: Open
…se structure
- Added `n_cache_read_input_tokens` field to `server_task_result_cmpl_final` and `server_task_result_cmpl_partial` structs
- Populated `cache_read_input_tokens` in JSON output for both final and streaming responses
- Ensured `cache_read_input_tokens` is non-negative by clamping to zero if negative
- Updated unit tests to validate presence, type, and non-negativity of `cache_read_input_tokens` in usage metrics
No meaningful performance changes were detected across 115,429 analyzed functions in the following binaries: build.bin.libllama.so, build.bin.llama-tts, build.bin.llama-cvector-generator, build.bin.libmtmd.so, build.bin.llama-bench, build.bin.libggml-base.so, build.bin.libggml-cpu.so, build.bin.libggml.so, build.bin.llama-tokenize, build.bin.llama-quantize, build.bin.llama-qwen2vl-cli, build.bin.llama-gguf-split, build.bin.llama-llava-cli, build.bin.llama-minicpmv-cli, build.bin.llama-gemma3-cli. 🔎 Full breakdown: Loci Inspector.
Note
Source pull request: ggml-org/llama.cpp#19572
Summary
This PR adds the `cache_read_input_tokens` field to the server's usage metrics in API responses, aligning with the Anthropic API's prompt caching usage reporting.

When using llama-server as a drop-in replacement for the Anthropic API, clients expect `cache_read_input_tokens` in the `usage` object of the response. This field reports the number of input tokens that were read from the KV cache rather than being recomputed.
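For context, here is a minimal sketch of the response shape an Anthropic-compatible client might read back, assuming nlohmann::json (the JSON library llama.cpp vendors); the response body and token counts are invented for illustration, not output captured from llama-server:

```cpp
// Hedged illustration only: a made-up Anthropic-style usage object.
#include <cstdint>
#include <iostream>
#include <nlohmann/json.hpp>

using json = nlohmann::json;

int main() {
    // Hypothetical usage block from a /v1/messages-style response.
    const json response = json::parse(R"({
        "usage": {
            "input_tokens": 128,
            "output_tokens": 42,
            "cache_read_input_tokens": 96
        }
    })");

    // Tokens served from the KV cache instead of being recomputed.
    const int64_t cached = response["usage"]["cache_read_input_tokens"].get<int64_t>();
    std::cout << "tokens read from cache: " << cached << "\n";
    return 0;
}
```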
Changes

- Added `n_cache_read_input_tokens` field to `server_task_result_cmpl_final` and `server_task_result_cmpl_partial` structs
- Populated `cache_read_input_tokens` in JSON output for both final and streaming responses
- Clamped `cache_read_input_tokens` to zero if negative (defensive programming; see the sketch after this list)
- Updated unit tests to validate non-negativity (`cache_read_input_tokens` >= 0)
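A hedged sketch of the struct field and the clamping described above; the struct is drastically simplified and `make_usage` is a hypothetical helper, not a function from this PR:

```cpp
// Minimal sketch of the described change, not the PR's actual code.
#include <algorithm>
#include <cstdint>
#include <nlohmann/json.hpp>

using json = nlohmann::json;

struct server_task_result_cmpl_final {
    int32_t n_prompt_tokens           = 0;
    int32_t n_cache_read_input_tokens = 0; // field added by this PR
};

// Build the usage object, clamping to zero so a bookkeeping error can
// never surface to clients as a negative token count.
static json make_usage(const server_task_result_cmpl_final & res) {
    return json{
        {"input_tokens",            res.n_prompt_tokens},
        {"cache_read_input_tokens", std::max<int32_t>(0, res.n_cache_read_input_tokens)},
    };
}
```

Clamping in the serialization path keeps the defensive check in one place, so both the final and streaming responses get the same guarantee.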
Testing

- Verified that responses include `cache_read_input_tokens`:

Use Claude Code to review my PR and spot any issues.
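For illustration, a check mirroring what the PR description says the updated unit tests validate (presence, integer type, and non-negativity); `check_usage` is a hypothetical helper, not a test taken from the PR:

```cpp
// Illustrative assertions over a usage object, per the PR description.
#include <cassert>
#include <cstdint>
#include <nlohmann/json.hpp>

using json = nlohmann::json;

static void check_usage(const json & usage) {
    assert(usage.contains("cache_read_input_tokens")); // presence
    const json & v = usage.at("cache_read_input_tokens");
    assert(v.is_number_integer());                     // type
    assert(v.get<int64_t>() >= 0);                     // non-negativity, per the clamp
}
```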