Measuring Token Cache Hits in vLLM #14546
Unanswered
LeonardTwcs asked this question in Q&A
Hi everyone,

I'm currently using vLLM for inference and I'm interested in monitoring the efficiency of the KV cache. Specifically, is there a built-in metric or recommended method to measure how many tokens are being served from the cache (i.e., cache hits) compared to those that are recomputed?

If there isn't a direct metric for this, are there any suggested workarounds or best practices for implementing custom metrics to track token cache hits? I've sketched below the kind of approach I'm imagining.
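Here's roughly what I have in mind: scraping the OpenAI-compatible server's Prometheus `/metrics` endpoint and pulling out anything related to the prefix cache. The endpoint URL and the `vllm:prefix_cache_hits` / `vllm:prefix_cache_queries` counter names are assumptions on my part and probably vary between versions, so please treat this as a sketch rather than something I know works:

```python
# Sketch: scrape the OpenAI-compatible server's Prometheus endpoint and collect
# any metric whose name mentions the prefix cache. Metric names are assumed and
# may differ by vLLM version.
import requests

METRICS_URL = "http://localhost:8000/metrics"  # assumed default server address


def prefix_cache_stats(url: str = METRICS_URL) -> dict[str, float]:
    """Return every prefix-cache-related metric exposed by the server."""
    stats: dict[str, float] = {}
    for line in requests.get(url, timeout=5).text.splitlines():
        if line.startswith("#") or "prefix_cache" not in line:
            continue  # skip HELP/TYPE comments and unrelated metrics
        name, _, value = line.rpartition(" ")
        try:
            stats[name] = float(value)
        except ValueError:
            continue  # ignore lines that don't end in a numeric sample
    return stats


if __name__ == "__main__":
    for name, value in sorted(prefix_cache_stats().items()):
        print(f"{name} = {value}")
    # If counters like vllm:prefix_cache_hits and vllm:prefix_cache_queries are
    # present, the hit rate would simply be hits / queries.
```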
Thanks in advance for your help!

Replies: 1 comment

I'd like to know if it's possible to return cached tokens in the prompt_tokens_details parameter.
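Something along these lines is what I have in mind, assuming an OpenAI-compatible vLLM server on localhost and the `prompt_tokens_details` / `cached_tokens` field names from OpenAI's usage schema (the model name below is just a placeholder):

```python
# Sketch of the desired behaviour: read the cached-token count from the usage
# block of an OpenAI-compatible response. Whether vLLM populates
# prompt_tokens_details.cached_tokens is exactly what this question is about.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")  # assumed vLLM server

response = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",  # placeholder model name
    messages=[{"role": "user", "content": "Hello!"}],
)

usage = response.usage
details = getattr(usage, "prompt_tokens_details", None)
cached = getattr(details, "cached_tokens", None) if details else None
print(f"prompt tokens: {usage.prompt_tokens}, served from cache: {cached}")
```

If the server filled in `cached_tokens` per request, it would be straightforward to aggregate a cache-hit ratio on the client side without having to scrape the Prometheus metrics at all.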