Measuring Token Cache Hits in vLLM #14546
Unanswered
LeonardTwcs asked this question in Q&A
Hi everyone,

I'm currently using vLLM for inference and I'm interested in monitoring the efficiency of the KV cache. Specifically, is there a built-in metric or recommended method to measure how many tokens are being served from the cache (i.e., cache hits) compared to those that are recomputed?

If there isn't a direct metric for this, are there any suggested workarounds or best practices for implementing custom metrics to track token cache hits? I've sketched below the kind of approach I'm imagining.
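Here's roughly what I have in mind: scraping the OpenAI-compatible server's Prometheus `/metrics` endpoint and pulling out anything related to the prefix cache. The endpoint URL and the `vllm:prefix_cache_hits` / `vllm:prefix_cache_queries` counter names are assumptions on my part and probably vary between versions, so please treat this as a sketch rather than something I know works:

```python
# Sketch: scrape the OpenAI-compatible server's Prometheus endpoint and collect
# any metric whose name mentions the prefix cache. Metric names are assumed and
# may differ by vLLM version.
import requests

METRICS_URL = "http://localhost:8000/metrics"  # assumed default server address


def prefix_cache_stats(url: str = METRICS_URL) -> dict[str, float]:
    """Return every prefix-cache-related metric exposed by the server."""
    stats: dict[str, float] = {}
    for line in requests.get(url, timeout=5).text.splitlines():
        if line.startswith("#") or "prefix_cache" not in line:
            continue  # skip HELP/TYPE comments and unrelated metrics
        name, _, value = line.rpartition(" ")
        try:
            stats[name] = float(value)
        except ValueError:
            continue  # ignore lines that don't end in a numeric sample
    return stats


if __name__ == "__main__":
    for name, value in sorted(prefix_cache_stats().items()):
        print(f"{name} = {value}")
    # If counters like vllm:prefix_cache_hits and vllm:prefix_cache_queries are
    # present, the hit rate would simply be hits / queries.
```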
Thanks in advance for your help!

Replies: 1 comment

I'd like to know if it's possible to return cached tokens in the prompt_tokens_details parameter.
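Something along these lines is what I have in mind, assuming an OpenAI-compatible vLLM server on localhost and the `prompt_tokens_details` / `cached_tokens` field names from OpenAI's usage schema (the model name below is just a placeholder):

```python
# Sketch of the desired behaviour: read the cached-token count from the usage
# block of an OpenAI-compatible response. Whether vLLM populates
# prompt_tokens_details.cached_tokens is exactly what this question is about.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")  # assumed vLLM server

response = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",  # placeholder model name
    messages=[{"role": "user", "content": "Hello!"}],
)

usage = response.usage
details = getattr(usage, "prompt_tokens_details", None)
cached = getattr(details, "cached_tokens", None) if details else None
print(f"prompt tokens: {usage.prompt_tokens}, served from cache: {cached}")
```

If the server filled in `cached_tokens` per request, it would be straightforward to aggregate a cache-hit ratio on the client side without having to scrape the Prometheus metrics at all.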