I'm using GGML in a batch LLM inference use case. I'd love to know whether there is any way to use a KV cache to avoid repeating part of the computation; I couldn't find any mention of one in the code.
Even better, I'd like to reuse the same KV cache across prompts, since a lot of the use cases share the same prompt pattern.
Thanks!
Ben
Replies: 1 comment

Look at llama.cpp: https://github.com/ggerganov/llama.cpp/blob/65c64dc36f9bca5b3f100614cdd02bf12d6b3e49/llama.h#L510
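For context, here is a rough sketch of the pattern that link points toward: evaluate the shared prompt prefix once, snapshot the context state (which includes the KV cache), and restore that snapshot before each prompt in the batch so only the per-prompt suffix has to be recomputed. The function names follow the llama.h API of roughly that vintage (llama_get_state_size / llama_copy_state_data / llama_set_state_data, plus the old llama_eval call); newer llama.cpp versions rename the state functions to llama_state_get_size / llama_state_get_data / llama_state_set_data. Error handling is omitted and n_threads is assumed to come from your setup, so treat this as an outline rather than a drop-in recipe.

```cpp
// Sketch: cache the KV state of a shared prompt prefix and reuse it per prompt.
#include "llama.h"

#include <cstdint>
#include <vector>

// Evaluate the shared prefix once and serialize the full context state
// (which includes the KV cache) into a byte buffer.
static std::vector<uint8_t> snapshot_prefix_state(llama_context * ctx,
                                                  const std::vector<llama_token> & prefix,
                                                  int n_threads) {
    llama_eval(ctx, prefix.data(), (int) prefix.size(), /*n_past=*/0, n_threads);

    std::vector<uint8_t> state(llama_get_state_size(ctx));
    llama_copy_state_data(ctx, state.data());
    return state;
}

// Restore the cached prefix state, then evaluate only the per-prompt suffix,
// starting n_past right after the prefix so the prefix is never recomputed.
static void eval_with_cached_prefix(llama_context * ctx,
                                    std::vector<uint8_t> & prefix_state,
                                    const std::vector<llama_token> & suffix,
                                    int n_past_prefix,
                                    int n_threads) {
    llama_set_state_data(ctx, prefix_state.data());
    llama_eval(ctx, suffix.data(), (int) suffix.size(), n_past_prefix, n_threads);
    // ... sample from llama_get_logits(ctx) as usual ...
}
```

The same idea applies to raw GGML code: keep the K/V tensors produced for the prefix and start subsequent evaluations with n_past set past the prefix length, which is essentially what llama.cpp does internally.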