Name and Version
llama-server --cache-reuse 1 ...
Operating systems
No response
Which llama.cpp modules do you know to be affected?
llama-server
Command line
Problem description & steps to reproduce
Bug in cache reuse: when `llama_kv_cache_seq_rm` is used together with the KV shift, the cell positions of the tokens after `head_c` are offset by `kv_shift`. If `head_c` is updated incorrectly, or is not adjusted to account for the shifted positions, the removal ranges used in subsequent operations no longer line up with the cells they are meant to target, and valid tokens can be removed. A simplified sketch of the flow in question is included right below, followed by a clear step-by-step explanation of the process.
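For context, here is a rough sketch of how I understand the cache-reuse scan. This is only my reconstruction for illustration, not the actual `server.cpp` code (the helper name `cache_reuse_sketch` and its parameter list are made up); the variable names `head_p`, `head_c`, `n_match`, and `kv_shift` follow this report. The important detail is that `head_p`/`head_c` are token indices, while `llama_kv_cache_seq_rm`/`llama_kv_cache_seq_add` operate on cell positions:

```cpp
#include <vector>
#include "llama.h"

// Rough sketch of the cache-reuse scan (illustration only, not the actual
// server.cpp code). head_p indexes the new prompt, head_c indexes the cached
// tokens; both are token indices, while the llama_kv_cache_seq_* calls below
// operate on cell *positions*.
static void cache_reuse_sketch(
        llama_context * ctx, llama_seq_id seq_id, size_t n_cache_reuse,
        const std::vector<llama_token> & cache_tokens,
        const std::vector<llama_token> & prompt_tokens,
        size_t head_p, size_t head_c) {
    while (head_c < cache_tokens.size() && head_p < prompt_tokens.size()) {
        // length of the matching chunk starting at (head_p, head_c)
        size_t n_match = 0;
        while (head_c + n_match < cache_tokens.size() &&
               head_p + n_match < prompt_tokens.size() &&
               cache_tokens[head_c + n_match] == prompt_tokens[head_p + n_match]) {
            n_match++;
        }

        if (n_match >= n_cache_reuse) {
            const llama_pos kv_shift = (llama_pos) head_p - (llama_pos) head_c;

            // drop the cells between the two heads and shift the matched chunk left
            llama_kv_cache_seq_rm (ctx, seq_id, (llama_pos) head_p, (llama_pos) head_c);
            llama_kv_cache_seq_add(ctx, seq_id, (llama_pos) head_c, -1, kv_shift);

            head_p += n_match;
            head_c += n_match;   // still a token index; the cell positions were just shifted
        } else {
            head_c += 1;
        }
    }
}
```

The walkthrough below traces this logic through a concrete example.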
- Initial KV Cache State:

  ```
  Cache Tokens:    a b c d e f g h j
  Cell Positions:  0 1 2 3 4 5 6 7 8

  New Tokens:      a b e f h j
                   0 1 - - - -
  ```
- First Operation:
  - `head_p` is set to 2, and `head_c` is also set to 2.
  - The token 'e' is found, so `head_c` is updated to 4, and `n_match` is set to 2. `kv_shift` is set to -2.
  - Tokens from `head_p` to `head_c` (positions 2 to 4: tokens 'c', 'd') are removed:

    ```
    Cache Tokens:    a b c d e f g h j
    Cell Positions:  0 1 - - 4 5 6 7 8
    ```

  - The remaining tokens' positions are updated by adding `kv_shift` (-2):

    ```
    Cache Tokens:    a b c d e f g h j
    Cell Positions:  0 1 - - 2 3 4 5 6
    ```

  - `head_p` is updated to `head_p + n_match` (2 + 2 = 4), and `head_c` is updated to `head_c + n_match` (4 + 2 = 6).
- Second Operation:
  - `head_p` is 4, and `head_c` is 6.
  - The token 'h' is found, so `head_c` is updated to 7.
  - Tokens from `head_p` to `head_c` (positions 4 to 7: tokens 'g', 'h', 'j') are removed:

    ```
    Cache Tokens:    a b c d e f g h j
    Cell Positions:  0 1 - - 2 3 - - -
    ```
- After this operation, the valid tokens 'h' and 'j' are removed from the cache: their cell positions were shifted by -2 in the first operation, so the removal range [4, 7) now covers the cells holding 'g', 'h', and 'j' instead of only the unmatched token 'g' (see the simulation after this list).
- This demonstrates how improper handling of `kv_shift` and `head_c` updates can lead to the unintended removal of valid tokens in the KV cache.
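To make the walkthrough easy to verify, here is a small stand-alone simulation of the cell bookkeeping. It does not use llama.cpp; `Cell`, `seq_rm`, and `seq_add` are simplified stand-ins I wrote for the position-range removal and the position shift, and the commented `llama_kv_cache_seq_*` calls only indicate which real call each step corresponds to:

```cpp
#include <cstdio>
#include <vector>

// one KV cell: the token it holds, its current position, and whether it is still in the cache
struct Cell {
    char tok;
    int  pos;
    bool live;
};

// stand-in for llama_kv_cache_seq_rm: drop cells whose position is in [p0, p1)
static void seq_rm(std::vector<Cell> & cells, int p0, int p1) {
    for (auto & c : cells) {
        if (c.live && c.pos >= p0 && c.pos < p1) {
            c.live = false;
        }
    }
}

// stand-in for the KV shift (llama_kv_cache_seq_add): add delta to positions >= p0
static void seq_add(std::vector<Cell> & cells, int p0, int delta) {
    for (auto & c : cells) {
        if (c.live && c.pos >= p0) {
            c.pos += delta;
        }
    }
}

int main() {
    // initial cache state from the walkthrough: tokens a..j at positions 0..8
    std::vector<Cell> cells;
    const char * toks = "abcdefghj";
    for (int i = 0; toks[i] != '\0'; ++i) {
        cells.push_back({toks[i], i, true});
    }

    // first operation: head_p = 2, head_c = 4, kv_shift = -2, i.e. roughly
    //   llama_kv_cache_seq_rm (ctx, seq_id, 2, 4);
    //   llama_kv_cache_seq_add(ctx, seq_id, 4, -1, -2);
    seq_rm (cells, 2, 4);   // removes 'c', 'd'
    seq_add(cells, 4, -2);  // 'e' 'f' 'g' 'h' 'j' move to positions 2 3 4 5 6

    // second operation: head_p = 4, head_c = 7 (token indices), i.e. roughly
    //   llama_kv_cache_seq_rm(ctx, seq_id, 4, 7);
    // but [4, 7) is a *position* range, which after the shift covers 'g', 'h', 'j'
    seq_rm(cells, 4, 7);

    for (const auto & c : cells) {
        if (c.live) {
            printf("%c@%d ", c.tok, c.pos);
        }
    }
    printf("\n");  // prints: a@0 b@1 e@2 f@3
    return 0;
}
```

Running it prints `a@0 b@1 e@2 f@3`: the matched tokens 'h' and 'j' are gone from the cache even though they were supposed to be reused.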
First Bad Commit
No response