Which llama.cpp modules do you know to be affected?
llama-server
Command line
Problem description & steps to reproduce
Bug for cache reuse: When using the llama_kv_cache_seq_rm function, the positions of tokens after head_c are offset by the kv_shift. If head_c is not adjusted to account for that shift, subsequent operations can remove tokens that are still valid. Here is a step-by-step explanation of the process:
Initial KV Cache State:
Cache Tokens:   a b c d e f g h j
Cell Positions: 0 1 2 3 4 5 6 7 8
New Tokens:     a b e f h j
Matched Pos:    0 1 - - - -
First Operation:
head_p is set to 2, and head_c is also set to 2.
The token 'e' is found, so head_c is updated to 4, and n_match is set to 2.
kv_shift is set to -2.
Tokens from head_p up to head_c (positions 2 and 3: tokens 'c' and 'd') are removed.
Cache Tokens: a b c d e f g h j
Cell Positions: 0 1 - - 4 5 6 7 8
The remaining tokens' positions are updated by adding kv_shift (-2):
Cache Tokens: a b c d e f g h j
Cell Positions: 0 1 - - 2 3 4 5 6
head_p is updated to head_p + n_match (2 + 2 = 4).
head_c is updated to head_c + n_match (4 + 2 = 6).
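The arithmetic of the first operation can be checked with a small simulation. The variable names mirror the report; this is an illustrative sketch, not the actual llama.cpp code:

```python
# Sketch of the first cache-reuse operation described above.
# head_p/head_c/n_match/kv_shift mirror the report's names (assumptions,
# not the real llama.cpp identifiers).

cache_tokens = list("abcdefghj")
positions = list(range(9))          # cell positions start as 0..8

head_p, head_c, n_match, kv_shift = 2, 4, 2, -2

# Remove cells whose position lies in [head_p, head_c): 'c' and 'd'.
positions = [None if p is not None and head_p <= p < head_c else p
             for p in positions]

# Shift the remaining cells at or after head_c by kv_shift.
positions = [p + kv_shift if p is not None and p >= head_c else p
             for p in positions]

head_p += n_match   # 4
head_c += n_match   # 6

print(positions)    # [0, 1, None, None, 2, 3, 4, 5, 6]
```

The printed positions match the "0 1 - - 2 3 4 5 6" row in the walkthrough.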
Second Operation:
head_p is 4, and head_c is 6.
The token 'h' is found, so head_c is updated to 7.
Tokens from head_p up to head_c (positions 4, 5, and 6: tokens 'g', 'h', 'j') are removed.
Cache Tokens: a b c d e f g h j
Cell Positions: 0 1 - - 2 3 - - -
After this operation, valid tokens ('h' and 'j') are removed from the cache along with the stale token 'g', because their positions have already been shifted.
This demonstrates how improper handling of kv_shift and head_c updates can lead to the unintended removal of valid tokens in the KV cache.
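The whole failure can be reproduced with a short simulation. The helper functions and the parallel-array cache are illustrative assumptions; only the arithmetic follows the walkthrough above, and the names are not the actual llama.cpp implementation:

```python
# Illustrative end-to-end simulation of the bug described in this report.

cache_tokens = list("abcdefghj")
positions = list(range(len(cache_tokens)))  # cell positions start as 0..8

def seq_rm(p0, p1):
    # Clear every cell whose position lies in [p0, p1),
    # analogous to what llama_kv_cache_seq_rm does for a sequence.
    for i, pos in enumerate(positions):
        if pos is not None and p0 <= pos < p1:
            positions[i] = None

def shift(p0, delta):
    # Add delta to the position of every cell at position >= p0.
    for i, pos in enumerate(positions):
        if pos is not None and pos >= p0:
            positions[i] += delta

# First operation: 'e' is found, head_p = 2, head_c = 4, n_match = 2.
head_p, head_c = 2, 4
seq_rm(head_p, head_c)                   # removes 'c' and 'd'
shift(head_c, -2)                        # kv_shift = -2
head_p, head_c = head_p + 2, head_c + 2  # head_p = 4, head_c = 6

# Second operation: 'h' is found at its pre-shift position, so head_c = 7.
head_c = 7
seq_rm(head_p, head_c)                   # removes positions 4..6: 'g', 'h', 'j'

removed = [t for t, p in zip(cache_tokens, positions) if p is None]
print(removed)  # ['c', 'd', 'g', 'h', 'j'] - 'h' and 'j' were still valid
```

The still-valid tokens 'h' and 'j' end up removed, exactly as shown in the final cell-position row above.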
First Bad Commit
No response
Relevant log output
The first kv shift offsets the positions of all tokens after head_c.
When llama_kv_cache_seq_rm is then called with the stale head_c, it removes valid tokens, because their positions have already been offset.
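One way to avoid this, sketched below under the assumption that the accumulated shift is known, is to translate the stale position through the shift before removing. The names here are illustrative, not the actual fix in llama.cpp:

```python
# Hypothetical sketch: account for the accumulated shift before removing.
# kv_shift_total and the other names are assumptions, not llama.cpp code.

kv_shift_total = -2   # shift applied by the first operation
head_p = 4            # removal start, already in post-shift position space
head_c_found = 7      # 'h' located at its pre-shift position

# Translate into post-shift space before removing [head_p, head_c):
head_c = head_c_found + kv_shift_total   # 7 + (-2) = 5

# Removing [4, 5) now deletes only the stale token 'g',
# leaving the still-valid 'h' and 'j' in place.
print(head_c)  # 5
```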
Name and Version
llama-server --cache-reuse 1 ...
Operating systems
No response