Server: Cache position calculation error (#12160) #12161
Open
Bug in cache reuse: when the `llama_kv_cache_seq_rm` function is used, the positions of the tokens after `head_c` are offset by `kv_shift`. If `head_c` is updated incorrectly or not adjusted after the shift, valid tokens may be removed in subsequent operations. The process is as follows:

Initial KV Cache State:
First Operation:
- `head_p` is set to 2, and `head_c` is also set to 2.
- `head_c` is updated to 4, and `n_match` is set to 2.
- `kv_shift` is set to -2.
- Tokens from `head_p` to `head_c` (positions 2 to 4: tokens 'c', 'd') are removed.
- `kv_shift` (-2) is applied.
- `head_p` is updated to `head_p + n_match` (2 + 2 = 4).
- `head_c` is updated to `head_c + n_match` (4 + 2 = 6).

Second Operation:
- `head_p` is 4, and `head_c` is 6.
- The token 'h' is found, so `head_c` is updated to 7.
- Tokens from `head_p` to `head_c` (positions 4 to 7: tokens 'g', 'h', 'j') are removed.

After this operation, valid tokens ('g', 'h') in the cache have been removed because their positions were shifted incorrectly.
This demonstrates how improper handling of `kv_shift` and `head_c` updates can lead to the unintended removal of valid tokens in the KV cache.
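
The two operations above can be replayed on a toy model to make the arithmetic concrete. This is only a sketch under stated assumptions, not llama.cpp code: `Cell`, `toy_seq_rm`, `toy_shift_from`, and `replay` are hypothetical helpers, the initial cache contents `abcdefghj` are assumed from the tokens named in the walkthrough, and the model takes the description at its word that every cell at or after `head_c` is shifted by `kv_shift`. The `adjust_head_c` branch shows one possible correction (offsetting `head_c` by the shift before using it as a position); it is not necessarily the change made in this PR.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy stand-in for one KV cache cell: a token and its position.
struct Cell {
    char token;
    int  pos;
};

// Toy analogue of removing all cells with a position in [p0, p1).
// Hypothetical helper, not the llama.cpp API.
void toy_seq_rm(std::vector<Cell> & cells, int p0, int p1) {
    assert(p0 <= p1);
    std::vector<Cell> kept;
    for (const Cell & c : cells) {
        if (c.pos < p0 || c.pos >= p1) {
            kept.push_back(c);
        }
    }
    cells = kept;
}

// Assumption taken from the description: every cell at or after p0
// has its position shifted by delta.
void toy_shift_from(std::vector<Cell> & cells, int p0, int delta) {
    for (Cell & c : cells) {
        if (c.pos >= p0) {
            c.pos += delta;
        }
    }
}

// Replays both operations and returns the tokens left in the toy cache.
// With adjust_head_c == true, head_c is offset by kv_shift before it is
// used as the end of the removal range.
std::string replay(bool adjust_head_c) {
    const std::string cached = "abcdefghj"; // assumed initial cache
    std::vector<Cell> cells;
    for (int i = 0; i < (int) cached.size(); ++i) {
        cells.push_back({cached[i], i});
    }

    // First operation: head_p = 2, head_c = 4, n_match = 2.
    int head_p = 2;
    int head_c = 4;
    const int n_match  = 2;
    const int kv_shift = head_p - head_c;    // -2
    toy_seq_rm(cells, head_p, head_c);       // removes 'c', 'd'
    toy_shift_from(cells, head_c, kv_shift); // 'e'..'j' move down by 2
    head_p += n_match;                       // 4
    head_c += n_match;                       // 6

    // Second operation: 'h' is found, so head_c becomes 7.
    head_c = 7;
    const int end = adjust_head_c ? head_c + kv_shift : head_c;
    toy_seq_rm(cells, head_p, end);

    std::string out;
    for (const Cell & c : cells) {
        out += c.token;
    }
    return out;
}
```

In this model, `replay(false)` removes positions 4 to 7 and leaves `abef`, while `replay(true)` removes only position 4 and leaves `abefhj`.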