Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Server: Cache position calculation error(#12160) #12161

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

Clauszy
Copy link

@Clauszy Clauszy commented Mar 3, 2025

Bug for cache reuse:When using the llama_kv_cache_seq_rm function, the positions of tokens after head_c are offset due to the kv_shift. If head_c is updated incorrectly or not properly adjusted after the shift, it may cause valid tokens to be removed in subsequent operations. Here's a clear explanation of the process:

  1. Initial KV Cache State:

    Cache Tokens: a b c d e  f g h j
    Cell Positions:  0 1 2 3 4 5 6 7 8
    New Tokens: a b e f h j
                0 1 - - - -
    
  2. First Operation:

    • head_p is set to 2, and head_c is also set to 2.
    • The token 'e' is found, so head_c is updated to 4, and n_match is set to 2.
    • kv_shift is set to -2.
    • Tokens from head_p to head_c (positions 2 to 4: tokens 'c', 'd') are removed.
      Cache Tokens: a b c d  e f g h j
      Cell Positions:  0 1 -  -  4 5 6 7 8
      
    • The remaining tokens' positions are updated by adding kv_shift (-2):
      Cache Tokens: a b c d e f g h j
      Cell Positions:  0 1 -  - 2 3 4 5 6
      
    • head_p is updated to head_p + n_match (2 + 2 = 4).
    • head_c is updated to head_c + n_match (4 + 2 = 6).
  3. Second Operation:

    • head_p is 4, and head_c is 6.

    • The token 'h' is found, so head_c is updated to 7.

    • Tokens from head_p to head_c (positions 4 to 7: tokens 'g', 'h', 'j') are removed.

      Cache Tokens:   a b c d e f g h j
      Cell Positions: 0 1 - - 2 3 - - -
      
    • After this operation, valid tokens('g', 'h') in the cache are removed because their positions have been shifted incorrectly.

This demonstrates how improper handling of kv_shift and head_c updates can lead to the unintended removal of valid tokens in the KV cache.

The first kv shift offsets the positions of all tokens after head_c.
When using llama_kv_cache_seq_rm next, using head_c will remove the valid tokens because their positions have already been offset.
@Clauszy Clauszy requested a review from ngxson as a code owner March 3, 2025 12:57
@ggerganov
Copy link
Member

Nice catch!

After this operation, valid tokens('g', 'h') in the cache are removed because their positions have been shifted incorrectly.

This should be "... valid tokens ('h', 'j') ...", correct?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants