Misc. bug: Error calculating KV cache positions in llama-server #12160

Open

Clauszy opened this issue Mar 3, 2025 · 0 comments

Clauszy commented Mar 3, 2025

Name and Version

llama-server --cache-reuse 1 ...

Operating systems

No response

Which llama.cpp modules do you know to be affected?

llama-server

Command line

Problem description & steps to reproduce

Bug in cache reuse: when the llama_kv_cache_seq_rm function is used, the positions of the tokens after head_c have already been offset by the preceding KV shift. If head_c is not adjusted to account for that shift, subsequent removals operate on stale positions and delete valid tokens. Here is a step-by-step walkthrough (see the code sketch after the list, which reproduces this trace):

  1. Initial KV Cache State:

    Cache Tokens:    a b c d e f g h j
    Cell Positions:  0 1 2 3 4 5 6 7 8
    New Tokens:      a b e f h j
                     0 1 - - - -
    
  2. First Operation:

    • head_p is set to 2, and head_c is also set to 2.
    • The matching token 'e' is found at cell 4, so head_c is updated to 4; 'e f' match, so n_match is set to 2.
    • kv_shift is set to head_p - head_c = 2 - 4 = -2.
    • Tokens with positions in [head_p, head_c) = [2, 4) (tokens 'c', 'd') are removed:

      Cache Tokens:    a b c d e f g h j
      Cell Positions:  0 1 - - 4 5 6 7 8

    • The positions of the remaining tokens from head_c onward are updated by adding kv_shift (-2):

      Cache Tokens:    a b c d e f g h j
      Cell Positions:  0 1 - - 2 3 4 5 6

    • head_p is updated to head_p + n_match (2 + 2 = 4).
    • head_c is updated to head_c + n_match (4 + 2 = 6).
  3. Second Operation:

    • head_p is 4, and head_c is 6.

    • The matching token 'h' is found at cell 7, so head_c is updated to 7; 'h j' match, so n_match is set to 2.

    • Tokens with positions in [head_p, head_c) = [4, 7) are removed. Because of the earlier shift, positions 4, 5, 6 are now occupied by the tokens 'g', 'h', 'j', so all three are removed:

      Cache Tokens:    a b c d e f g h j
      Cell Positions:  0 1 - - 2 3 - - -

    • After this operation, the valid tokens 'h' and 'j' are removed along with 'g', because their positions were already offset by the first shift; only 'g' should have been removed.

This demonstrates how improper handling of kv_shift and head_c updates can lead to the unintended removal of valid tokens in the KV cache.
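To make the failure mechanical, here is a minimal, self-contained C++ sketch that simulates the reuse loop on the example above. Cell, seq_rm, and seq_add are illustrative stand-ins for the KV cells and for llama_kv_cache_seq_rm/llama_kv_cache_seq_add, not llama.cpp's actual code:

```cpp
// Simulates the cache-reuse matching loop and reproduces the trace above.
// All names here are illustrative stand-ins, not llama.cpp's real code.
#include <cstdio>
#include <vector>

struct Cell {
    char token;
    int  pos;         // position currently stored in the cell
    bool live = true; // false once the cell has been cleared
};

// Stand-in for llama_kv_cache_seq_rm: clears cells whose position is in [p0, p1).
static void seq_rm(std::vector<Cell> & cache, int p0, int p1) {
    for (auto & c : cache) {
        if (c.live && c.pos >= p0 && c.pos < p1) {
            std::printf("  seq_rm clears '%c' (pos %d)\n", c.token, c.pos);
            c.live = false;
        }
    }
}

// Stand-in for the unbounded shift: adds delta to every position >= p0.
static void seq_add(std::vector<Cell> & cache, int p0, int delta) {
    for (auto & c : cache) {
        if (c.live && c.pos >= p0) {
            c.pos += delta;
        }
    }
}

int main() {
    const char * toks = "abcdefghj";
    std::vector<Cell> cache;
    for (int i = 0; toks[i] != '\0'; ++i) {
        cache.push_back({toks[i], i});
    }

    const std::vector<char> prompt = {'a','b','e','f','h','j'};

    // 'a b' already match the new prompt, so both heads start at 2.
    size_t head_p = 2;
    size_t head_c = 2;

    while (head_c < cache.size() && head_p < prompt.size()) {
        size_t n_match = 0;
        while (head_c + n_match < cache.size()  &&
               head_p + n_match < prompt.size() &&
               cache[head_c + n_match].token == prompt[head_p + n_match]) {
            n_match++;
        }

        if (n_match >= 1) { // --cache-reuse 1
            const int kv_shift = (int) head_p - (int) head_c;
            std::printf("op: head_p=%zu head_c=%zu n_match=%zu kv_shift=%d\n",
                        head_p, head_c, n_match, kv_shift);

            // BUG: head_p/head_c are token *indices*, but after the first
            // seq_add the cells' stored positions no longer equal their
            // indices, so this position range hits already-shifted cells.
            seq_rm (cache, (int) head_p, (int) head_c);
            seq_add(cache, (int) head_c, kv_shift);

            head_c += n_match;
            head_p += n_match;
        } else {
            head_c += 1;
        }
    }
    return 0;
}
```

Running it, the second operation prints seq_rm clearing 'g' (pos 4), 'h' (pos 5), and 'j' (pos 6): the matched tokens 'h' and 'j' are lost because the first shift already moved them into the removal range.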

First Bad Commit

No response

Relevant log output

Clauszy added a commit to Clauszy/llama.cpp that referenced this issue Mar 3, 2025
The first KV shift offsets the positions of all tokens after head_c.
When llama_kv_cache_seq_rm is called next with head_c, it removes valid tokens because their positions have already been offset.
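For illustration, here is a hedged sketch of one way to avoid the stale range, reusing the stand-ins from the sketch above (the referenced commit may differ in detail): bound the shift to the matched chunk [head_c, head_c + n_match) instead of shifting the whole tail, so cells beyond the match keep their original positions and head_c remains valid on the next iteration. Note that the real llama_kv_cache_seq_add already accepts a [p0, p1) position range.

```cpp
// Bounded stand-in for llama_kv_cache_seq_add: shifts only positions in [p0, p1).
static void seq_add_range(std::vector<Cell> & cache, int p0, int p1, int delta) {
    for (auto & c : cache) {
        if (c.live && c.pos >= p0 && c.pos < p1) {
            c.pos += delta;
        }
    }
}

// In the reuse loop, replace the unbounded shift with a bounded one:
//     seq_rm       (cache, (int) head_p, (int) head_c);
//     seq_add_range(cache, (int) head_c, (int) (head_c + n_match), kv_shift);
//
// With the example above, the first operation then moves only 'e f' to
// positions 2..3 while 'g h j' stay at 6..8, so the second seq_rm(4, 7)
// clears only 'g', and 'h j' are shifted to positions 4..5 afterwards.
```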