
Misc. bug: Error calculating KV cache positions in llama-server #12160

Closed
@Clauszy

Description


Name and Version

llama-server --cache-reuse 1 ...

Operating systems

No response

Which llama.cpp modules do you know to be affected?

llama-server

Command line

Problem description & steps to reproduce

Bug in cache reuse: when llama_kv_cache_seq_rm is called, the positions of the tokens after head_c have already been offset by kv_shift. If head_c is not adjusted to account for that shift, subsequent operations can remove valid tokens. The following walkthrough shows the process step by step:

  1. Initial KV Cache State:

    Cache Tokens:   a b c d e f g h j
    Cell Positions: 0 1 2 3 4 5 6 7 8
    New Tokens:     a b e f h j
                    0 1 - - - -
    
  2. First Operation:

    • head_p is set to 2, and head_c is also set to 2.
    • The token 'e' is found, so head_c is updated to 4, and n_match is set to 2.
    • kv_shift is set to head_p - head_c = 2 - 4 = -2.
    • Tokens in the position range [head_p, head_c) = [2, 4), i.e. 'c' and 'd', are removed.
      Cache Tokens:   a b c d e f g h j
      Cell Positions: 0 1 - - 4 5 6 7 8
      
    • The remaining tokens' positions are updated by adding kv_shift (-2):
      Cache Tokens:   a b c d e f g h j
      Cell Positions: 0 1 - - 2 3 4 5 6
      
    • head_p is updated to head_p + n_match (2 + 2 = 4).
    • head_c is updated to head_c + n_match (4 + 2 = 6).
  3. Second Operation:

    • head_p is 4, and head_c is 6.

    • The token 'h' is found, so head_c is updated to 7.

    • Tokens in the position range [head_p, head_c) = [4, 7) are removed. After the earlier shift, these positions hold 'g', 'h', and 'j'.

      Cache Tokens:   a b c d e f g h j
      Cell Positions: 0 1 - - 2 3 - - -
      
    • After this operation, the valid tokens 'h' and 'j' have been removed from the cache, because their shifted positions (5 and 6) fall inside the removed range.

This demonstrates how improper handling of kv_shift and head_c updates can lead to the unintended removal of valid tokens in the KV cache.
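The walkthrough can be reproduced with a small toy model. The sketch below is not llama.cpp code: `simulate_reuse`, `seq_rm`, `seq_add`, and the `shift_whole_tail` flag are hypothetical names introduced only for illustration, and the model assumes that every match of at least one token is reused (as with `--cache-reuse 1`). With `shift_whole_tail=True` it shifts all remaining tokens after each removal, as described in step 2. With `shift_whole_tail=False` it shifts only the matched chunk, which is one possible way to keep the later removal range valid.

```python
def simulate_reuse(cache_tokens, new_tokens, shift_whole_tail=True):
    """Toy model of the walkthrough above (not the llama.cpp implementation)."""
    n_cache, n_new = len(cache_tokens), len(new_tokens)
    # pos[i] is the position stored in cell i, or None once the cell is removed.
    pos = list(range(n_cache))

    def seq_rm(p0, p1):
        # Remove every cell whose position lies in [p0, p1).
        for i, p in enumerate(pos):
            if p is not None and p0 <= p < p1:
                pos[i] = None

    def seq_add(p0, p1, delta):
        # Shift every cell whose position lies in [p0, p1) by delta.
        for i, p in enumerate(pos):
            if p is not None and p0 <= p < p1:
                pos[i] = p + delta

    # Both heads start right after the common prefix ("a b" in the example).
    head_p = 0
    while head_p < min(n_cache, n_new) and cache_tokens[head_p] == new_tokens[head_p]:
        head_p += 1
    head_c = head_p

    while head_p < n_new and head_c < n_cache:
        # Count how many tokens match starting at head_c (cache) and head_p (prompt).
        n_match = 0
        while (head_c + n_match < n_cache and head_p + n_match < n_new
               and cache_tokens[head_c + n_match] == new_tokens[head_p + n_match]):
            n_match += 1
        if n_match == 0:
            head_c += 1
            continue
        kv_shift = head_p - head_c
        seq_rm(head_p, head_c)
        # Assumption under test: how far the position shift reaches.
        shift_end = n_cache if shift_whole_tail else head_c + n_match
        seq_add(head_c, shift_end, kv_shift)
        head_p += n_match
        head_c += n_match

    # Surviving (token, position) pairs, in cell order.
    return [(t, p) for t, p in zip(cache_tokens, pos) if p is not None]


# Shifting the whole tail: only a, b, e, f survive; 'h' and 'j' are lost.
print(simulate_reuse("abcdefghj", "abefhj", shift_whole_tail=True))
# Shifting only the matched chunk: a, b, e, f, h, j survive at positions 0..5.
print(simulate_reuse("abcdefghj", "abefhj", shift_whole_tail=False))
```

In this model, head_c serves both as a cell index into the cached token list and as a position passed to the removal, so the two only stay consistent while the unmatched tail keeps its original positions.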

First Bad Commit

No response

Relevant log output
