fix(server): drop stale prefix-cache entries when a snapshot slot is reused#371

Merged: davide221 merged 1 commit into main from fix/prefix-cache-stale-slot-entries on Jun 11, 2026

@davide221

Problem

Follow-up to #370, fixing the root cause behind its symptom.

The inline and full-compress prefix caches hand out snapshot slots round-robin (next_slot_), and the counter advances in prepare_*_snap even when the snapshot later aborts (degenerate boundary < 512 tokens, failed generation, client disconnect). One burned step is enough: a later confirm wraps onto a slot that a live entry still references. From then on the entry table and the slot contents disagree — the entry's hash describes one token stream, the slot holds a snapshot of another.
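The failure mode described above can be replayed with a toy model. This is only a sketch under stated assumptions: apart from next_slot_, every name here (ToyPrefixCache, prepare, confirm_unfixed, is_stale, slot_owner_, entries_) is hypothetical and not taken from the server code, and the model leaves out whatever bookkeeping normally keeps the entry table in step with the counter. It shows only the end state: after one burned step, an entry's hash and its slot's contents disagree.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical toy model; only the name next_slot_ comes from the PR text.
struct ToyPrefixCache {
    explicit ToyPrefixCache(std::size_t n_slots) : slot_owner_(n_slots, 0) {
        assert(n_slots > 0);
    }

    // As described in the PR: the round-robin counter advances at prepare
    // time, before anyone knows whether the snapshot will be confirmed.
    std::size_t prepare() {
        const std::size_t slot = next_slot_;
        next_slot_ = (next_slot_ + 1) % slot_owner_.size();
        return slot;
    }

    // Commit WITHOUT the fix: overwrite the slot and record the new entry,
    // leaving any older entry that points at the same slot untouched.
    void confirm_unfixed(std::size_t slot, std::uint64_t prefix_hash) {
        slot_owner_[slot] = prefix_hash;
        entries_[prefix_hash] = slot;
    }

    // An entry is stale when its slot now holds a snapshot of another stream.
    bool is_stale(std::uint64_t prefix_hash) const {
        const auto it = entries_.find(prefix_hash);
        return it != entries_.end() && slot_owner_[it->second] != prefix_hash;
    }

    std::size_t next_slot_ = 0;
    std::vector<std::uint64_t> slot_owner_;  // hash stored in each slot (0 = empty)
    std::unordered_map<std::uint64_t, std::size_t> entries_;  // prefix hash -> slot
};

// Replays: conversation A confirmed, one aborted snapshot, conversation B confirmed.
inline bool burned_step_leaves_stale_entry() {
    ToyPrefixCache pc(2);
    const std::uint64_t conv_a = 0xA;
    const std::uint64_t conv_b = 0xB;
    pc.confirm_unfixed(pc.prepare(), conv_a);  // A lands in slot 0
    (void)pc.prepare();                        // slot 1 handed out, snapshot aborts
    pc.confirm_unfixed(pc.prepare(), conv_b);  // counter wraps: B overwrites slot 0
    return pc.is_stale(conv_a);                // A's entry still points at slot 0
}
```

In this model the call returns true: the entry for conversation A still exists, but slot 0 now holds conversation B's snapshot.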

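A stale entry then misbehaves in two ways:

  • follow-up prompt shorter than the slot snapshot: the request fails with snapshot_longer_than_prompt before #370 and becomes a conservative cache miss after it;
  • follow-up prompt longer than the slot snapshot: the restore path attaches KV from the wrong token stream with no validation, silently corrupting the context.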

Fix

When confirm_inline_snap / confirm_full_snap commit a snapshot into a slot, erase every other entry still pointing at that slot. A slot holds exactly one snapshot, so at most one entry may describe it. 24 lines, no API change.
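The invariant behind the fix can be sketched as follows. This is an illustration under assumptions, not the code from the diff: EntryTable and commit_entry are hypothetical names, and the real confirm_inline_snap / confirm_full_snap presumably work on richer entry records than a hash-to-slot map.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>

// Hypothetical entry table: prefix hash -> snapshot slot index.
using EntryTable = std::unordered_map<std::uint64_t, std::size_t>;

// Records `prefix_hash` as the sole owner of `slot`.
// Returns how many stale entries for that slot were dropped.
inline std::size_t commit_entry(EntryTable& entries, std::size_t slot,
                                std::uint64_t prefix_hash) {
    std::size_t dropped = 0;
    for (auto it = entries.begin(); it != entries.end();) {
        if (it->second == slot && it->first != prefix_hash) {
            it = entries.erase(it);  // erase returns the next valid iterator
            ++dropped;
        } else {
            ++it;
        }
    }
    entries[prefix_hash] = slot;

    // Postcondition: a slot holds one snapshot, so exactly one entry names it.
    assert(std::count_if(entries.begin(), entries.end(),
                         [slot](const EntryTable::value_type& e) {
                             return e.second == slot;
                         }) == 1);
    return dropped;
}
```

Erasing at commit time, rather than trying to roll the counter back on every abort path, means the table is corrected at the single point where a slot's contents actually change.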

Validation (RTX 3090, Qwen3.6-27B Q4_K_M, --prefix-cache-slots 2)

Deterministic repro: short conv (snap@3801 → slot 0) → tiny conv whose snap aborts (burns a slot step) → big distinct conv (8.2K tokens, wraps onto slot 0) → shortened follow-up of the first conv.

  • on main before #370 (server: fall back to fresh prefill when a cached snapshot is longer than the prompt), this sequence ends in ok=false out=0 error=snapshot_longer_than_prompt;
  • with this fix the wrap logs [pc] dropping stale entry for reused slot=0, the follow-up is a clean miss with correct output, and a longer same-conversation follow-up restores a valid snapshot (correct KV, correct answer) — the corruption window is closed, cache effectiveness preserved.

Greedy outputs across the whole sequence match the no-cache baseline. All 1905 server unit assertions pass.

🧙 Built with WOZCODE

The inline and full-compress prefix caches assign snapshot slots
round-robin via next_slot_, which advances in prepare_*_snap even when
the snapshot later aborts (degenerate boundary, failed generation,
client disconnect). A burned step makes a later confirm wrap onto a
slot that a live entry still references. From then on the entry table
and the slot contents disagree: the entry's hash describes one token
stream, the slot holds a snapshot of another.

Consequences of such a stale entry:
- follow-up prompt shorter than the slot snapshot: failed request
  (snapshot_longer_than_prompt) before PR #370, conservative cache
  miss after it;
- follow-up prompt longer than the slot snapshot: the restore path
  attaches KV from the wrong token stream with no validation - silent
  context corruption.

Fix the root cause: when confirm_inline_snap / confirm_full_snap
commit a snapshot into a slot, erase every other entry still pointing
at that slot. A slot holds exactly one snapshot, so at most one entry
may describe it.

Verified on RTX 3090 (Qwen3.6-27B Q4_K_M, --prefix-cache-slots 2)
with the deterministic PR #370 repro (short conv -> aborted snap ->
big conv wrapping onto slot 0 -> shortened follow-up): the wrap now
logs '[pc] dropping stale entry for reused slot=0', the follow-up is
a clean miss with correct output, and a longer same-conversation
follow-up restores a valid snapshot. Greedy outputs across the
sequence match the no-cache baseline; 1905 server unit assertions
pass.

Co-Authored-By: WOZCODE <contact@withwoz.com>

@cubic-dev-ai cubic-dev-ai Bot left a comment

No issues found across 1 file

@davide221 davide221 merged commit 53ca591 into main Jun 11, 2026
5 checks passed
@davide221 davide221 deleted the fix/prefix-cache-stale-slot-entries branch June 12, 2026 16:14