Enable multi-turn KV cache reuse #64

Merged: stikves merged 1 commit into apple:main from stikves:fix/multi-turn-kv-cache-reuse on Jun 25, 2026

stikves (Contributor) commented Jun 25, 2026

Summary

  • Remove unconditional engine.reset() from CoreAIExecutor.respond() to let the KV cache persist across turns
  • The engine's implicit prefix caching (TokenHistory.resolve) already handles all cases: same-prefix reuse, divergence rewinding, and full re-processing
  • Avoids re-prefilling the shared conversation history on every turn
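
The three cases named above can be illustrated with a small, language-neutral sketch. This is not the project's code: `resolve_prefix` and `Resolution` are hypothetical names, and the logic only assumes that TokenHistory.resolve compares the cached token sequence against the incoming one and keeps the longest shared prefix.

```python
from typing import NamedTuple, Sequence


class Resolution(NamedTuple):
    reused: int            # cached tokens kept (length of the shared prefix)
    rewound: int           # cached tokens discarded after the divergence point
    to_prefill: list[int]  # incoming tokens that still need a forward pass


def resolve_prefix(cached: Sequence[int], incoming: Sequence[int]) -> Resolution:
    """Hypothetical stand-in for the prefix resolution step (an assumption)."""
    shared = 0
    for old, new in zip(cached, incoming):
        if old != new:
            break
        shared += 1
    return Resolution(
        reused=shared,
        rewound=len(cached) - shared,
        to_prefill=list(incoming[shared:]),
    )


# Same-prefix reuse: turn 2 extends turn 1, so only the new suffix is prefilled.
same_prefix = resolve_prefix([1, 2, 3], [1, 2, 3, 4, 5])

# Divergence rewinding: the cache is cut back to the last shared token.
diverged = resolve_prefix([1, 2, 3, 4], [1, 2, 9, 9])

# Full re-processing: nothing is shared, which matches what a reset would do.
unrelated = resolve_prefix([1, 2, 3], [7, 8])
```

In this model an unconditional reset is redundant: the no-shared-prefix case already degrades to full re-processing, while the other two cases keep whatever part of the cache is still valid.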

Local benchmarks (Qwen3 0.6B, M2 Max)

Multi-turn test with 2 prompts through LanguageModelSession:

| | Prompt 1 | Prompt 2 |
|---|---|---|
| Before (full reset each turn) | 0.5s | 3.0s |
| After (cache reuse) | 0.5s | 2.4s |

Prompt 2 is ~20% faster (3.0s to 2.4s). The gain is expected to scale with conversation length, since a longer shared prefix means more prefill is skipped.

Pipelined and sequential engines produce identical output with cache reuse enabled.

Closes #42

Test plan

  • Full test suite passes
  • Multi-turn FM API verification — both engine variants pass, outputs match
  • LLM runner parity — determinism 3/3, quality 12/12, pipelined vs sequential match
  • Verify engine.lastPrefixHitCount > 0 on turn 2+
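
The last check can be sketched with a toy model of the engine. Everything here is an assumption made for illustration: `ToyEngine`, `run_turns`, and the token values are hypothetical, the model only tracks which tokens are in the cache, and it ignores generated response tokens.

```python
class ToyEngine:
    """Hypothetical engine that only tracks which tokens are in its KV cache."""

    def __init__(self) -> None:
        self.cached_tokens: list[int] = []
        self.last_prefix_hit_count = 0
        self.total_prefilled = 0

    def reset(self) -> None:
        # Drop the whole cache, as the removed unconditional reset did.
        self.cached_tokens = []

    def prefill(self, prompt: list[int]) -> None:
        # Count how many leading prompt tokens are already cached.
        hits = 0
        for old, new in zip(self.cached_tokens, prompt):
            if old != new:
                break
            hits += 1
        self.last_prefix_hit_count = hits
        # Only the tokens after the shared prefix need a forward pass.
        self.total_prefilled += len(prompt) - hits
        self.cached_tokens = list(prompt)


def run_turns(engine: ToyEngine, turns: list[list[int]], reset_each_turn: bool) -> list[int]:
    """Feed each turn's full prompt to the engine and record the prefix hits."""
    hit_counts = []
    for prompt in turns:
        if reset_each_turn:
            engine.reset()
        engine.prefill(prompt)
        hit_counts.append(engine.last_prefix_hit_count)
    return hit_counts


turn_1 = [10, 11, 12]
turn_2 = turn_1 + [20, 21]  # turn 1 history plus the new user message
turn_3 = turn_2 + [30]

with_reset = ToyEngine()
hits_with_reset = run_turns(with_reset, [turn_1, turn_2, turn_3], reset_each_turn=True)

with_reuse = ToyEngine()
hits_with_reuse = run_turns(with_reuse, [turn_1, turn_2, turn_3], reset_each_turn=False)
```

With a reset before every turn the hit count stays at zero and all 14 prompt tokens are prefilled. Without it, turns 2 and 3 hit 3 and 5 cached tokens and only 6 tokens are prefilled in total.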

The engine's implicit prefix caching (TokenHistory.resolve) already
handles all cases: same prefix reuse, divergence rewinding, and full
re-processing. Removing the unconditional reset() lets the KV cache
persist across turns, saving re-prefill of the shared conversation
history (~1-3s per turn for typical conversations).
stikves force-pushed the fix/multi-turn-kv-cache-reuse branch from 156dc5d to 13f5068 on June 25, 2026 05:50
stikves merged commit 34f0db3 into apple:main on Jun 25, 2026
5 of 6 checks passed
stikves deleted the fix/multi-turn-kv-cache-reuse branch on June 25, 2026 16:04

Development

Successfully merging this pull request may close these issues.

Add checkpoint / resume support for kv-cache context