Enable multi-turn KV cache reuse by stikves · Pull Request #64 · apple/coreai-models

stikves · 2026-06-25T05:42:34Z

Summary

Remove unconditional engine.reset() from CoreAIExecutor.respond() to let the KV cache persist across turns
The engine's implicit prefix caching (TokenHistory.resolve) already handles all cases: same-prefix reuse, divergence rewinding, and full re-processing
Avoids re-prefilling the shared conversation history on every turn
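
The three cases named above (same-prefix reuse, divergence rewinding, full re-processing) can all be read as one longest-common-prefix comparison between the cached token history and the new prompt. The following is a minimal Python sketch of that idea only; it is not the project's TokenHistory.resolve, and the names `Resolution` and `resolve` as written here are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Resolution:
    keep: int            # cached positions that remain valid
    prefill: List[int]   # tokens that still have to be processed


def resolve(cached: List[int], incoming: List[int]) -> Resolution:
    """Compare the cached token history with the new prompt (illustrative)."""
    # Length of the longest common prefix of the two token sequences.
    shared = 0
    for old, new in zip(cached, incoming):
        if old != new:
            break
        shared += 1
    # Cache entries past `shared` are stale and would be dropped (a rewind);
    # prompt tokens past `shared` still need prefill.
    return Resolution(keep=shared, prefill=incoming[shared:])


turn1 = [1, 2, 3, 4]
# Empty cache: nothing to reuse, the whole prompt is processed.
print(resolve([], turn1))                 # Resolution(keep=0, prefill=[1, 2, 3, 4])
# Same prefix: only the newly appended tokens are processed.
print(resolve(turn1, turn1 + [5, 6]))     # Resolution(keep=4, prefill=[5, 6])
# Divergence: rewind to the shared prefix, then process the rest.
print(resolve(turn1, [1, 2, 9, 9]))       # Resolution(keep=2, prefill=[9, 9])
```

Under this reading, an unconditional reset is equivalent to always passing an empty cache, which forces the first case on every turn.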

Local benchmarks (Qwen3 0.6B, M2 Max)

Multi-turn test with 2 prompts through LanguageModelSession:

Configuration	Prompt 1	Prompt 2
Before (full reset each turn)	0.5 s	3.0 s
After (cache reuse)	0.5 s	2.4 s

Prompt 2 is about 20% faster (3.0 s to 2.4 s). The gain scales with conversation length: the longer the shared prefix, the more prefill is skipped.
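
To illustrate why the saving grows with conversation length, here is a rough Python model of how many tokens are prefilled per turn with and without cache reuse. It is a sketch under stated assumptions: the function name and the token counts are hypothetical, and it assumes each turn's prompt and response tokens all become part of the shared history.

```python
from typing import List


def prefill_tokens(prompt_lens: List[int],
                   response_lens: List[int],
                   reuse_cache: bool) -> List[int]:
    """Tokens prefilled on each turn of a growing conversation (toy model)."""
    history = 0  # tokens already in the conversation before this turn
    costs = []
    for prompt, response in zip(prompt_lens, response_lens):
        # After a reset the whole history is prefilled again;
        # with reuse only the new prompt tokens are.
        costs.append(prompt if reuse_cache else history + prompt)
        history += prompt + response
    return costs


prompts = [50, 50, 50]       # hypothetical prompt sizes per turn
responses = [200, 200, 200]  # hypothetical response sizes per turn
print(prefill_tokens(prompts, responses, reuse_cache=False))  # [50, 300, 550]
print(prefill_tokens(prompts, responses, reuse_cache=True))   # [50, 50, 50]
```

Prefill is only one part of per-turn latency, so wall-clock gains would typically be smaller than these token counts suggest.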

Pipelined and sequential engines produce identical output with cache reuse enabled.

Closes #42

Test plan

Full test suite passes
Multi-turn FM API verification — both engine variants pass, outputs match
LLM runner parity — determinism 3/3, quality 12/12, pipelined vs sequential match
Verify engine.lastPrefixHitCount > 0 on turn 2+
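
The last check in the plan can be pictured with a toy stand-in for the engine. This Python sketch is purely illustrative: `ToyEngine` and `respond` are hypothetical, and `last_prefix_hit_count` only mirrors the engine.lastPrefixHitCount property named above; the real engine's behavior is not shown in this PR.

```python
from typing import List


class ToyEngine:
    """Stand-in for an inference engine; it tracks token ids only."""

    def __init__(self) -> None:
        self.history: List[int] = []
        self.last_prefix_hit_count = 0

    def reset(self) -> None:
        self.history = []

    def prefill(self, tokens: List[int]) -> None:
        # Count how many leading tokens are already in the cached history.
        hits = 0
        for old, new in zip(self.history, tokens):
            if old != new:
                break
            hits += 1
        self.last_prefix_hit_count = hits
        self.history = list(tokens)


def respond(engine: ToyEngine, tokens: List[int], reset_first: bool) -> int:
    """Run one turn and report the prefix hit count it produced."""
    if reset_first:
        engine.reset()  # the pre-change behavior: drop the cache every turn
    engine.prefill(tokens)
    return engine.last_prefix_hit_count


turn1 = [1, 2, 3]
turn2 = turn1 + [4, 5]

engine = ToyEngine()
respond(engine, turn1, reset_first=False)
assert respond(engine, turn2, reset_first=False) > 0   # cache reuse on turn 2

engine = ToyEngine()
respond(engine, turn1, reset_first=True)
assert respond(engine, turn2, reset_first=True) == 0   # reset defeats reuse
```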

The engine's implicit prefix caching (TokenHistory.resolve) already handles all cases: same-prefix reuse, divergence rewinding, and full re-processing. Removing the unconditional reset() lets the KV cache persist across turns, saving re-prefill of the shared conversation history (~1-3s per turn for typical conversations).

stikves force-pushed the fix/multi-turn-kv-cache-reuse branch from 156dc5d to 13f5068 June 25, 2026 05:50

stikves requested review from carinapeng, kevchengcodes and tjia1818 June 25, 2026 05:53

alejandro-isaza approved these changes Jun 25, 2026

stikves merged commit 34f0db3 into apple:main Jun 25, 2026
5 of 6 checks passed

stikves deleted the fix/multi-turn-kv-cache-reuse branch June 25, 2026 16:04