
Investigate prompt-cache hit-rate regression (DeepSeek cost up): per-turn <turn_meta> block may be busting the cacheable prefix #3738

Description

@Hmbown

Summary

A user reports that CodeWhale costs them more than before, which points to a prompt-cache hit-rate regression. With DeepSeek context caching, a stable prompt prefix should yield high cache-hit rates (cached input tokens are ~10x cheaper). If recent changes perturb the cacheable prefix, cost rises even with identical usage.

Why this likely regressed

DeepSeek (and most providers) cache on the longest common prefix of the request. Anything dynamic inserted early in the message order invalidates the cache for everything after it.
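To make the prefix argument concrete, here is a minimal sketch that measures how much of one request body is shared as a prefix with the previous one. The function names are hypothetical and not part of the codebase. It compares raw bytes, whereas providers match on tokens (often in fixed-size blocks), so treat the result as an upper bound on what can be served from cache.

```rust
/// Length in bytes of the longest common prefix of two serialized request bodies.
pub fn common_prefix_len(a: &[u8], b: &[u8]) -> usize {
    a.iter().zip(b.iter()).take_while(|(x, y)| x == y).count()
}

/// Fraction of `current` that is byte-identical to the start of `previous`.
/// Returns 0.0 for an empty `current` to avoid dividing by zero.
pub fn reusable_fraction(previous: &[u8], current: &[u8]) -> f64 {
    if current.is_empty() {
        return 0.0;
    }
    common_prefix_len(previous, current) as f64 / current.len() as f64
}
```

A dynamic field near the start of the body collapses the shared prefix: `common_prefix_len(b"<turn_meta ts=1>system", b"<turn_meta ts=2>system")` stops at the first differing byte, even though everything after the timestamp is identical.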

Prime suspects (recent per-turn additions):

  1. <turn_meta> block (fix(tui): surface mode policy in turn metadata #3623, commit 1494481). The active mode and runtime mode policy are now injected into every user turn. If this block (or anything near it) varies per turn (timestamps, a changing mode label, counters, dynamic policy text), it shifts the prefix and busts the cache. Check the exact placement and whether any field is non-constant from turn to turn.
  2. System-prompt / tool-definition ordering. If tool schemas, model catalog data, or skills/MCP descriptors are injected in a non-deterministic order or with dynamic content, the cacheable system prefix breaks.
  3. Mode policy text reordering from the recent mode-sync work (fix(tui): keep mode policy in sync with engine #3722). Verify that the per-turn policy snippet is stable when the mode hasn't changed.
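For suspect 2, one way to rule out ordering noise is to serialize tool definitions in a canonical order. The sketch below is illustrative only: `ToolDef` and `render_tool_prefix` are hypothetical names, and the real descriptor types in the codebase may look quite different.

```rust
/// Hypothetical stand-in for a tool/skill/MCP descriptor.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ToolDef {
    pub name: String,
    pub schema_json: String,
}

/// Render tool definitions sorted by name, so the output does not depend on
/// the order in which the tools were discovered or registered.
pub fn render_tool_prefix(tools: &[ToolDef]) -> String {
    let mut sorted: Vec<&ToolDef> = tools.iter().collect();
    sorted.sort_by(|a, b| a.name.cmp(&b.name));

    let mut out = String::new();
    for tool in sorted {
        out.push_str(&tool.name);
        out.push('\n');
        out.push_str(&tool.schema_json);
        out.push('\n');
    }
    out
}
```

With this shape, two registries holding the same tools in different orders render to the same bytes, so ordering alone can no longer change the system prefix. Dynamic content inside a schema would still need to be checked separately.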

Investigation steps (distributed prompt)

  • Use the existing /cache debug surface (per-turn cache-telemetry ring, see crates/tui/src/tui/app.rs ~1019) to measure cached vs. uncached input tokens across turns on DeepSeek.
  • Diff the actual request bodies of two consecutive turns (same mode, no file changes) and confirm the prefix is byte-identical up to the new user message. Any diff before the last user message is a cache breaker.
  • Bisect: compare cache-hit rate on a build before fix(tui): surface mode policy in turn metadata #3623 vs. after, then before/after fix(tui): keep mode policy in sync with engine #3722.
  • Confirm cache_hit_tokens / prompt_cache_hit_tokens are being read from the DeepSeek usage payload and surfaced in cost math (verify the cost calc applies the cached-token discount).
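The last step can be sanity-checked against a small reference calculation. This is a sketch under stated assumptions: the `Usage` and `Pricing` structs are hypothetical, the field names are assumed to mirror the DeepSeek usage payload, and the prices are caller-supplied placeholders rather than real rates.

```rust
/// Token counts for one turn; field names assumed to mirror the usage payload.
#[derive(Debug, Clone, Copy)]
pub struct Usage {
    pub prompt_cache_hit_tokens: u64,
    pub prompt_cache_miss_tokens: u64,
    pub completion_tokens: u64,
}

/// Prices per million tokens. Values are supplied by the caller (placeholders here).
#[derive(Debug, Clone, Copy)]
pub struct Pricing {
    pub hit_per_mtok: f64,
    pub miss_per_mtok: f64,
    pub output_per_mtok: f64,
}

/// Share of input tokens served from cache; 0.0 when there were no input tokens.
pub fn cache_hit_ratio(u: &Usage) -> f64 {
    let total = u.prompt_cache_hit_tokens + u.prompt_cache_miss_tokens;
    if total == 0 {
        0.0
    } else {
        u.prompt_cache_hit_tokens as f64 / total as f64
    }
}

/// Cost of one turn, billing cached and uncached input tokens at separate rates.
pub fn turn_cost(u: &Usage, p: &Pricing) -> f64 {
    const MTOK: f64 = 1_000_000.0;
    (u.prompt_cache_hit_tokens as f64 * p.hit_per_mtok
        + u.prompt_cache_miss_tokens as f64 * p.miss_per_mtok
        + u.completion_tokens as f64 * p.output_per_mtok)
        / MTOK
}
```

If the cost shown in the UI matches `turn_cost` only when `hit_per_mtok` is set equal to `miss_per_mtok`, the discount is not being applied. That would be a reporting bug, separate from any real hit-rate regression.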

Acceptance

  • Two consecutive same-mode turns show a high cached-prefix ratio on DeepSeek.
  • Cost-per-turn with caching returns to pre-regression levels.
  • Add a regression/telemetry assertion that the cacheable prefix is stable across turns when mode/context are unchanged.
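The regression assertion could take roughly the following shape. It assumes each request can be viewed as an ordered list of serialized messages and that turn N+1 should extend turn N without rewriting anything already sent. `first_divergence` is a hypothetical helper, not an existing function.

```rust
/// Index of the first message in `prev` that is missing from or differs in `next`.
/// Returns `None` when `prev` is a prefix of `next`, which is the cache-friendly case.
pub fn first_divergence(prev: &[String], next: &[String]) -> Option<usize> {
    for (i, msg) in prev.iter().enumerate() {
        match next.get(i) {
            Some(n) if n == msg => continue,
            _ => return Some(i),
        }
    }
    None
}
```

A test would build the requests for two consecutive same-mode turns and assert that `first_divergence(&turn_n, &turn_n_plus_1)` is `None`. A `Some(i)` result names the first rewritten message, which is the place to look for per-turn metadata leaking into the history.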

Refs #3623, #3722.

Metadata

Labels: bug, enhancement
Status: Backlog