You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
A user reports CodeWhale is costing them more than before — strongly suggesting a prompt-cache hit-rate regression. With DeepSeek context caching, a stable prompt prefix should yield high cache-hit rates (cached input tokens are ~10x cheaper). If recent changes perturb the cacheable prefix, cost rises even with identical usage.
Why this likely regressed
DeepSeek (and most providers) cache on the longest common prefix of the request. Anything dynamic inserted early in the message order invalidates the cache for everything after it.
Prime suspects (recent per-turn additions):
<turn_meta> block (fix(tui): surface mode policy in turn metadata #3623, commit 1494481). Active mode + runtime mode policy is now injected into every user turn. If this block (or anything near it) varies per turn — timestamps, changing mode label, counters, dynamic policy text — it shifts the prefix and busts the cache. Check exact placement and whether any field is non-constant turn-to-turn.
System-prompt / tool-definition ordering. If tool schemas, model catalog data, or skills/MCP descriptors are injected in a non-deterministic order or with dynamic content, the cacheable system prefix breaks.
Use the existing /cache debug surface (per-turn cache-telemetry ring, see crates/tui/src/tui/app.rs ~1019) to measure cached vs. uncached input tokens across turns on DeepSeek.
Diff the actual request bodies of two consecutive turns (same mode, no file changes) and confirm the prefix is byte-identical up to the new user message. Any diff before the last user message = cache breaker.
Confirm cache_hit_tokens / prompt_cache_hit_tokens are being read from the DeepSeek usage payload and surfaced in cost math (verify the cost calc applies the cached-token discount).
Acceptance
Two consecutive same-mode turns show a high cached-prefix ratio on DeepSeek.
Cost-per-turn with caching returns to pre-regression levels.
Add a regression/telemetry assertion that the cacheable prefix is stable across turns when mode/context are unchanged.
Summary
A user reports CodeWhale is costing them more than before — strongly suggesting a prompt-cache hit-rate regression. With DeepSeek context caching, a stable prompt prefix should yield high cache-hit rates (cached input tokens are ~10x cheaper). If recent changes perturb the cacheable prefix, cost rises even with identical usage.
Why this likely regressed
DeepSeek (and most providers) cache on the longest common prefix of the request. Anything dynamic inserted early in the message order invalidates the cache for everything after it.
Prime suspects (recent per-turn additions):
<turn_meta>block (fix(tui): surface mode policy in turn metadata #3623, commit 1494481). Active mode + runtime mode policy is now injected into every user turn. If this block (or anything near it) varies per turn — timestamps, changing mode label, counters, dynamic policy text — it shifts the prefix and busts the cache. Check exact placement and whether any field is non-constant turn-to-turn.Investigation steps (distributed prompt)
/cachedebug surface (per-turn cache-telemetry ring, seecrates/tui/src/tui/app.rs~1019) to measure cached vs. uncached input tokens across turns on DeepSeek.cache_hit_tokens/prompt_cache_hit_tokensare being read from the DeepSeek usage payload and surfaced in cost math (verify the cost calc applies the cached-token discount).Acceptance
Refs #3623, #3722.