Summary
When using calculate mode, ccusage prices all cache_creation_input_tokens at a single cache_creation_input_token_cost rate from LiteLLM, which corresponds to the 5-minute cache write multiplier (1.25× base input). However, Claude Code predominantly uses 1-hour caching, which Anthropic prices at 2× base input — a 60% higher rate.
Anthropic official pricing
From Anthropic's pricing page:
| Cache operation | Multiplier | Duration |
|---|---|---|
| 5-minute cache write | 1.25× base input price | Cache valid for 5 minutes |
| 1-hour cache write | 2× base input price | Cache valid for 1 hour |
| Cache read (hit) | 0.1× base input price | Same duration as the preceding write |
Full model pricing table (relevant columns):
| Model | Base Input | 5m Cache Writes | 1h Cache Writes | Cache Hits |
|---|---|---|---|---|
| Claude Opus 4.6 | $5/MTok | $6.25/MTok | $10/MTok | $0.50/MTok |
| Claude Sonnet 4.6 | $3/MTok | $3.75/MTok | $6/MTok | $0.30/MTok |
| Claude Sonnet 4.5 | $3/MTok | $3.75/MTok | $6/MTok | $0.30/MTok |
| Claude Haiku 4.5 | $1/MTok | $1.25/MTok | $2/MTok | $0.10/MTok |
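As a sanity check, every per-model rate in the table follows directly from the base input price and the three multipliers. A minimal TypeScript sketch (function and constant names are illustrative, not from ccusage):

```typescript
// Anthropic's published cache multipliers, applied to a base input price.
const CACHE_MULTIPLIERS = {
	fiveMinWrite: 1.25, // 5-minute cache write
	oneHourWrite: 2.0, // 1-hour cache write
	cacheRead: 0.1, // cache hit
} as const;

// Derive per-MTok cache rates from a model's base input price per MTok.
function cacheRatesPerMTok(baseInputPerMTok: number) {
	return {
		fiveMinWrite: baseInputPerMTok * CACHE_MULTIPLIERS.fiveMinWrite,
		oneHourWrite: baseInputPerMTok * CACHE_MULTIPLIERS.oneHourWrite,
		cacheRead: baseInputPerMTok * CACHE_MULTIPLIERS.cacheRead,
	};
}

// Sonnet 4.6 ($3/MTok base) → $3.75 (5m write), $6 (1h write), $0.30 (hit)
console.log(cacheRatesPerMTok(3));
```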
Data available in JSONL
Claude Code's JSONL files already include a cache_creation breakdown inside the usage object that distinguishes the two durations:
```json
{
  "usage": {
    "input_tokens": 3,
    "output_tokens": 10,
    "cache_creation_input_tokens": 23566,
    "cache_read_input_tokens": 19357,
    "cache_creation": {
      "ephemeral_5m_input_tokens": 0,
      "ephemeral_1h_input_tokens": 23566
    }
  }
}
```

In my dataset (~40k usage records), the vast majority of cache creation tokens are 1-hour:
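The per-model totals above can be tallied directly from the `cache_creation` breakdown in each record. A sketch of that aggregation, assuming the record shape from the JSONL sample (the top-level `model` field name is an assumption for illustration):

```typescript
// Shape of a parsed JSONL usage record; `model` is an assumed field name,
// the `usage` fields mirror the sample above.
type UsageRecord = {
	model: string;
	usage: {
		cache_creation_input_tokens: number;
		cache_creation?: {
			ephemeral_5m_input_tokens: number;
			ephemeral_1h_input_tokens: number;
		};
	};
};

// Sum 5m vs 1h cache-creation tokens per model; records without the
// breakdown contribute nothing to either bucket.
function tallyCacheCreation(records: UsageRecord[]) {
	const totals = new Map<string, { fiveMin: number; oneHour: number }>();
	for (const { model, usage } of records) {
		const entry = totals.get(model) ?? { fiveMin: 0, oneHour: 0 };
		entry.fiveMin += usage.cache_creation?.ephemeral_5m_input_tokens ?? 0;
		entry.oneHour += usage.cache_creation?.ephemeral_1h_input_tokens ?? 0;
		totals.set(model, entry);
	}
	return totals;
}
```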
| Model | 5m tokens | 1h tokens | % 1h |
|---|---|---|---|
| claude-opus-4-6 | 9.6M | 116.5M | 92% |
| claude-sonnet-4-5-20250929 | 0 | 7.8M | 100% |
| claude-sonnet-4-6 | 2.1M | 0 | 0% |
| claude-haiku-4-5-20251001 | 15.8M | 30.9M | 66% |
Current behavior in ccusage
In packages/internal/src/pricing.ts, calculateCostFromPricing uses a single cache_creation_input_token_cost (sourced from LiteLLM) for all cache creation tokens:
```ts
const cacheCreationCost = calculateTieredCost(
	tokens.cache_creation_input_tokens,
	pricing.cache_creation_input_token_cost, // 1.25x rate
	pricing.cache_creation_input_token_cost_above_200k_tokens,
);
```

There is no reference to `ephemeral_5m_input_tokens` or `ephemeral_1h_input_tokens` anywhere in the codebase (`gh search code "ephemeral" --repo ryoppippi/ccusage` returns 0 results).
Impact
Cost comparison on my usage data:
| Model | Cost (all at 1.25×) | Cost (5m/1h split) | Under-reported |
|---|---|---|---|
| claude-opus-4-6 | $2,388 | $2,826 | $438 (18%) |
| claude-haiku-4-5-20251001 | $112 | $135 | $23 (21%) |
| claude-sonnet-4-5-20250929 | $46 | $63 | $17 (38%) |
| Total | $2,568 | $3,047 | $479 (19%) |
Suggested fix
When parsing JSONL records, check whether `usage.cache_creation` is an object containing `ephemeral_5m_input_tokens` and `ephemeral_1h_input_tokens`. If so, apply the correct rate to each:
- `ephemeral_5m_input_tokens` × 1.25× base input price
- `ephemeral_1h_input_tokens` × 2× base input price
Fall back to the current single-rate behavior when the breakdown is not available.
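The split-with-fallback logic could look roughly like this (a sketch, not ccusage's actual code; the tiered above-200k pricing is omitted for brevity, and the rate parameters mirror LiteLLM's per-token costs):

```typescript
// Token fields mirror the JSONL usage object.
type CacheTokens = {
	cache_creation_input_tokens: number;
	cache_creation?: {
		ephemeral_5m_input_tokens: number;
		ephemeral_1h_input_tokens: number;
	};
};

// Price cache-creation tokens: use the per-duration breakdown when present,
// otherwise fall back to the single LiteLLM rate (current behavior).
function cacheCreationCost(
	tokens: CacheTokens,
	baseInputCost: number, // per-token base input rate
	singleRate: number, // LiteLLM cache_creation_input_token_cost (1.25x base)
): number {
	const breakdown = tokens.cache_creation;
	if (breakdown != null) {
		return (
			breakdown.ephemeral_5m_input_tokens * baseInputCost * 1.25
			+ breakdown.ephemeral_1h_input_tokens * baseInputCost * 2
		);
	}
	// Fallback: single-rate behavior when no breakdown is available.
	return tokens.cache_creation_input_tokens * singleRate;
}
```

For example, 1M 1-hour cache-write tokens on Sonnet ($3/MTok base) would be priced at $6 rather than $3.75.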
This is partly an upstream issue in LiteLLM's pricing data (cache_creation_input_token_cost only has one rate), but ccusage can handle the split independently since the token breakdown is already in the JSONL data.