
Cache creation cost underestimated: 1-hour cache writes priced at 5m rate (1.25x) instead of 2x #899

@lumian2015

## Summary

When using calculate mode, ccusage prices all `cache_creation_input_tokens` at a single `cache_creation_input_token_cost` rate from LiteLLM, which corresponds to the 5-minute cache write multiplier (1.25× base input). However, Claude Code predominantly uses 1-hour caching, which Anthropic prices at 2× base input, a 60% higher rate.

## Anthropic official pricing

From Anthropic's pricing page:

| Cache operation | Multiplier | Duration |
| --- | --- | --- |
| 5-minute cache write | 1.25× base input price | Cache valid for 5 minutes |
| 1-hour cache write | 2× base input price | Cache valid for 1 hour |
| Cache read (hit) | 0.1× base input price | Same duration as the preceding write |

Full model pricing table (relevant columns):

| Model | Base Input | 5m Cache Writes | 1h Cache Writes | Cache Hits |
| --- | --- | --- | --- | --- |
| Claude Opus 4.6 | $5/MTok | $6.25/MTok | $10/MTok | $0.50/MTok |
| Claude Sonnet 4.6 | $3/MTok | $3.75/MTok | $6/MTok | $0.30/MTok |
| Claude Sonnet 4.5 | $3/MTok | $3.75/MTok | $6/MTok | $0.30/MTok |
| Claude Haiku 4.5 | $1/MTok | $1.25/MTok | $2/MTok | $0.10/MTok |
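
To make the gap concrete, the multiplier arithmetic is simple (a sketch; `cacheWriteCost` is an illustrative helper, not ccusage code, with prices taken from the table above):

```typescript
// Cost of a cache write: tokens x base input price x duration multiplier.
// basePerMTok is the model's base input price in $ per million tokens.
function cacheWriteCost(tokens: number, basePerMTok: number, multiplier: number): number {
	return (tokens / 1_000_000) * basePerMTok * multiplier;
}

// 10M cache-write tokens on Claude Sonnet 4.5 ($3/MTok base):
const at5m = cacheWriteCost(10_000_000, 3, 1.25); // $37.50
const at1h = cacheWriteCost(10_000_000, 3, 2); // $60.00 -- 60% more
```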

## Data available in JSONL

Claude Code's JSONL files already include a `cache_creation` breakdown inside the `usage` object that distinguishes the two durations:

```json
{
  "usage": {
    "input_tokens": 3,
    "output_tokens": 10,
    "cache_creation_input_tokens": 23566,
    "cache_read_input_tokens": 19357,
    "cache_creation": {
      "ephemeral_5m_input_tokens": 0,
      "ephemeral_1h_input_tokens": 23566
    }
  }
}
```
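
In the example above, the two `ephemeral_*` buckets sum to `cache_creation_input_tokens` (0 + 23566 = 23566). A parser could read the breakdown roughly like this (a sketch; the `Usage` type and `splitCacheCreation` helper are illustrative, not part of ccusage):

```typescript
// Minimal shape of the usage object shown above; field names match the JSONL.
type Usage = {
	cache_creation_input_tokens: number;
	cache_creation?: {
		ephemeral_5m_input_tokens: number;
		ephemeral_1h_input_tokens: number;
	};
};

// Returns the 5m/1h split, or null when the breakdown is absent
// (older records), signaling the caller to fall back to the single rate.
function splitCacheCreation(usage: Usage): { fiveMin: number; oneHour: number } | null {
	const b = usage.cache_creation;
	if (b == null) {
		return null;
	}
	return { fiveMin: b.ephemeral_5m_input_tokens, oneHour: b.ephemeral_1h_input_tokens };
}
```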

In my dataset (~40k usage records), the vast majority of cache creation tokens are 1-hour:

| Model | 5m tokens | 1h tokens | % 1h |
| --- | --- | --- | --- |
| `claude-opus-4-6` | 9.6M | 116.5M | 92% |
| `claude-sonnet-4-5-20250929` | 0 | 7.8M | 100% |
| `claude-sonnet-4-6` | 2.1M | 0 | 0% |
| `claude-haiku-4-5-20251001` | 15.8M | 30.9M | 66% |

## Current behavior in ccusage

In `packages/internal/src/pricing.ts`, `calculateCostFromPricing` uses a single `cache_creation_input_token_cost` (sourced from LiteLLM) for all cache creation tokens:

```ts
const cacheCreationCost = calculateTieredCost(
    tokens.cache_creation_input_tokens,
    pricing.cache_creation_input_token_cost,            // 1.25x rate
    pricing.cache_creation_input_token_cost_above_200k_tokens,
);
```

There is no reference to `ephemeral_5m_input_tokens` or `ephemeral_1h_input_tokens` anywhere in the codebase (`gh search code "ephemeral" --repo ryoppippi/ccusage` returns 0 results).

## Impact

Cost comparison on my usage data:

| Model | Cost (all at 1.25×) | Cost (5m/1h split) | Under-reported |
| --- | --- | --- | --- |
| `claude-opus-4-6` | $2,388 | $2,826 | $438 (18%) |
| `claude-haiku-4-5-20251001` | $112 | $135 | $23 (21%) |
| `claude-sonnet-4-5-20250929` | $46 | $63 | $17 (38%) |
| **Total** | $2,568 | $3,047 | $479 (19%) |

## Suggested fix

When parsing JSONL records, check whether `usage.cache_creation` is an object containing `ephemeral_5m_input_tokens` and `ephemeral_1h_input_tokens`. If so, price each bucket at its own rate:

- `ephemeral_5m_input_tokens` at 1.25× the base input price
- `ephemeral_1h_input_tokens` at 2× the base input price

Fall back to the current single-rate behavior when the breakdown is not available.
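
Putting the two rates and the fallback together, the pricing step could look roughly like this (a sketch only; the function shape and `baseInputCostPerToken` parameter are illustrative, and the above-200k tiering from the existing code is omitted for brevity):

```typescript
type CacheBreakdown = {
	ephemeral_5m_input_tokens: number;
	ephemeral_1h_input_tokens: number;
};

// baseInputCostPerToken: the model's base input price per token
// (e.g. 3 / 1_000_000 for a $3/MTok model like Sonnet 4.5).
function cacheCreationCost(
	totalTokens: number,
	baseInputCostPerToken: number,
	breakdown?: CacheBreakdown,
): number {
	if (breakdown != null) {
		// Price each duration bucket at Anthropic's published multiplier.
		return (
			breakdown.ephemeral_5m_input_tokens * baseInputCostPerToken * 1.25
			+ breakdown.ephemeral_1h_input_tokens * baseInputCostPerToken * 2
		);
	}
	// Fallback: current single-rate behavior (LiteLLM's 5m rate).
	return totalTokens * baseInputCostPerToken * 1.25;
}
```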

This is partly an upstream issue in LiteLLM's pricing data (`cache_creation_input_token_cost` only has one rate), but ccusage can handle the split independently since the token breakdown is already in the JSONL data.
