Claude usage can be undercounted when the same request is logged multiple times #888

@Jaredw2289-svg

Summary

ccusage can undercount Claude transcript usage when the same assistant response is written multiple times with the same message.id and requestId.

The current dedupe logic appears to keep the first seen entry for a given message.id:requestId pair and drop later entries. In recent Claude Code transcripts, the first entry is often an intermediate usage snapshot, while the last entry contains the final output_tokens for that same request.

This makes daily/session totals systematically too low, especially for output_tokens and calculated cost.

Environment

  • ccusage version: 18.0.10
  • command used: npx ccusage daily --since 20260310 --until 20260310 --json --mode calculate
  • Claude transcript source: local ~/.claude/projects/**/*.jsonl

Minimal reproduction pattern

I am seeing transcript entries like this (redacted / minimized):

{"type":"assistant","timestamp":"2026-03-11T02:40:21.204Z","requestId":"req_x","message":{"id":"msg_x","model":"claude-opus-4-6","usage":{"input_tokens":3,"cache_creation_input_tokens":2285,"cache_read_input_tokens":22137,"output_tokens":2}}}
{"type":"assistant","timestamp":"2026-03-11T02:40:22.650Z","requestId":"req_x","message":{"id":"msg_x","model":"claude-opus-4-6","usage":{"input_tokens":3,"cache_creation_input_tokens":2285,"cache_read_input_tokens":22137,"output_tokens":1093}}}

Both lines have the same message.id and requestId, but the later line reports output_tokens of 1093 instead of 2, so it looks like the final usage snapshot for that request.

If the first line is kept and the second line is discarded as a duplicate, the request is undercounted by 1091 output tokens.

Why I think this is happening

The installed code path in dist/data-loader-B58Zt4YE.js builds a dedupe key from:

  • message.id
  • requestId

and then skips any later entry with the same key.

In practice, this means "first seen wins".
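
To make the failure mode concrete, here is a minimal Python sketch of a "first seen wins" dedupe applied to the two entries from the reproduction. This is not the actual ccusage code: the function name `first_seen_output_tokens` is hypothetical, the key format is only my reading of the bundled file, and the entries are trimmed to the fields the sketch needs.

```python
import json

# The two redacted transcript lines from the reproduction, trimmed to the
# fields used here (ids plus output_tokens).
LINES = [
    '{"type":"assistant","requestId":"req_x","message":{"id":"msg_x","usage":{"output_tokens":2}}}',
    '{"type":"assistant","requestId":"req_x","message":{"id":"msg_x","usage":{"output_tokens":1093}}}',
]


def first_seen_output_tokens(lines):
    """Sum output_tokens, skipping any entry whose dedupe key was already seen."""
    seen = set()
    total = 0
    for line in lines:
        entry = json.loads(line)
        # Assumed key shape: message.id and requestId joined together.
        key = f'{entry["message"]["id"]}:{entry["requestId"]}'
        if key in seen:
            continue  # the later snapshot is dropped, even if it reports more tokens
        seen.add(key)
        total += entry["message"]["usage"]["output_tokens"]
    return total


print(first_seen_output_tokens(LINES))  # 2, although the final snapshot says 1093
```

With this rule the request contributes 2 output tokens instead of 1093, which is the 1091-token gap described in the reproduction.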

For these Claude transcripts, a safer rule seems to be one of:

  1. keep the latest entry for the same logical request, or
  2. keep the entry with the largest total usage, or
  3. at minimum, prefer a later entry when usage differs.
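
The three options above could be sketched roughly as follows. This is an illustration in Python, not a patch against ccusage: `dedupe`, `usage_total`, and `make_entry` are hypothetical names, "latest" is assumed to mean "last in file order" (the transcript is append-only as far as I can tell), and "largest total usage" is assumed to mean the sum of the four token counters.

```python
USAGE_FIELDS = (
    "input_tokens",
    "cache_creation_input_tokens",
    "cache_read_input_tokens",
    "output_tokens",
)


def usage_total(usage):
    """Sum all token counters in a usage dict, treating missing fields as 0."""
    return sum(usage.get(field, 0) for field in USAGE_FIELDS)


def dedupe(entries, strategy):
    """Keep one entry per (message.id, requestId); entries must be in file order."""
    if strategy not in ("latest", "max", "later_if_different"):
        raise ValueError(f"unknown strategy: {strategy}")
    kept = {}
    for entry in entries:
        key = (entry["message"]["id"], entry["requestId"])
        prev = kept.get(key)
        if prev is None:
            kept[key] = entry
            continue
        usage = entry["message"]["usage"]
        prev_usage = prev["message"]["usage"]
        if strategy == "latest":
            replace = True  # option 1: the last entry always wins
        elif strategy == "max":
            replace = usage_total(usage) > usage_total(prev_usage)  # option 2
        else:
            replace = usage != prev_usage  # option 3: later wins only if usage differs
        if replace:
            kept[key] = entry
    return list(kept.values())


def make_entry(output_tokens):
    """Build an entry shaped like the redacted transcript lines in this issue."""
    return {
        "requestId": "req_x",
        "message": {
            "id": "msg_x",
            "usage": {
                "input_tokens": 3,
                "cache_creation_input_tokens": 2285,
                "cache_read_input_tokens": 22137,
                "output_tokens": output_tokens,
            },
        },
    }


entries = [make_entry(2), make_entry(1093)]
for strategy in ("latest", "max", "later_if_different"):
    kept = dedupe(entries, strategy)
    print(strategy, kept[0]["message"]["usage"]["output_tokens"])
```

For the two entries in the reproduction, all three strategies keep the 1093-token snapshot. They only diverge when a later entry reports less usage than an earlier one, in which case "max" keeps the earlier entry and the other two keep the later one.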

What I measured locally

I wrote a small one-off parser over the raw transcript JSONL files for a single day and compared three dedupe strategies for assistant entries grouped by sessionId + requestId:

  • first seen: outputTokens = 130,785
  • latest seen: outputTokens = 648,562
  • max seen: outputTokens = 648,649

ccusage matched the first seen result, which is the smallest one.

In the same dataset:

  • duplicate request groups: 606
  • groups where output_tokens changed across duplicate entries: 551
  • among those, 550 had latest == max

So this does not look like random duplication. It looks like the transcript is appending intermediate snapshots and then a final usage snapshot for the same request.
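
For reference, the comparison above can be reproduced with something along these lines. This is a simplified Python sketch of the approach, not the exact script I ran: `compare_strategies` is a hypothetical name, it assumes each parsed assistant entry carries `sessionId`, `requestId`, and `message.usage.output_tokens`, and the sample entries at the bottom are synthetic, not taken from my transcripts.

```python
from collections import defaultdict


def compare_strategies(entries):
    """Group entries by (sessionId, requestId) in file order and tally output_tokens."""
    groups = defaultdict(list)
    for entry in entries:
        key = (entry["sessionId"], entry["requestId"])
        groups[key].append(entry["message"]["usage"]["output_tokens"])

    stats = {
        "first": 0,             # total if the first entry of each group is kept
        "latest": 0,            # total if the last entry of each group is kept
        "max": 0,               # total if the largest entry of each group is kept
        "duplicate_groups": 0,  # groups with more than one entry
        "changed_groups": 0,    # duplicate groups where output_tokens varies
        "latest_is_max": 0,     # changed groups where the last entry is the largest
    }
    for tokens in groups.values():
        stats["first"] += tokens[0]
        stats["latest"] += tokens[-1]
        stats["max"] += max(tokens)
        if len(tokens) > 1:
            stats["duplicate_groups"] += 1
            if len(set(tokens)) > 1:
                stats["changed_groups"] += 1
                if tokens[-1] == max(tokens):
                    stats["latest_is_max"] += 1
    return stats


def sample(request_id, output_tokens):
    """Synthetic entry for illustration only."""
    return {
        "sessionId": "s1",
        "requestId": request_id,
        "message": {"usage": {"output_tokens": output_tokens}},
    }


SAMPLE = [
    sample("req_a", 2), sample("req_a", 1093),                      # grows, latest == max
    sample("req_b", 5), sample("req_b", 40), sample("req_b", 30),   # latest < max
    sample("req_c", 7),                                             # no duplicates
]
print(compare_strategies(SAMPLE))
```

On real data, the entries would come from parsing each line of the transcript JSONL files with `json.loads` and keeping those with `"type": "assistant"`.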

Expected behavior

When multiple assistant entries share the same logical request identity but have different usage totals, ccusage should prefer the final or largest usage record rather than the first one.

Actual behavior

ccusage keeps the first matching entry, which can significantly undercount Claude output_tokens and cost.

Notes

This seems distinct from sub-task / sidechain aggregation. Even within a single transcript file, the duplicate-entry handling is enough to undercount usage.

If helpful, I can provide a standalone script that reproduces the discrepancy from raw transcript files.
