Output tokens undercounted due to first-write-wins dedup of streaming chunks #901

@vibe2viable

Bug Description

ccusage systematically undercounts output_tokens because the deduplication logic in data-loader.ts keeps the first record for a given messageId:requestId pair and discards subsequent ones. However, Claude Code writes multiple JSONL lines per API response as the response streams in, with incrementally increasing output_tokens. The first record has a partial count; the last record has the correct final count.

Reproduction

Look at any streaming response in a Claude Code JSONL file. You'll see multiple lines with the same message.id and requestId:

Record 1: message.id=msg_ABC, requestId=req_123, output_tokens=9,   stop_reason=null      (partial)
Record 2: message.id=msg_ABC, requestId=req_123, output_tokens=9,   stop_reason=null      (partial)
Record 3: message.id=msg_ABC, requestId=req_123, output_tokens=159, stop_reason=tool_use  (final)

The current isDuplicateEntry() function marks records 2 and 3 as duplicates and skips them, keeping only record 1 with output_tokens: 9. The correct value is 159 from record 3.
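To make the failure mode concrete, here is a minimal sketch of first-write-wins dedup applied to the three records above. It is not the actual ccusage code: the `UsageRecord` shape and the `dedupeFirstWins` helper are hypothetical names for illustration, and the real logic in data-loader.ts may be structured differently.

```typescript
// Hypothetical record shape for illustration; the real ccusage types may differ.
type UsageRecord = {
  messageId: string;
  requestId: string;
  outputTokens: number;
  stopReason: string | null;
};

// First-write-wins: keep the first record seen for each messageId:requestId
// key and skip every later record with the same key.
function dedupeFirstWins(records: UsageRecord[]): UsageRecord[] {
  const seen = new Set<string>();
  const kept: UsageRecord[] = [];
  for (const record of records) {
    const key = `${record.messageId}:${record.requestId}`;
    if (seen.has(key)) continue; // later streaming chunks are treated as duplicates
    seen.add(key);
    kept.push(record);
  }
  return kept;
}

// The three records from the reproduction above.
const streamed: UsageRecord[] = [
  { messageId: "msg_ABC", requestId: "req_123", outputTokens: 9, stopReason: null },
  { messageId: "msg_ABC", requestId: "req_123", outputTokens: 9, stopReason: null },
  { messageId: "msg_ABC", requestId: "req_123", outputTokens: 159, stopReason: "tool_use" },
];

const firstWinsTotal = dedupeFirstWins(streamed).reduce(
  (sum, record) => sum + record.outputTokens,
  0,
);
console.log(firstWinsTotal); // 9, although the response actually produced 159 output tokens
```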

Impact

In my testing, this results in output tokens being undercounted by roughly 2-3x. The other token fields (input_tokens, cache_creation_input_tokens, cache_read_input_tokens) are unaffected because they are set at the start of the API call and remain constant across streaming chunks.

Example from a single day's data:

Counting Method                       Output Tokens
First-write-wins (current behavior)   ~300K
Last-write-wins (correct behavior)    ~870K

Suggested Fix

In createUniqueHash() / isDuplicateEntry(), instead of skipping records that share a messageId:requestId with a previously seen entry, replace the earlier entry with the new one (last-write-wins). The final streaming chunk, which carries the complete output_tokens count, is then the one that is kept.
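A minimal sketch of the last-write-wins idea, assuming records are processed in the order they were written to the JSONL file. The `UsageRecord` shape and `dedupeLastWins` are hypothetical names, not existing ccusage functions; an actual patch would need to fit into how data-loader.ts tracks processed hashes.

```typescript
// Hypothetical record shape for illustration; the real ccusage types may differ.
type UsageRecord = {
  messageId: string;
  requestId: string;
  outputTokens: number;
  stopReason: string | null;
};

// Last-write-wins: a later record with the same messageId:requestId key
// overwrites the earlier one, so the final streaming chunk survives.
function dedupeLastWins(records: UsageRecord[]): UsageRecord[] {
  const latest = new Map<string, UsageRecord>();
  for (const record of records) {
    latest.set(`${record.messageId}:${record.requestId}`, record);
  }
  // Map iteration follows first-insertion order of each key, so the output
  // keeps the order in which each response first appeared.
  return Array.from(latest.values());
}

const streamed: UsageRecord[] = [
  { messageId: "msg_ABC", requestId: "req_123", outputTokens: 9, stopReason: null },
  { messageId: "msg_ABC", requestId: "req_123", outputTokens: 9, stopReason: null },
  { messageId: "msg_ABC", requestId: "req_123", outputTokens: 159, stopReason: "tool_use" },
];

const lastWins = dedupeLastWins(streamed);
console.log(lastWins[0].outputTokens); // 159
```

If write order cannot be relied on, a variant that keeps the record with the largest output_tokens per key would give the same result for the data shown here.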

Alternatively, filter for records where stop_reason is non-null, which directly selects the final streaming chunk.
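The stop_reason alternative could look roughly like this. Again a sketch with hypothetical names (`UsageRecord`, `keepFinalChunks`), not code from ccusage.

```typescript
// Hypothetical record shape for illustration; the real ccusage types may differ.
type UsageRecord = {
  messageId: string;
  requestId: string;
  outputTokens: number;
  stopReason: string | null;
};

// Keep only records whose stop_reason is set, i.e. the final streaming chunk.
function keepFinalChunks(records: UsageRecord[]): UsageRecord[] {
  return records.filter((record) => record.stopReason !== null);
}

const streamed: UsageRecord[] = [
  { messageId: "msg_ABC", requestId: "req_123", outputTokens: 9, stopReason: null },
  { messageId: "msg_ABC", requestId: "req_123", outputTokens: 9, stopReason: null },
  { messageId: "msg_ABC", requestId: "req_123", outputTokens: 159, stopReason: "tool_use" },
];

const finalChunks = keepFinalChunks(streamed);
console.log(finalChunks[0].outputTokens); // 159
```

One trade-off to consider: a response that never gets a record with a non-null stop_reason (for example, if a stream is cut off) would be dropped entirely by this filter, whereas last-write-wins would still count its partial total. The filter also does not by itself remove genuine duplicates of the final record, so it would likely need to be combined with the existing messageId:requestId dedup.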
