## Bug Description
ccusage systematically undercounts output_tokens because the deduplication logic in data-loader.ts keeps the first record for a given messageId:requestId pair and discards subsequent ones. However, Claude Code writes multiple JSONL lines per API response as the response streams in, with incrementally increasing output_tokens. The first record has a partial count; the last record has the correct final count.
## Reproduction
Look at any streaming response in a Claude Code JSONL file. You'll see multiple lines with the same message.id and requestId:
```
Record 1: message.id=msg_ABC, requestId=req_123, output_tokens=9,   stop_reason=null      (partial)
Record 2: message.id=msg_ABC, requestId=req_123, output_tokens=9,   stop_reason=null      (partial)
Record 3: message.id=msg_ABC, requestId=req_123, output_tokens=159, stop_reason=tool_use  (final)
```
The current isDuplicateEntry() function marks records 2 and 3 as duplicates and skips them, keeping only record 1 with output_tokens: 9. The correct value is 159 from record 3.
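A minimal sketch of the failure mode. The record shape and hash construction here are simplified and hypothetical; the actual logic in data-loader.ts may differ, but the first-write-wins behavior is the same:

```typescript
// Hypothetical simplified usage record; field names mirror the JSONL example above.
interface UsageRecord {
  messageId: string;
  requestId: string;
  outputTokens: number;
  stopReason: string | null;
}

// First-write-wins dedup, as described for the current isDuplicateEntry():
// once a messageId:requestId hash has been seen, later records are skipped.
function dedupeFirstWins(records: UsageRecord[]): UsageRecord[] {
  const seen = new Set<string>();
  const kept: UsageRecord[] = [];
  for (const r of records) {
    const hash = `${r.messageId}:${r.requestId}`;
    if (seen.has(hash)) continue; // records 2 and 3 are dropped here
    seen.add(hash);
    kept.push(r);
  }
  return kept;
}

// The three streaming chunks from the reproduction above.
const stream: UsageRecord[] = [
  { messageId: "msg_ABC", requestId: "req_123", outputTokens: 9, stopReason: null },
  { messageId: "msg_ABC", requestId: "req_123", outputTokens: 9, stopReason: null },
  { messageId: "msg_ABC", requestId: "req_123", outputTokens: 159, stopReason: "tool_use" },
];

// Keeps only the first partial record, so 9 output tokens are counted instead of 159.
const kept = dedupeFirstWins(stream);
```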
## Impact
In my testing, this results in output tokens being undercounted by roughly 2-3x. The other token fields (input_tokens, cache_creation_input_tokens, cache_read_input_tokens) are unaffected because they are set at the start of the API call and remain constant across streaming chunks.
Example from a single day's data:
| Counting Method | Output Tokens |
|---|---|
| First-write-wins (current behavior) | ~300K |
| Last-write-wins (correct behavior) | ~870K |
## Suggested Fix
In createUniqueHash() / isDuplicateEntry(), instead of skipping records that share a messageId:requestId with a previously seen entry, replace the earlier entry with the new one (last-write-wins). This ensures the final streaming chunk — which has the complete output_tokens count — is the one that's kept.
Alternatively, filter for records where stop_reason is non-null, which directly selects the final streaming chunk.
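Both fixes can be sketched together. The function names and record shape below are illustrative, not the actual ccusage API; a Map keyed by the same messageId:requestId hash gives last-write-wins for free, and the stop_reason filter is a one-liner:

```typescript
interface UsageRecord {
  messageId: string;
  requestId: string;
  outputTokens: number;
  stopReason: string | null;
}

// Fix 1: last-write-wins. Map.set() overwrites earlier entries with the same
// key, so the final streaming chunk (with the complete output_tokens) survives.
function dedupeLastWins(records: UsageRecord[]): UsageRecord[] {
  const byHash = new Map<string, UsageRecord>();
  for (const r of records) {
    byHash.set(`${r.messageId}:${r.requestId}`, r);
  }
  return Array.from(byHash.values());
}

// Fix 2: keep only final chunks, where stop_reason is non-null.
function keepFinalChunks(records: UsageRecord[]): UsageRecord[] {
  return records.filter((r) => r.stopReason !== null);
}

// The three streaming chunks from the reproduction above.
const stream: UsageRecord[] = [
  { messageId: "msg_ABC", requestId: "req_123", outputTokens: 9, stopReason: null },
  { messageId: "msg_ABC", requestId: "req_123", outputTokens: 9, stopReason: null },
  { messageId: "msg_ABC", requestId: "req_123", outputTokens: 159, stopReason: "tool_use" },
];
// Both approaches keep only the final record, with 159 output tokens.
const lastWins = dedupeLastWins(stream);
const finalOnly = keepFinalChunks(stream);
```

Note that the stop_reason filter assumes every streamed response eventually writes a chunk with a non-null stop_reason; last-write-wins degrades more gracefully if a final chunk is ever missing, since it still keeps the largest partial count seen.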