feat: persistent token usage stats #1489

Open
kenvandine wants to merge 16 commits into lemonade-sdk:main from kenvandine:feat-persistent-token-usage-stats

Conversation


kenvandine commented Mar 30, 2026

Adds persistent token usage statistics tracked server-side and surfaced in the web app via a new Statistics panel.

Server:

  • Router accumulates LifetimeUsageStats — total requests, input/output tokens — bucketed by day and hour, and broken down by model and device type (GPU/NPU/CPU)
  • Stats are persisted to disk (JSON) and loaded on startup, surviving server restarts
  • Covers all request types (completions, audio, image, embeddings, reranking), not just LLM completions
  • Fixes sd-cpp incorrectly reporting GPU backends as CPU in device-type tracking
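The server-side accumulation described above can be sketched as follows. This is a minimal illustration, not the PR's actual implementation: apart from the `LifetimeUsageStats` name, the method and field names here are assumptions. It shows the core idea of one counter set bucketed four ways (day, hour, model, device type) with JSON persistence.

```python
# Sketch of LifetimeUsageStats (hypothetical fields/methods): totals
# bucketed by day and hour, broken down by model and device type,
# persisted to a JSON file so they survive server restarts.
import json
from collections import defaultdict
from datetime import datetime, timezone
from pathlib import Path


class LifetimeUsageStats:
    def __init__(self):
        self.total_requests = 0
        self.input_tokens = 0
        self.output_tokens = 0
        # bucket key -> {"requests": n, "input_tokens": n, "output_tokens": n}
        self.by_day = defaultdict(lambda: defaultdict(int))
        self.by_hour = defaultdict(lambda: defaultdict(int))
        self.by_model = defaultdict(lambda: defaultdict(int))
        self.by_device = defaultdict(lambda: defaultdict(int))

    def record(self, model, device, input_tokens=0, output_tokens=0, when=None):
        when = when or datetime.now(timezone.utc)
        day = when.strftime("%Y-%m-%d")
        hour = when.strftime("%Y-%m-%dT%H")
        self.total_requests += 1
        self.input_tokens += input_tokens
        self.output_tokens += output_tokens
        # The same counts land in every bucket dimension.
        for bucket in (self.by_day[day], self.by_hour[hour],
                       self.by_model[model], self.by_device[device]):
            bucket["requests"] += 1
            bucket["input_tokens"] += input_tokens
            bucket["output_tokens"] += output_tokens

    def save(self, path):
        # defaultdicts are dict subclasses, so they serialize directly.
        Path(path).write_text(json.dumps({
            "total_requests": self.total_requests,
            "input_tokens": self.input_tokens,
            "output_tokens": self.output_tokens,
            "by_day": self.by_day, "by_hour": self.by_hour,
            "by_model": self.by_model, "by_device": self.by_device,
        }))
```

Because non-LLM requests (image, audio, embeddings, reranking) pass through the same `record` path with zero tokens, they still increment the request counters in every bucket.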

App:

  • New StatsPanel added to the left rail (chart icon) showing a bar chart of token usage over time
  • Supports day/hour bucket modes, date-range presets (7d / 30d / 90d / 365d / all time), and drill-down to hourly view by clicking a day bar
  • Filterable by model or device type; search box filters the visible date buckets
  • Auto-refreshes every 30 s and on inference completion events
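The panel's bucket selection (date-range presets over day buckets, and the day-to-hour drill-down) can be sketched as below. The helper names are illustrative, not the app's actual code; the sketch assumes day keys of the form `YYYY-MM-DD` and hour keys of the form `YYYY-MM-DDTHH`.

```python
# Illustrative sketch (hypothetical helpers) of the StatsPanel's
# bucket selection logic.
from datetime import date, timedelta


def days_in_range(by_day, preset_days, today=None):
    """Keep only day buckets within the preset window (None = all time)."""
    if preset_days is None:
        return dict(by_day)
    cutoff = (today or date.today()) - timedelta(days=preset_days)
    return {d: v for d, v in by_day.items()
            if date.fromisoformat(d) >= cutoff}


def hours_for_day(by_hour, day):
    """Drill down: hourly buckets whose key starts with the clicked day."""
    return {h: v for h, v in by_hour.items() if h.startswith(day + "T")}
```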

Tests: server_endpoints.py updated to cover the /stats endpoint.

kenvandine marked this pull request as draft March 30, 2026 14:07
kenvandine and others added 10 commits April 1, 2026 09:18
In hour mode, valueSummary was incorrectly populated with lifetimeSummary
instead of chartSummary, causing the "Selected day" card to mirror the
"Lifetime tokens" card. chartSummary already sums the 24 hourly buckets
for the selected day, so valueSummary can simply always be chartSummary.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Extend persistent token usage to record by_day and by_hour buckets per
model alongside the existing aggregate totals. The /stats endpoint now
exposes a by_model map. StatsPanel gains an "All Models" default view
plus per-model chip selectors that filter the chart and summary cards.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
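The "All Models" default view described above amounts to summing each model's day buckets into one aggregate series. A hedged sketch, assuming a `by_model` map of the shape `{model: {"by_day": {day: counts}}}` (the exact response shape in the PR may differ):

```python
# Sketch: collapse the per-model by_day buckets exposed by /stats
# into the aggregate "All Models" series shown by default.
def all_models_view(by_model):
    total = {}
    for model_stats in by_model.values():
        for day, counts in model_stats["by_day"].items():
            agg = total.setdefault(day, {})
            for key, value in counts.items():
                agg[key] = agg.get(key, 0) + value
    return total
```

Selecting a per-model chip simply swaps this aggregate for that one model's `by_day` map.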
Track token usage by device type (cpu/gpu/npu) as a sibling dimension
to by_model in both the persistent JSON and the /stats response.
StatsPanel gains a device chip row (CPU/GPU/NPU) that filters the chart
and summary cards; selecting a model clears the device filter and
vice versa so the two selectors stay mutually exclusive.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
execute_inference now always records the request (with 0 tokens) for
every non-streaming call, covering image generation, TTS, audio
transcription, embeddings, and reranking. For LLM completions,
update_telemetry calls the new add_tokens_locked which patches token
counts onto the already-recorded request bucket without double-counting.
The zero-token early-exit guard is removed so image gen requests are
counted even though they have no token metrics.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
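The record-first, patch-tokens-later scheme described in this commit can be sketched as follows. Apart from the `add_tokens_locked` name, the class and method names are assumptions; the point is that the request counter is bumped exactly once per request, and token counts are patched in afterwards without re-counting.

```python
# Sketch of the no-double-counting scheme: every non-streaming request
# (image gen, TTS, transcription, embeddings, reranking, LLM) is
# recorded up front with 0 tokens; LLM telemetry later patches token
# counts onto the already-recorded request.
import threading


class StatsRecorder:
    def __init__(self):
        self._lock = threading.Lock()
        self.requests = 0
        self.input_tokens = 0
        self.output_tokens = 0

    def record_request(self):
        # Called once for every non-streaming call, even with 0 tokens.
        with self._lock:
            self.requests += 1

    def add_tokens_locked(self, input_tokens, output_tokens):
        # Patches token counts onto the existing request bucket;
        # deliberately does NOT increment the request count again.
        with self._lock:
            self.input_tokens += input_tokens
            self.output_tokens += output_tokens
```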
SDServer::load() now updates device_type_ to DEVICE_GPU when the
rocm or vulkan backend is selected. Previously, get_device_type_from_recipe()
hardcoded sd-cpp as DEVICE_CPU regardless of the actual backend in use,
causing rocm/vulkan image generation requests to be recorded under CPU
in the usage stats.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
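The actual fix lives in C++ (`SDServer::load`), but the backend-to-device mapping it introduces can be expressed as a one-liner, sketched here in Python for illustration:

```python
# Sketch of the sd-cpp device-type fix: rocm and vulkan backends are
# GPU; anything else keeps the previous CPU classification.
def device_type_for_backend(backend):
    return "gpu" if backend in ("rocm", "vulkan") else "cpu"
```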
kenvandine marked this pull request as ready for review April 6, 2026 20:03
kenvandine requested a review from jeremyfowers April 10, 2026 14:59