feat: persistent token usage stats #1489

Open
kenvandine wants to merge 16 commits into lemonade-sdk:main from kenvandine:feat-persistent-token-usage-stats

Conversation


kenvandine commented Mar 30, 2026

Adds persistent token usage statistics tracked server-side and surfaced in the web app via a new Statistics panel.

Server:

  • Router accumulates LifetimeUsageStats — total requests, input/output tokens — bucketed by day and hour, and broken down by model and device type (GPU/NPU/CPU)
  • Stats are persisted to disk (JSON) and loaded on startup, surviving server restarts
  • Covers all request types (completions, audio, image, embeddings, reranking), not just LLM completions
  • Fixes sd-cpp incorrectly reporting GPU backends as CPU in device-type tracking
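The server-side accumulation described above can be sketched as follows. This is a minimal illustration, not the PR's actual implementation: apart from the `LifetimeUsageStats` name, the method and field names here are assumptions. It shows the core idea of one counter set bucketed four ways (day, hour, model, device type) with JSON persistence.

```python
# Sketch of LifetimeUsageStats (hypothetical fields/methods): totals
# bucketed by day and hour, broken down by model and device type,
# persisted to a JSON file so they survive server restarts.
import json
from collections import defaultdict
from datetime import datetime, timezone
from pathlib import Path


class LifetimeUsageStats:
    def __init__(self):
        self.total_requests = 0
        self.input_tokens = 0
        self.output_tokens = 0
        # bucket key -> {"requests": n, "input_tokens": n, "output_tokens": n}
        self.by_day = defaultdict(lambda: defaultdict(int))
        self.by_hour = defaultdict(lambda: defaultdict(int))
        self.by_model = defaultdict(lambda: defaultdict(int))
        self.by_device = defaultdict(lambda: defaultdict(int))

    def record(self, model, device, input_tokens=0, output_tokens=0, when=None):
        when = when or datetime.now(timezone.utc)
        day = when.strftime("%Y-%m-%d")
        hour = when.strftime("%Y-%m-%dT%H")
        self.total_requests += 1
        self.input_tokens += input_tokens
        self.output_tokens += output_tokens
        # The same counts land in every bucket dimension.
        for bucket in (self.by_day[day], self.by_hour[hour],
                       self.by_model[model], self.by_device[device]):
            bucket["requests"] += 1
            bucket["input_tokens"] += input_tokens
            bucket["output_tokens"] += output_tokens

    def save(self, path):
        # defaultdicts are dict subclasses, so they serialize directly.
        Path(path).write_text(json.dumps({
            "total_requests": self.total_requests,
            "input_tokens": self.input_tokens,
            "output_tokens": self.output_tokens,
            "by_day": self.by_day, "by_hour": self.by_hour,
            "by_model": self.by_model, "by_device": self.by_device,
        }))
```

Because non-LLM requests (image, audio, embeddings, reranking) pass through the same `record` path with zero tokens, they still increment the request counters in every bucket.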

App:

  • New StatsPanel added to the left rail (chart icon) showing a bar chart of token usage over time
  • Supports day/hour bucket modes, date-range presets (7d / 30d / 90d / 365d / all time), and drill-down to hourly view by clicking a day bar
  • Filterable by model or device type; search box filters the visible date buckets
  • Auto-refreshes every 30 s and on inference completion events
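The panel's bucket selection (date-range presets over day buckets, and the day-to-hour drill-down) can be sketched as below. The helper names are illustrative, not the app's actual code; the sketch assumes day keys of the form `YYYY-MM-DD` and hour keys of the form `YYYY-MM-DDTHH`.

```python
# Illustrative sketch (hypothetical helpers) of the StatsPanel's
# bucket selection logic.
from datetime import date, timedelta


def days_in_range(by_day, preset_days, today=None):
    """Keep only day buckets within the preset window (None = all time)."""
    if preset_days is None:
        return dict(by_day)
    cutoff = (today or date.today()) - timedelta(days=preset_days)
    return {d: v for d, v in by_day.items()
            if date.fromisoformat(d) >= cutoff}


def hours_for_day(by_hour, day):
    """Drill down: hourly buckets whose key starts with the clicked day."""
    return {h: v for h, v in by_hour.items() if h.startswith(day + "T")}
```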

Tests: server_endpoints.py updated to cover the /stats endpoint.

kenvandine marked this pull request as draft March 30, 2026 14:07
kenvandine and others added 10 commits April 1, 2026 09:18
In hour mode, valueSummary was incorrectly populated with lifetimeSummary
instead of chartSummary, causing the "Selected day" card to mirror the
"Lifetime tokens" card. chartSummary already sums the 24 hourly buckets
for the selected day, so valueSummary can simply always be chartSummary.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Extend persistent token usage to record by_day and by_hour buckets per
model alongside the existing aggregate totals. The /stats endpoint now
exposes a by_model map. StatsPanel gains an "All Models" default view
plus per-model chip selectors that filter the chart and summary cards.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
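The "All Models" default view described above amounts to summing each model's day buckets into one aggregate series. A hedged sketch, assuming a `by_model` map of the shape `{model: {"by_day": {day: counts}}}` (the exact response shape in the PR may differ):

```python
# Sketch: collapse the per-model by_day buckets exposed by /stats
# into the aggregate "All Models" series shown by default.
def all_models_view(by_model):
    total = {}
    for model_stats in by_model.values():
        for day, counts in model_stats["by_day"].items():
            agg = total.setdefault(day, {})
            for key, value in counts.items():
                agg[key] = agg.get(key, 0) + value
    return total
```

Selecting a per-model chip simply swaps this aggregate for that one model's `by_day` map.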
Track token usage by device type (cpu/gpu/npu) as a sibling dimension
to by_model in both the persistent JSON and the /stats response.
StatsPanel gains a device chip row (CPU/GPU/NPU) that filters the chart
and summary cards; selecting a model clears the device filter and
vice versa so the two selectors stay mutually exclusive.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
execute_inference now always records the request (with 0 tokens) for
every non-streaming call, covering image generation, TTS, audio
transcription, embeddings, and reranking. For LLM completions,
update_telemetry calls the new add_tokens_locked which patches token
counts onto the already-recorded request bucket without double-counting.
The zero-token early-exit guard is removed so image gen requests are
counted even though they have no token metrics.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
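The record-first, patch-tokens-later scheme described in this commit can be sketched as follows. Apart from the `add_tokens_locked` name, the class and method names are assumptions; the point is that the request counter is bumped exactly once per request, and token counts are patched in afterwards without re-counting.

```python
# Sketch of the no-double-counting scheme: every non-streaming request
# (image gen, TTS, transcription, embeddings, reranking, LLM) is
# recorded up front with 0 tokens; LLM telemetry later patches token
# counts onto the already-recorded request.
import threading


class StatsRecorder:
    def __init__(self):
        self._lock = threading.Lock()
        self.requests = 0
        self.input_tokens = 0
        self.output_tokens = 0

    def record_request(self):
        # Called once for every non-streaming call, even with 0 tokens.
        with self._lock:
            self.requests += 1

    def add_tokens_locked(self, input_tokens, output_tokens):
        # Patches token counts onto the existing request bucket;
        # deliberately does NOT increment the request count again.
        with self._lock:
            self.input_tokens += input_tokens
            self.output_tokens += output_tokens
```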
SDServer::load() now updates device_type_ to DEVICE_GPU when the
rocm or vulkan backend is selected. Previously, get_device_type_from_recipe()
hardcoded sd-cpp as DEVICE_CPU regardless of the actual backend in use,
causing rocm/vulkan image generation requests to be recorded under CPU
in the usage stats.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
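The actual fix lives in C++ (`SDServer::load`), but the backend-to-device mapping it introduces can be expressed as a one-liner, sketched here in Python for illustration:

```python
# Sketch of the sd-cpp device-type fix: rocm and vulkan backends are
# GPU; anything else keeps the previous CPU classification.
def device_type_for_backend(backend):
    return "gpu" if backend in ("rocm", "vulkan") else "cpu"
```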
kenvandine marked this pull request as ready for review April 6, 2026 20:03
kenvandine requested a review from jeremyfowers April 10, 2026 14:59