feat: persistent token usage stats #1489
Open
kenvandine wants to merge 16 commits into lemonade-sdk:main
In hour mode, valueSummary was incorrectly populated with lifetimeSummary instead of chartSummary, causing the "Selected day" card to mirror the "Lifetime tokens" card. chartSummary already sums the 24 hourly buckets for the selected day, so valueSummary can simply always be chartSummary. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
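The fix above can be illustrated with a minimal Python sketch (the app itself is not written in Python). The function names `summarize` and `value_summary` and the bucket field names `requests`, `input_tokens`, and `output_tokens` are assumptions made for illustration, not names taken from the PR.

```python
def summarize(buckets):
    """Sum the counters across a list of usage buckets (hypothetical field names)."""
    total = {"requests": 0, "input_tokens": 0, "output_tokens": 0}
    for bucket in buckets:
        for key in total:
            total[key] += bucket.get(key, 0)
    return total


def value_summary(mode, chart_buckets):
    """Return the summary shown on the 'Selected day' card.

    After the fix, the view mode no longer changes the source: the card always
    reflects the buckets that are currently charted, never the lifetime totals.
    """
    return summarize(chart_buckets)


# 24 hourly buckets for one selected day
hourly = [{"requests": 1, "input_tokens": 1, "output_tokens": 2} for _ in range(24)]
selected_day = value_summary("hour", hourly)
```

Because the charted buckets already cover exactly the selected day, no special case for hour mode is needed.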
Extend persistent token usage to record by_day and by_hour buckets per model alongside the existing aggregate totals. The /stats endpoint now exposes a by_model map. StatsPanel gains an "All Models" default view plus per-model chip selectors that filter the chart and summary cards. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
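One way to picture the per-model bucketing described above is the following Python sketch. It is an assumption about the shape of the data, not the server's actual C++ code: the class `UsageStats`, the key formats for days and hours, and the counter field names are all hypothetical. Only `by_day`, `by_hour`, and `by_model` come from the commit message.

```python
from collections import defaultdict
from datetime import datetime, timezone


def new_counters():
    # Hypothetical counter fields
    return {"requests": 0, "input_tokens": 0, "output_tokens": 0}


def new_series():
    # Aggregate totals plus by_day and by_hour buckets
    return {
        "total": new_counters(),
        "by_day": defaultdict(new_counters),
        "by_hour": defaultdict(new_counters),
    }


class UsageStats:
    def __init__(self):
        self.all = new_series()                  # aggregate across every model
        self.by_model = defaultdict(new_series)  # same layout, keyed by model name

    def record(self, model, input_tokens, output_tokens, when):
        day = when.strftime("%Y-%m-%d")        # assumed day key format
        hour = when.strftime("%Y-%m-%dT%H")    # assumed hour key format
        # Update the aggregate series and the per-model series identically
        for series in (self.all, self.by_model[model]):
            for counters in (series["total"], series["by_day"][day], series["by_hour"][hour]):
                counters["requests"] += 1
                counters["input_tokens"] += input_tokens
                counters["output_tokens"] += output_tokens


stats = UsageStats()
when = datetime(2025, 1, 2, 13, 30, tzinfo=timezone.utc)
stats.record("model-a", 10, 5, when)
stats.record("model-b", 3, 7, when)
```

Keeping the per-model series in the same layout as the aggregate means an "All Models" view and a single-model view can share the same charting code.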
Track token usage by device type (cpu/gpu/npu) as a sibling dimension to by_model in both the persistent JSON and the /stats response. StatsPanel gains a device chip row (CPU/GPU/NPU) that filters the chart and summary cards; selecting a model clears the device filter and vice versa so the two selectors stay mutually exclusive. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
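The mutually exclusive selectors can be sketched as a small state transition, shown here in Python for illustration only. The `Filter` type, the helper names, and the `by_device` and `all` keys are hypothetical. The commit message only states that device usage is a sibling dimension to `by_model`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Filter:
    model: Optional[str] = None   # None means no model chip is selected
    device: Optional[str] = None  # one of "cpu", "gpu", "npu", or None


def select_model(current: Filter, model: Optional[str]) -> Filter:
    # Choosing a model clears any device filter
    return Filter(model=model, device=None)


def select_device(current: Filter, device: Optional[str]) -> Filter:
    # Choosing a device clears any model filter
    return Filter(model=None, device=device)


def pick_series(stats: dict, f: Filter):
    """Pick the data source for the chart and summary cards."""
    if f.model is not None:
        return stats["by_model"][f.model]
    if f.device is not None:
        return stats["by_device"][f.device]  # hypothetical key name
    return stats["all"]                      # hypothetical "All Models" aggregate


stats = {"all": "ALL", "by_model": {"m1": "M1"}, "by_device": {"gpu": "GPU"}}
f = select_model(Filter(), "m1")
g = select_device(f, "gpu")
```

Because at most one of the two fields is ever set, the chart never has to combine a model filter with a device filter.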
execute_inference now always records the request (with 0 tokens) for every non-streaming call, covering image generation, TTS, audio transcription, embeddings, and reranking. For LLM completions, update_telemetry calls the new add_tokens_locked which patches token counts onto the already-recorded request bucket without double-counting. The zero-token early-exit guard is removed so image gen requests are counted even though they have no token metrics. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
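The record-then-patch flow described above might look like the following Python sketch. It is not the server's C++ implementation: the `Stats` class and `record_request` method are hypothetical, and the reading of the `_locked` suffix as "the caller already holds the lock" is an assumption based on common naming conventions.

```python
import threading


class Stats:
    def __init__(self):
        self._lock = threading.Lock()
        self.requests = 0
        self.input_tokens = 0
        self.output_tokens = 0

    def record_request(self):
        # Called for every non-streaming call, even when no tokens are produced
        # (image generation, TTS, transcription, embeddings, reranking).
        with self._lock:
            self.requests += 1

    def add_tokens(self, input_tokens, output_tokens):
        with self._lock:
            self._add_tokens_locked(input_tokens, output_tokens)

    def _add_tokens_locked(self, input_tokens, output_tokens):
        # Assumes the caller holds self._lock. Only token counters change here,
        # so the request recorded earlier is not counted a second time.
        self.input_tokens += input_tokens
        self.output_tokens += output_tokens


stats = Stats()

# LLM completion: request recorded first, tokens patched on afterwards
stats.record_request()
stats.add_tokens(10, 5)

# Image generation: request recorded, no token metrics available
stats.record_request()
```

Splitting the request count from the token update is what lets zero-token requests appear in the stats without a special case.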
SDServer::load() now updates device_type_ to DEVICE_GPU when the rocm or vulkan backend is selected. Previously, get_device_type_from_recipe() hardcoded sd-cpp as DEVICE_CPU regardless of the actual backend in use, causing rocm/vulkan image generation requests to be recorded under CPU in the usage stats. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
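As a rough illustration of the corrected behavior, the mapping from backend to device type could be sketched as below. The function name `resolve_sd_device_type` and the string constants are hypothetical. The actual change lives in the C++ `SDServer::load()` and is not reproduced here.

```python
DEVICE_CPU = "cpu"
DEVICE_GPU = "gpu"

# Backends that the commit message says should be recorded as GPU
GPU_BACKENDS = {"rocm", "vulkan"}


def resolve_sd_device_type(backend: str) -> str:
    """Return the device type to record for an sd-cpp request (hypothetical helper)."""
    return DEVICE_GPU if backend in GPU_BACKENDS else DEVICE_CPU
```

Deriving the device type from the selected backend at load time, rather than from the recipe alone, keeps the usage stats consistent with where the work actually ran.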
Adds persistent token usage statistics tracked server-side and surfaced in the web app via a new Statistics panel.
Server:
App:
Tests: server_endpoints.py updated to cover the /stats endpoint.