engine: report kv cache reuse counters #83

Open
bittoby wants to merge 2 commits into GeniePod:main from bittoby:dynamo-152-runtime-kv-stats


bittoby commented May 23, 2026

Refs GeniePod/genie-claw#152.
Depends on #82.

Summary

  • add prefill_tokens and kv_cache_reused_tokens to generation stats
  • count reused persistent-KV prompt tokens separately from newly-prefilled tokens
  • compute prompt throughput from newly-prefilled tokens
  • expose jetson.cache with prompt, prefill, and reused-token counts, the reuse ratio, and the conversation id
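
The accounting described in the summary can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: only the names prefill_tokens and kv_cache_reused_tokens come from the summary, while the struct, the make_stats helper, and the remaining fields are hypothetical. It assumes the reuse ratio is reused tokens divided by total prompt tokens, which the summary does not state.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stats record. Only prefill_tokens and kv_cache_reused_tokens
// are names taken from the PR summary; everything else is assumed.
struct GenerationStats {
    std::uint32_t prompt_tokens = 0;           // total tokens in the prompt
    std::uint32_t prefill_tokens = 0;          // tokens actually run through prefill
    std::uint32_t kv_cache_reused_tokens = 0;  // tokens served from the persistent KV cache
    double prompt_tokens_per_sec = 0.0;        // throughput over newly-prefilled tokens only
    double kv_cache_reuse_ratio = 0.0;         // assumed definition: reused / prompt
};

// Hypothetical helper: split the prompt into reused and newly-prefilled
// tokens, then derive throughput and reuse ratio from that split.
inline GenerationStats make_stats(std::uint32_t prompt_tokens,
                                  std::uint32_t reused_tokens,
                                  double prefill_seconds) {
    // The reused prefix can never be longer than the prompt itself.
    assert(reused_tokens <= prompt_tokens);

    GenerationStats s;
    s.prompt_tokens = prompt_tokens;
    s.kv_cache_reused_tokens = reused_tokens;
    s.prefill_tokens = prompt_tokens - reused_tokens;

    // Reused tokens cost no prefill time, so they are excluded from the
    // throughput numerator; including them would inflate tokens/sec.
    if (prefill_seconds > 0.0) {
        s.prompt_tokens_per_sec =
            static_cast<double>(s.prefill_tokens) / prefill_seconds;
    }
    if (prompt_tokens > 0) {
        s.kv_cache_reuse_ratio =
            static_cast<double>(reused_tokens) / static_cast<double>(prompt_tokens);
    }
    return s;
}
```

For example, a 1000-token prompt with 750 tokens reused and 0.5 s of prefill time yields 250 prefilled tokens, 500 tokens/s, and a reuse ratio of 0.75. Dividing all 1000 prompt tokens by the same 0.5 s would report 2000 tokens/s, which is the overstatement that counting only newly-prefilled tokens avoids.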

Verification

  • git diff --check origin/main..HEAD
  • cmake -S . -B /tmp/genie-ai-runtime-check -DJLLM_BUILD_SERVER=ON (did not complete on this Mac host because CUDA/nvcc is not installed; CMake reported "Failed to find nvcc.")
