feat(bench): add --accuracy mode that prints generated text #1092
Merged
Copilot started reviewing on behalf of RemiliaForever on June 25, 2026 07:15
Pull request overview
Adds an --accuracy mode to geniex-bench to support quick qualitative inspection of model output (generated text printed to stdout), rather than benchmarking throughput/latency across multiple runs.
Changes:
- Adds a new `--accuracy` flag that forces `--warmup 0` and `--repetitions 1`.
- Prints the generated text to stdout with a `[gen ]` prefix on each line for both LLM and VLM generation loops.
- Updates the benchmark README and `--help` text to document accuracy mode and recommend pairing with `--prompt-file`.
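Generated text is usually decoded piece by piece, so a per-line prefix has to survive chunk boundaries that fall in the middle of a line. The sketch below shows one way to do that, assuming each generation loop hands over decoded text as NUL-terminated chunks. `gen_printer` and its functions are hypothetical names for illustration, not the code actually added to `benchmark.c`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical streaming printer (not the actual benchmark.c code).
 * It remembers whether the next byte starts a new line, so the prefix
 * is emitted correctly even when a chunk boundary falls mid-line. */
typedef struct {
    FILE *out;           /* destination stream, e.g. stdout */
    const char *prefix;  /* e.g. "[gen ] " */
    bool at_line_start;  /* true if the next byte begins a new line */
} gen_printer;

static void gen_printer_init(gen_printer *p, FILE *out, const char *prefix) {
    assert(p != NULL && out != NULL && prefix != NULL);
    p->out = out;
    p->prefix = prefix;
    p->at_line_start = true;
}

/* Feed one decoded chunk; writes it line segment by line segment. */
static void gen_printer_feed(gen_printer *p, const char *chunk) {
    while (*chunk != '\0') {
        if (p->at_line_start) {
            fputs(p->prefix, p->out);
            p->at_line_start = false;
        }
        const char *nl = strchr(chunk, '\n');
        /* Write up to and including the newline, or the rest of the chunk. */
        size_t len = nl ? (size_t)(nl - chunk) + 1 : strlen(chunk);
        fwrite(chunk, 1, len, p->out);
        if (nl != NULL) {
            p->at_line_start = true;
        }
        chunk += len;
    }
}

/* Terminate a trailing partial line so later output starts cleanly. */
static void gen_printer_finish(gen_printer *p) {
    if (!p->at_line_start) {
        fputc('\n', p->out);
        p->at_line_start = true;
    }
    fflush(p->out);
}
```

In a generation loop, one would call `gen_printer_feed` with each decoded piece and `gen_printer_finish` once generation ends. Blank lines still receive the prefix, which keeps every generated line matchable by a single pattern.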
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| sdk/benchmark/README.md | Documents the new --accuracy mode and provides an example invocation and guidance about --prompt-file. |
| sdk/benchmark/benchmark.c | Implements --accuracy flag parsing/behavior and adds generated-text printing in both LLM and VLM run loops. |
Single-run mode for eyeballing output quality rather than speed: pins --warmup 0 / --repetitions 1 and prints the generated text to stdout with a [gen ] prefix on every line. Pair with --prompt-file for a real prompt, since the default random-ids prefill yields meaningless text.
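One way to make `--accuracy` win regardless of where it appears relative to `--warmup` or `-r` is to apply the override only after the whole command line has been parsed. The sketch below assumes that approach and also assumes `-r` is the short form of `--repetitions`. `bench_opts`, `parse_bench_args`, and the default values are hypothetical, not taken from the geniex-bench parser.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical option set; the real geniex-bench parser may differ. */
typedef struct {
    int warmup;       /* warm-up runs before measuring */
    int repetitions;  /* measured runs */
    bool accuracy;    /* single run that prints generated text */
} bench_opts;

/* Returns 0 on success, -1 on a missing value or unknown flag.
 * Other geniex-bench flags (e.g. --prompt-file) are omitted here. */
static int parse_bench_args(int argc, char **argv, bench_opts *o) {
    assert(argv != NULL && o != NULL);
    o->warmup = 1;       /* hypothetical default */
    o->repetitions = 5;  /* hypothetical default */
    o->accuracy = false;

    for (int i = 1; i < argc; ++i) {
        const char *a = argv[i];
        if (strcmp(a, "--accuracy") == 0) {
            o->accuracy = true;
        } else if (strcmp(a, "--warmup") == 0 && i + 1 < argc) {
            o->warmup = atoi(argv[++i]);  /* strtol would allow validation */
        } else if ((strcmp(a, "--repetitions") == 0 || strcmp(a, "-r") == 0) &&
                   i + 1 < argc) {
            o->repetitions = atoi(argv[++i]);
        } else {
            return -1;
        }
    }

    /* Override after the loop, so flag order on the command line is irrelevant. */
    if (o->accuracy) {
        o->warmup = 0;
        o->repetitions = 1;
    }
    return 0;
}
```

With this design, `-r 5 --accuracy --warmup 3` still yields a single run with no warm-up.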
What
Adds an `--accuracy` mode to `geniex-bench` for eyeballing model output quality rather than measuring speed.
- Pins a single run (`--warmup 0` / `--repetitions 1`), overriding `--warmup` / `-r`.
- Prints the generated text to stdout with a `[gen ]` prefix on every line so multi-line output stays greppable and visually attributed.

Pair with `--prompt-file` for a real prompt: the default random-ids prefill yields meaningless text, which the `--help` text and README now call out.

Notes
SDK logs go to stderr while the bench output goes to stdout, so `2>/dev/null` gives a clean view of the generated text.

Test
Built Release locally (`cmake --build build`) and ran against a cached `Qwen3-0.6B-Q4_0.gguf`:
- `--accuracy --prompt-file` → single run, full text printed line by line with the `[gen ]` prefix.
- `--accuracy -p 8` (default random-ids) → single run, runs without crashing.
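Because every generated line carries the same prefix, the raw text can be recovered from stdout by keeping only prefixed lines and stripping the prefix, which separates it from any other bench output sharing the stream. The sketch below is a hypothetical post-processing helper, not part of geniex-bench. It assumes the prefix is exactly `"[gen ] "` (with a trailing space) and that stderr has already been discarded with `2>/dev/null`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define GEN_PREFIX "[gen ] "  /* assumed exact prefix, including the space */

/* Returns the text after GEN_PREFIX if `line` starts with it,
 * or NULL for any other stdout line. */
static const char *strip_gen_prefix(const char *line) {
    assert(line != NULL);
    size_t n = strlen(GEN_PREFIX);
    return strncmp(line, GEN_PREFIX, n) == 0 ? line + n : NULL;
}

/* Copies only generated-text lines from `in` to `out`, prefix removed.
 * Lines longer than the buffer arrive in several fgets pieces; only the
 * first piece of each line is checked for the prefix. */
static void extract_gen_text(FILE *in, FILE *out) {
    char buf[4096];
    bool at_line_start = true;
    bool keep = false;
    while (fgets(buf, sizeof buf, in) != NULL) {
        const char *body = buf;
        if (at_line_start) {
            body = strip_gen_prefix(buf);
            keep = (body != NULL);
        }
        if (keep) {
            fputs(body, out);
        }
        /* A piece ending in '\n' means the next piece starts a new line. */
        at_line_start = (strchr(buf, '\n') != NULL);
    }
}
```

Called as `extract_gen_text(stdin, stdout)` in a small filter program, it prints the model output with the prefix removed, one generated line per output line.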