feat(bench): add --accuracy mode that prints generated text#1092

Merged
RemiliaForever merged 1 commit into main from chore/geniex-bench on Jun 26, 2026

@RemiliaForever

Contributor

What

Adds an --accuracy mode to geniex-bench for eyeballing model output quality rather than measuring speed.

  • Pins a single measured run (--warmup 0 / --repetitions 1), overriding any --warmup / -r values given on the command line.
  • Prints the generated text to stdout, with a [gen ] prefix on every line so multi-line output stays greppable and visually attributed.
  • Works for both the LLM and VLM run loops; applies in single-cell and matrix mode.
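
A minimal sketch of how the pinning in the first bullet could work. The `bench_opts` struct and `bench_apply_accuracy` helper are hypothetical names chosen for illustration; this PR page does not show the actual layout in benchmark.c. Applying the override once, after all flags are parsed, is one way to make --accuracy win regardless of where --warmup / -r appear on the command line.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical options struct; the real fields in benchmark.c may differ. */
typedef struct {
    int  warmup;       /* value of --warmup */
    int  repetitions;  /* value of --repetitions / -r */
    bool accuracy;     /* set when --accuracy is present */
} bench_opts;

/* Call after flag parsing has finished.
 * In accuracy mode, force exactly one measured run with no warmup;
 * otherwise leave the user-supplied values untouched. */
static void bench_apply_accuracy(bench_opts *o) {
    assert(o != NULL);
    if (!o->accuracy) {
        return;
    }
    o->warmup = 0;
    o->repetitions = 1;
}
```

With this shape, `--accuracy --warmup 3 -r 10` and `--warmup 3 -r 10 --accuracy` would both end up with `warmup == 0` and `repetitions == 1`.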

Pair with --prompt-file for a real prompt — the default random-ids prefill yields meaningless text, which the --help text and README now call out.

Notes

SDK logs go to stderr while the bench output goes to stdout, so 2>/dev/null gives a clean view of the generated text.

Test

Built Release locally (cmake --build build) and ran against a cached Qwen3-0.6B-Q4_0.gguf:

  • --accuracy --prompt-file → single run, full text printed line-by-line with [gen ] prefix.
  • --accuracy -p 8 (default random-ids) → single run, completes without crashing.

Copilot AI review requested due to automatic review settings June 25, 2026 07:15

Copilot AI left a comment

Contributor

Pull request overview

Adds an --accuracy mode to geniex-bench to support quick qualitative inspection of model output (generated text printed to stdout), rather than benchmarking throughput/latency across multiple runs.

Changes:

  • Adds a new --accuracy flag that forces --warmup 0 and --repetitions 1.
  • Prints the generated text to stdout with a [gen ] prefix on each line for both LLM and VLM generation loops.
  • Updates the benchmark README and --help text to document accuracy mode and recommend pairing with --prompt-file.
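
Since the prefix is applied in both the LLM and VLM loops, one plausible shape is a small shared helper. The sketch below is an assumption, not the code from benchmark.c: `print_prefixed` is a hypothetical name, and the exact prefix string (including any trailing space) is passed in by the caller rather than guessed here.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: write `text` to `out`, starting every line with
 * `prefix`. Each emitted line ends with '\n', including a final line that
 * has no trailing newline in `text`. Returns the number of lines written. */
static size_t print_prefixed(FILE *out, const char *prefix, const char *text) {
    assert(out != NULL && prefix != NULL && text != NULL);
    size_t lines = 0;
    const char *p = text;
    while (*p != '\0') {
        const char *nl = strchr(p, '\n');
        size_t len = nl ? (size_t)(nl - p) : strlen(p);
        fputs(prefix, out);
        fwrite(p, 1, len, out);   /* line body without its newline */
        fputc('\n', out);
        lines++;
        p += len;
        if (nl) {
            p++;                  /* step over the newline itself */
        }
    }
    return lines;
}
```

A call such as `print_prefixed(stdout, "[gen ] ", generated_text)` keeps every line of a multi-line generation attributable, so the output can be filtered with a plain `grep` on the prefix.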

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| sdk/benchmark/README.md | Documents the new --accuracy mode and provides an example invocation and guidance about --prompt-file. |
| sdk/benchmark/benchmark.c | Implements --accuracy flag parsing/behavior and adds generated-text printing in both LLM and VLM run loops. |

Single-run mode for eyeballing output quality rather than speed: pins
--warmup 0 / --repetitions 1 and prints the generated text to stdout with
a [gen ] prefix on every line. Pair with --prompt-file for a real prompt,
since the default random-ids prefill yields meaningless text.
Base automatically changed from chore/migrate-repo-url to main June 26, 2026 02:55
@RemiliaForever merged commit edd0183 into main on Jun 26, 2026
9 of 11 checks passed
@RemiliaForever deleted the chore/geniex-bench branch on June 26, 2026 02:56