
GuideLLM v0.2.0

Released by @markurtz on 18 Apr 21:07 · commit 876775f

Summary

  • Minimal Execution Overheads
    • Refactored to an async multi-process/threaded design with just 0.16% overhead in synchronous mode and 99.9% rate accuracy for constant-rate requests
  • Robust Accuracy + Monitoring
    • Built-in timings and diagnostics added to validate performance and catch regressions
  • Flexible Benchmarking Profiles
    • Prebuilt support for synchronous, concurrent (newly added), throughput, constant-rate, Poisson-rate, and sweep modes (see the scheduling sketch after this list)
  • Unified Input/Output Formats
    • JSON, YAML, CSV, and console output now standardized
  • Multi-Use Data Loaders
    • Native support for Hugging Face datasets, file-based data, and synthetic samples, with fixes and expanded capabilities over the previous flows
  • Pluggable Backends via OpenAI-Compatible APIs
    • Redesigned to work out of the box with OpenAI-style HTTP servers and easily extensible to other interfaces and servers; fixes issues related to improper token lengths and more
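
Below is a minimal Python sketch, not GuideLLM's internal API, of how the constant-rate and Poisson-rate profiles differ conceptually: constant rate spaces request starts evenly, while Poisson rate draws exponentially distributed inter-arrival times. The `send_request` coroutine is a stand-in; in practice it would POST to an OpenAI-compatible endpoint such as `/v1/chat/completions`.

```python
import asyncio
import random
import time


async def send_request(i: int) -> None:
    # Placeholder for an HTTP call to the target OpenAI-compatible server.
    await asyncio.sleep(0.05)
    print(f"request {i} finished at {time.perf_counter():.3f}s")


async def run(rate: float, num_requests: int, poisson: bool) -> None:
    tasks = []
    for i in range(num_requests):
        tasks.append(asyncio.create_task(send_request(i)))
        if poisson:
            # Poisson arrivals: exponentially distributed inter-arrival times.
            await asyncio.sleep(random.expovariate(rate))
        else:
            # Constant rate: fixed interval between request starts.
            await asyncio.sleep(1.0 / rate)
    await asyncio.gather(*tasks)


if __name__ == "__main__":
    asyncio.run(run(rate=10.0, num_requests=20, poisson=True))
```

Poisson arrivals better approximate bursty production traffic, while a constant rate isolates steady-state server behavior.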

What's Changed

  • Add summary metrics to saved json file by @anmarques in #46
  • ADD TGI docs by @philschmid in #43
  • Add missing vllm docs link by @eldarkurtic in #50
  • Change default "role" from "system" to "user" by @philschmid in #53
  • FIX TGI example by @philschmid in #51
  • Revert Summary Metrics and Expand Test Coverage to Stabilize Nightly/Main CI by @markurtz in #58
  • [Dataset]: Iterate through benchmark dataset once by @parfeniukink in #48
  • Replace busy wait in async loop with a Semaphore by @sjmonson in #80 (pattern illustrated in the sketch after this list)
  • Add backend_kwargs to generate_benchmark_report by @jackcook in #78
  • Drop request count check from throughput sweep profile by @sjmonson in #89
  • Rework Backend to Native HTTP Requests and Enhance API Compatibility & Performance by @markurtz in #91
  • Multi Process Scheduler Implementation, Benchmarker, and Report Generation Refactor by @markurtz in #96
  • Update the README by @sjmonson in #112
  • Fix units for Req Latency in output to seconds by @smalleni in #113
  • Fix/non integer rates by @thameem-abbas in #116
  • Output support expansion, code hygiene, and tests by @markurtz in #117
  • Bump min python to 3.9 by @sjmonson in #121
  • v0.2.0 Version Update and Docs Expansions by @markurtz in #118
  • Fix issue if async task count does not evenly divide across process pool by @sjmonson in #120
  • Readme grammar updates and cleanup by @markurtz in #124
  • Update CICD flows to enable automated releases and match the feature set laid out in #56 by @markurtz in #125
  • CI/CD Build Fixes for Release by @markurtz in #126
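
PR #80 above replaces a busy-wait polling loop with a semaphore that caps the number of in-flight requests. The snippet below is a minimal sketch of that general pattern using `asyncio.Semaphore`, not the exact GuideLLM code; the sleep stands in for the real benchmark request.

```python
import asyncio


async def worker(i: int, sem: asyncio.Semaphore) -> None:
    async with sem:  # waits for a free slot without spinning
        await asyncio.sleep(0.1)  # stand-in for the actual request
        print(f"finished request {i}")


async def main(concurrency: int = 8, total: int = 32) -> None:
    # At most `concurrency` workers run at once; the rest block on the semaphore
    # instead of repeatedly polling for capacity.
    sem = asyncio.Semaphore(concurrency)
    await asyncio.gather(*(worker(i, sem) for i in range(total)))


if __name__ == "__main__":
    asyncio.run(main())
```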

New Contributors

Full Changelog: v0.1.0...v0.2.0