
GuideLLM v0.2.0

Released by @markurtz on 18 Apr 21:07 · commit 876775f

Summary

  • Minimal Execution Overheads
    • Refactored to an async multi-process/threaded design with just 0.16% overhead in synchronous mode and 99.9% rate accuracy for constant-rate requests
  • Robust Accuracy + Monitoring
    • Built-in timings and diagnostics added to validate performance and catch regressions
  • Flexible Benchmarking Profiles
    • Prebuilt support for synchronous, concurrent (newly added), throughput, constant-rate, Poisson-rate, and sweep modes (see the scheduling sketch after this list)
  • Unified Input/Output Formats
    • JSON, YAML, CSV, and console output now standardized
  • Multi-Use Data Loaders
    • Native support for Hugging Face datasets, file-based data, and synthetic samples, with fixes and expanded capabilities over the previous flows
  • Pluggable Backends via OpenAI-Compatible APIs
    • Redesigned to work out of the box with OpenAI-style HTTP servers and easily extensible to other interfaces and servers; fixes issues related to improper token lengths and more
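
Below is a minimal Python sketch, not GuideLLM's internal API, of how the constant-rate and Poisson-rate profiles differ conceptually: constant rate spaces request starts evenly, while Poisson rate draws exponentially distributed inter-arrival times. The `send_request` coroutine is a stand-in; in practice it would POST to an OpenAI-compatible endpoint such as `/v1/chat/completions`.

```python
import asyncio
import random
import time


async def send_request(i: int) -> None:
    # Placeholder for an HTTP call to the target OpenAI-compatible server.
    await asyncio.sleep(0.05)
    print(f"request {i} finished at {time.perf_counter():.3f}s")


async def run(rate: float, num_requests: int, poisson: bool) -> None:
    tasks = []
    for i in range(num_requests):
        tasks.append(asyncio.create_task(send_request(i)))
        if poisson:
            # Poisson arrivals: exponentially distributed inter-arrival times.
            await asyncio.sleep(random.expovariate(rate))
        else:
            # Constant rate: fixed interval between request starts.
            await asyncio.sleep(1.0 / rate)
    await asyncio.gather(*tasks)


if __name__ == "__main__":
    asyncio.run(run(rate=10.0, num_requests=20, poisson=True))
```

Poisson arrivals better approximate bursty production traffic, while a constant rate isolates steady-state server behavior.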

What's Changed

  • Add summary metrics to saved json file by @anmarques in #46
  • ADD TGI docs by @philschmid in #43
  • Add missing vllm docs link by @eldarkurtic in #50
  • Change default "role" from "system" to "user" by @philschmid in #53
  • FIX TGI example by @philschmid in #51
  • Revert Summary Metrics and Expand Test Coverage to Stabilize Nightly/Main CI by @markurtz in #58
  • [Dataset]: Iterate through benchmark dataset once by @parfeniukink in #48
  • Replace busy wait in async loop with a Semaphore by @sjmonson in #80 (pattern illustrated in the sketch after this list)
  • Add backend_kwargs to generate_benchmark_report by @jackcook in #78
  • Drop request count check from throughput sweep profile by @sjmonson in #89
  • Rework Backend to Native HTTP Requests and Enhance API Compatibility & Performance by @markurtz in #91
  • Multi Process Scheduler Implementation, Benchmarker, and Report Generation Refactor by @markurtz in #96
  • Update the README by @sjmonson in #112
  • Fix units for Req Latency in output to seconds by @smalleni in #113
  • Fix/non integer rates by @thameem-abbas in #116
  • Output support expansion, code hygiene, and tests by @markurtz in #117
  • Bump min python to 3.9 by @sjmonson in #121
  • v0.2.0 Version Update and Docs Expansions by @markurtz in #118
  • Fix issue if async task count does not evenly divide across process pool by @sjmonson in #120
  • Readme grammar updates and cleanup by @markurtz in #124
  • Update CICD flows to enable automated releases and match the feature set laid out in #56 by @markurtz in #125
  • CI/CD Build Fixes for Release by @markurtz in #126
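
PR #80 above replaces a busy-wait polling loop with a semaphore that caps the number of in-flight requests. The snippet below is a minimal sketch of that general pattern using `asyncio.Semaphore`, not the exact GuideLLM code; the sleep stands in for the real benchmark request.

```python
import asyncio


async def worker(i: int, sem: asyncio.Semaphore) -> None:
    async with sem:  # waits for a free slot without spinning
        await asyncio.sleep(0.1)  # stand-in for the actual request
        print(f"finished request {i}")


async def main(concurrency: int = 8, total: int = 32) -> None:
    # At most `concurrency` workers run at once; the rest block on the semaphore
    # instead of repeatedly polling for capacity.
    sem = asyncio.Semaphore(concurrency)
    await asyncio.gather(*(worker(i, sem) for i in range(total)))


if __name__ == "__main__":
    asyncio.run(main())
```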

New Contributors

Full Changelog: v0.1.0...v0.2.0