feat: time-bound runs, live stats display, and send-window metrics #58
Open
acere wants to merge 2 commits into awslabs:main from
- Add `low_memory` parameter to Runner/run() that writes responses to disk without keeping them in memory, for large-scale test runs.
- Introduce `RunningStats` class that accumulates metrics incrementally (counts, sums, sorted values for percentile computation).
- Replace `_builtin_stats` cached_property on Result with `_preloaded_stats` populated by RunningStats during the run or from stats.json on load.
- Add `snapshot()` method on RunningStats for live progress-bar display of p50/p90 TTFT, p50/p90 TTLT, median tokens/s, total tokens, and failure count, configurable via the `progress_bar_stats` parameter.
- Add `_compute_stats()` classmethod on Result as fallback for manually constructed Result objects and post-load_responses() recomputation.
- Update tests for the new stats flow.
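The incremental accumulation described in this commit (counts, sums, and sorted values kept for percentiles) can be illustrated with a minimal sketch. This is not llmeter's `RunningStats`: the class name `SketchStats`, its fields, the `"ttft"` metric key, and the nearest-rank percentile rule are all assumptions chosen for illustration.

```python
import bisect
import math


class SketchStats:
    """Hypothetical accumulator; NOT the llmeter RunningStats implementation."""

    def __init__(self):
        self.count = 0            # every recorded response, failed or not
        self.failed = 0           # failed responses only
        self.sums = {}            # metric name -> running sum
        self.sorted_values = {}   # metric name -> values kept in sorted order

    def update(self, metrics, failed=False):
        """Record one response without retaining the response object itself."""
        self.count += 1
        if failed:
            self.failed += 1
            return
        for name, value in metrics.items():
            self.sums[name] = self.sums.get(name, 0.0) + value
            # insort keeps the list sorted, so percentiles need no re-sort
            bisect.insort(self.sorted_values.setdefault(name, []), value)

    def percentile(self, name, q):
        """Nearest-rank percentile (an assumed rule); None if no data yet."""
        values = self.sorted_values.get(name)
        if not values:
            return None
        rank = max(1, math.ceil(q / 100 * len(values)))
        return values[rank - 1]


stats = SketchStats()
for ttft in [0.30, 0.10, 0.20, 0.50, 0.40]:
    stats.update({"ttft": ttft})   # "ttft" is an illustrative key name
stats.update({}, failed=True)

p50 = stats.percentile("ttft", 50)
p90 = stats.percentile("ttft", 90)
```

Keeping only numeric values per metric, rather than full responses, is what would let a low-memory run still report percentiles at the end.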
Add run_duration parameter for time-bound test runs:
- New run_duration on Runner/run() and LoadTest: clients send requests continuously for a fixed duration instead of a fixed count.
- Dedicated _invoke_for_duration / _invoke_duration_c methods (separate from count-bound _invoke_n / _invoke_n_c).
- Time-based progress bar via _tick_time_bar async task.
- Mutual exclusivity validation between n_requests and run_duration.

Add LiveStatsDisplay for readable live metrics:
- New llmeter/live_display.py: HTML table in Jupyter (grouped columns for Throughput, TTFT, TTLT, Tokens, Errors), ANSI multi-line in terminals. Updates in-place, shows placeholders before first response.
- Replaces single-line tqdm postfix with a separate stats row.

Improve throughput metric accuracy:
- RunningStats.record_send() tracks send-side timestamps.
- RPM and output_tps use send window (first-to-last request sent) instead of response-side elapsed time, preventing taper-off as clients finish.
- output_tps (aggregate tokens/s) added to default snapshot stats.

Fix StopIteration silently terminating invocation loops:
- Both _invoke_n_no_wait and _invoke_for_duration now use while/next() instead of for-in-cycle() to prevent StopIteration from streaming endpoints from killing the loop.

Add LoadTest support for new features:
- run_duration, low_memory, progress_bar_stats forwarded to each run.

Add example notebook and documentation:
- examples/Time-bound runs with Bedrock OpenAI API.ipynb: end-to-end demo using bedrock-mantle endpoint with LoadTest, custom stats, low-memory mode, and comparison charts (RPM, TPS, TTFT, TTLT).
- docs/user_guide/run_experiments.md: new sections for time-bound runs, live progress-bar stats, and low-memory mode.

Add tests (51 new, 504 total):
- test_running_stats.py: record_send, update, to_stats, snapshot (placeholders, rpm, output_tps, send window, aggregations).
- test_live_display.py: _classify, _group_stats, _in_notebook, LiveStatsDisplay (disabled, terminal, overwrite, prefix).
- test_experiments.py: LoadTest with run_duration/low_memory/progress_bar_stats field storage and runner forwarding.
- test_runner.py: time-bound validation, _invoke_for_duration, full run with duration, output path, multiple clients.
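To make the send-window idea concrete, here is a small sketch of how throughput could be computed from the first and last send timestamps instead of response-side elapsed time. It is a sketch under stated assumptions, not the code in this PR: the `SendWindow` class, its attribute names, the optional `now` argument, and the exact formulas (completed responses and output tokens divided by the send window) are hypothetical.

```python
import time

PLACEHOLDER = "\u2014"  # dash shown before any response has arrived


class SendWindow:
    """Hypothetical sketch of send-window throughput; not llmeter's code."""

    def __init__(self):
        self.first_send = None   # timestamp of the first request sent
        self.last_send = None    # timestamp of the most recent request sent
        self.count = 0           # completed responses
        self.output_tokens = 0   # total output tokens across responses

    def record_send(self, now=None):
        """Call just before each request; `now` is injectable for testing."""
        now = time.perf_counter() if now is None else now
        if self.first_send is None:
            self.first_send = now
        self.last_send = now

    def update(self, output_tokens):
        """Call when a response completes."""
        self.count += 1
        self.output_tokens += output_tokens

    def snapshot(self):
        """Return rpm and output_tps, or placeholders when undefined."""
        empty = {"rpm": PLACEHOLDER, "output_tps": PLACEHOLDER}
        if self.count == 0 or self.first_send is None:
            return empty
        window = self.last_send - self.first_send  # seconds, send side only
        if window <= 0:
            return empty  # a single send gives no measurable window
        return {
            "rpm": 60.0 * self.count / window,
            "output_tps": self.output_tokens / window,
        }


w = SendWindow()
before = w.snapshot()
w.record_send(now=0.0)
w.update(output_tokens=100)
w.record_send(now=30.0)
w.update(output_tokens=200)
after = w.snapshot()
```

Because the window closes at the last send, slow responses that trickle in after clients stop sending do not stretch the denominator, which is the taper-off effect the commit message describes avoiding.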
Closes #57
What
Adds time-bound test runs, a live stats display, and send-window-based throughput metrics, and fixes a `StopIteration` bug in invocation loops.

Changes
`llmeter/runner.py`
- `run_duration` parameter on `_RunConfig` / `Runner` / `run()`: clients send requests continuously for a fixed duration. Mutually exclusive with `n_requests`.
- `_invoke_for_duration` / `_invoke_duration` / `_invoke_duration_c` methods: clean separation from count-bound `_invoke_n` / `_invoke_n_c`.
- `_tick_time_bar` async task advances a time-based progress bar every 0.5s.
- `_run()` dispatches to the right invocation path based on the `_time_bound` flag.
- `total_requests` is always derived from `RunningStats._count` (single source of truth).
- `_invoke_n_no_wait` and `_invoke_for_duration` use `while/next()` instead of `for-in-cycle()` to prevent `StopIteration` from silently killing the loop.
- `record_send()` is called before each `endpoint.invoke()` for send-window timing.

`llmeter/utils.py`
- `RunningStats.record_send()`: tracks `_first_send_time` / `_last_send_time`.
- `snapshot()` uses the send window instead of response-side elapsed time.
- `"output_tps"` special spec: aggregate output tokens/s based on the send window.
- `snapshot()` returns placeholder values ("—") when `_count == 0`.

`llmeter/live_display.py` (new)
- `LiveStatsDisplay`: HTML table in Jupyter (grouped columns), ANSI multi-line in terminals.
- `_classify` / `_group_stats`: auto-groups stats by key patterns (Throughput, TTFT, TTLT, Tokens, Errors).

`llmeter/experiments.py`
- `LoadTest`: new `run_duration`, `low_memory`, `progress_bar_stats` fields forwarded to each run.

`docs/user_guide/run_experiments.md`
- New sections for time-bound runs, live progress-bar stats, and low-memory mode.

`examples/Time-bound runs with Bedrock OpenAI API.ipynb` (new)
- End-to-end demo using the bedrock-mantle endpoint with `LoadTest`, custom stats, low-memory mode, and comparison charts (RPM, TPS, TTFT, TTLT vs clients).

Tests (51 new, 504 total)
- `test_running_stats.py`: record_send, update, to_stats, snapshot (placeholders, rpm, output_tps, send window, aggregations).
- `test_live_display.py`: _classify, _group_stats, _in_notebook, LiveStatsDisplay (disabled, terminal, overwrite, prefix).
- `test_experiments.py`: LoadTest with run_duration/low_memory/progress_bar_stats.
- `test_runner.py`: time-bound validation, _invoke_for_duration, full runs with duration.

Usage