feat: time-bound runs, live stats display, and send-window metrics by acere · Pull Request #58 · awslabs/llmeter

acere · 2026-04-02T17:20:31Z

Closes #57

What

Adds time-bound test runs, a live stats display, send-window-based throughput metrics, and fixes a StopIteration bug in invocation loops.

Changes

`llmeter/runner.py`

- New `run_duration` parameter on `_RunConfig`/`Runner`/`run()`: clients send requests continuously for a fixed duration. Mutually exclusive with `n_requests`.
- New `_invoke_for_duration` / `_invoke_duration` / `_invoke_duration_c` methods, kept separate from the count-bound `_invoke_n` / `_invoke_n_c`.
- `_tick_time_bar` async task advances a time-based progress bar every 0.5 s.
- `_run()` dispatches to the right invocation path based on the `_time_bound` flag.
- `total_requests` is always derived from `RunningStats._count` (single source of truth).
- Both `_invoke_n_no_wait` and `_invoke_for_duration` use `while`/`next()` instead of `for ... in cycle()` to prevent StopIteration from silently killing the loop.
- `record_send()` is called before each `endpoint.invoke()` for send-window timing.
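To make the time-bound path concrete, here is a minimal asyncio sketch. It is not the code in `llmeter/runner.py`: `RunConfigSketch`, `SendWindow`, `invoke_for_duration`, `tick_time_bar` and `run_time_bound` are hypothetical names, the endpoint is modeled as any async callable, and rejecting conflicting stop conditions with `ValueError` is an assumption. It shows the explicit `while`/`next()` loop over cycled payloads, the send timestamp taken before each invoke, several clients sharing one send window, and a ticker reporting elapsed time every 0.5 s.

```python
import asyncio
import itertools
import time
from dataclasses import dataclass
from typing import Any, Awaitable, Callable, List, Optional, Sequence, Tuple


@dataclass
class RunConfigSketch:
    """Hypothetical stand-in for _RunConfig: two mutually exclusive stop conditions."""
    n_requests: Optional[int] = None
    run_duration: Optional[float] = None

    def __post_init__(self) -> None:
        # Assumption: the conflict is reported as a ValueError.
        if self.n_requests is not None and self.run_duration is not None:
            raise ValueError("n_requests and run_duration are mutually exclusive")


@dataclass
class SendWindow:
    """First/last send timestamps, in the spirit of RunningStats.record_send()."""
    first: Optional[float] = None
    last: Optional[float] = None
    sends: int = 0

    def record_send(self) -> None:
        now = time.perf_counter()
        if self.first is None:
            self.first = now
        self.last = now
        self.sends += 1


async def invoke_for_duration(
    invoke: Callable[[Any], Awaitable[Any]],
    payloads: Sequence[Any],
    run_duration: float,
    window: SendWindow,
) -> List[Any]:
    """One client: keep sending cycled payloads until the deadline passes."""
    results: List[Any] = []
    payload_iter = itertools.cycle(payloads)
    deadline = time.perf_counter() + run_duration
    while time.perf_counter() < deadline:
        payload = next(payload_iter)  # explicit next() instead of `for p in cycle(...)`
        window.record_send()  # timestamp the send before invoking
        try:
            results.append(await invoke(payload))
        except Exception as exc:  # record the failure and keep looping
            results.append(exc)
    return results


async def tick_time_bar(
    on_tick: Callable[[float], None], run_duration: float, interval: float = 0.5
) -> None:
    """Report elapsed seconds (capped at run_duration) every `interval` seconds."""
    start = time.perf_counter()
    while True:
        elapsed = time.perf_counter() - start
        on_tick(min(elapsed, run_duration))
        if elapsed >= run_duration:
            return
        await asyncio.sleep(interval)


async def run_time_bound(
    invoke: Callable[[Any], Awaitable[Any]],
    payloads: Sequence[Any],
    run_duration: float,
    clients: int,
    on_tick: Callable[[float], None] = print,
    tick_interval: float = 0.5,
) -> Tuple[SendWindow, List[List[Any]]]:
    """Run `clients` concurrent loops alongside the ticker."""
    window = SendWindow()
    *per_client, _ = await asyncio.gather(
        *(invoke_for_duration(invoke, payloads, run_duration, window) for _ in range(clients)),
        tick_time_bar(on_tick, run_duration, tick_interval),
    )
    return window, per_client
```

Calling `asyncio.run(run_time_bound(fake_invoke, ["a", "b"], 60, clients=5))` with any async `fake_invoke` runs five clients for 60 s. Because each loop records the send before awaiting the response, the send window covers only the period in which new requests were actually issued.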

`llmeter/utils.py`

- `RunningStats.record_send()`: tracks `_first_send_time` / `_last_send_time`.
- RPM in `snapshot()` uses the send window instead of response-side elapsed time.
- New `"output_tps"` special spec: aggregate output tokens/s based on the send window.
- `snapshot()` returns placeholder values (`"—"`) when `_count == 0`.
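A worked example helps show why the send window matters. Assuming RPM is computed as completed responses divided by the first-to-last send window: with sends at t = 0, 30 and 60 s and three responses of 100 output tokens each, RPM is 3 × 60 / 60 = 3.0 and `output_tps` is 300 / 60 = 5.0. A response-side clock that keeps running until the last response lands at, say, t = 90 s would report 3 × 60 / 90 = 2.0 RPM, which is the taper-off the send window avoids. The class below is a hypothetical reduction of `RunningStats` to just these two stats; the formulas and attribute layout are assumptions.

```python
import time
from typing import Dict, Optional, Union

PLACEHOLDER = "\u2014"  # em dash, shown before the first response arrives


class SendWindowStatsSketch:
    """Hypothetical reduction of RunningStats to the send-window throughput stats."""

    def __init__(self) -> None:
        self._count = 0  # completed responses
        self._output_tokens = 0  # summed output tokens across responses
        self._first_send_time: Optional[float] = None
        self._last_send_time: Optional[float] = None

    def record_send(self, now: Optional[float] = None) -> None:
        """Call right before each invoke; `now` is injectable for testing."""
        now = time.perf_counter() if now is None else now
        if self._first_send_time is None:
            self._first_send_time = now
        self._last_send_time = now

    def update(self, output_tokens: int) -> None:
        """Call once per completed response."""
        self._count += 1
        self._output_tokens += output_tokens

    def snapshot(self) -> Dict[str, Union[float, str]]:
        if self._count == 0 or self._first_send_time is None:
            return {"rpm": PLACEHOLDER, "output_tps": PLACEHOLDER}
        window = self._last_send_time - self._first_send_time
        if window <= 0:  # a single send gives no measurable window yet
            return {"rpm": PLACEHOLDER, "output_tps": PLACEHOLDER}
        return {
            "rpm": self._count * 60.0 / window,
            "output_tps": self._output_tokens / window,
        }
```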

`llmeter/live_display.py` (new)

- `LiveStatsDisplay`: HTML table in Jupyter (grouped columns), ANSI multi-line output in terminals.
- `_classify` / `_group_stats`: auto-group stats by key patterns (Throughput, TTFT, TTLT, Tokens, Errors).
- Updates in place and shows placeholders immediately, before the first response.
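The grouping and in-place terminal redraw can be sketched as follows. The key patterns, `classify`, `group_stats` and `render_terminal` are hypothetical stand-ins for `_classify` / `_group_stats` and the ANSI output in `LiveStatsDisplay`; only the group names and their order come from this PR. Redrawing in place relies on the standard ANSI sequences for cursor-up (`ESC[nA`) and erase-line (`ESC[2K`).

```python
from typing import Dict, List, Tuple, Union

Value = Union[float, int, str]

# Assumed substring rules, checked in order; the real _classify patterns may differ.
_PATTERNS: List[Tuple[str, Tuple[str, ...]]] = [
    ("TTFT", ("ttft",)),
    ("TTLT", ("ttlt",)),
    ("Throughput", ("rpm", "tps")),
    ("Errors", ("fail", "error")),
    ("Tokens", ("token",)),
]
# Display order follows the group list in this PR, with a catch-all at the end.
_DISPLAY_ORDER = ["Throughput", "TTFT", "TTLT", "Tokens", "Errors", "Other"]


def classify(key: str) -> str:
    lowered = key.lower()
    for group, needles in _PATTERNS:
        if any(needle in lowered for needle in needles):
            return group
    return "Other"


def group_stats(stats: Dict[str, Value]) -> Dict[str, Dict[str, Value]]:
    """Bucket stats by group in display order; empty groups are dropped."""
    grouped: Dict[str, Dict[str, Value]] = {g: {} for g in _DISPLAY_ORDER}
    for key, value in stats.items():
        grouped[classify(key)][key] = value
    return {g: kv for g, kv in grouped.items() if kv}


def render_terminal(stats: Dict[str, Value], previous_lines: int = 0) -> Tuple[str, int]:
    """Return (text, line_count); the text redraws over the previous block if any."""
    lines = [
        f"{group}: " + "  ".join(f"{k}={v}" for k, v in kv.items())
        for group, kv in group_stats(stats).items()
    ]
    # ESC[<n>A moves the cursor up n lines; ESC[2K erases the line it is on.
    prefix = f"\x1b[{previous_lines}A" if previous_lines else ""
    body = "".join("\x1b[2K" + line + "\n" for line in lines)
    return prefix + body, len(lines)
```

A caller would write the returned text to stdout and pass the returned line count back as `previous_lines` on the next refresh.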

`llmeter/experiments.py`

- `LoadTest`: new `run_duration`, `low_memory`, `progress_bar_stats` fields forwarded to each run.
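The forwarding in `LoadTest` amounts to reusing the same run options for every entry in `sequence_of_clients`. A minimal sketch, assuming an async `run_fn` that stands in for `Runner.run()`; `LoadTestSketch` is hypothetical and omits the endpoint, payload and output-path handling of the real class.

```python
import asyncio
from dataclasses import dataclass
from typing import Any, Awaitable, Callable, Dict, Optional, Sequence


@dataclass
class LoadTestSketch:
    """Hypothetical reduction of LoadTest: one run per client count, shared options."""
    run_fn: Callable[..., Awaitable[Any]]  # stands in for Runner.run()
    sequence_of_clients: Sequence[int]
    run_duration: Optional[float] = None
    low_memory: bool = False
    progress_bar_stats: Optional[Any] = None

    async def run(self) -> Dict[int, Any]:
        results: Dict[int, Any] = {}
        for clients in self.sequence_of_clients:  # in this sketch, runs execute sequentially
            results[clients] = await self.run_fn(
                clients=clients,
                run_duration=self.run_duration,  # identical for every client count
                low_memory=self.low_memory,
                progress_bar_stats=self.progress_bar_stats,
            )
        return results


if __name__ == "__main__":
    async def fake_run(**kwargs: Any) -> str:
        return f"{kwargs['clients']} clients for {kwargs['run_duration']} s"

    demo = LoadTestSketch(fake_run, [1, 5, 10], run_duration=60, low_memory=True)
    print(asyncio.run(demo.run()))
```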

`docs/user_guide/run_experiments.md`

- New sections: Time-bound runs, Live progress-bar statistics, Low-memory mode.

`examples/Time-bound runs with Bedrock OpenAI API.ipynb` (new)

- End-to-end notebook using the bedrock-mantle endpoint with `LoadTest`, custom stats, low-memory mode, and comparison charts (RPM, TPS, TTFT, TTLT vs. clients).

Tests (51 new, 504 total)

- `test_running_stats.py`: `record_send`, `update`, `to_stats`, `snapshot` (placeholders, rpm, output_tps, send window, aggregations).
- `test_live_display.py`: `_classify`, `_group_stats`, `_in_notebook`, `LiveStatsDisplay` (disabled, terminal, overwrite, prefix).
- `test_experiments.py`: `LoadTest` with `run_duration`/`low_memory`/`progress_bar_stats`.
- `test_runner.py`: time-bound validation, `_invoke_for_duration`, full runs with duration.

Usage

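```python
from llmeter.experiments import LoadTest

# Time-bound run: assumes `runner` is an existing llmeter Runner.
# Top-level `await` works in Jupyter; in a script, wrap these calls in an
# async function and run it with asyncio.run().
result = await runner.run(run_duration=60, clients=5)

# Time-bound LoadTest: assumes `my_endpoint` and `sample_payload` are defined.
load_test = LoadTest(
    endpoint=my_endpoint,
    payload=sample_payload,
    sequence_of_clients=[1, 5, 10, 20],
    run_duration=60,
    low_memory=True,
    output_path="outputs/load_test",
)
result = await load_test.run()
result.plot_results()
```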

- Add `low_memory` parameter to `Runner`/`run()` that writes responses to disk without keeping them in memory, for large-scale test runs.
- Introduce `RunningStats` class that accumulates metrics incrementally (counts, sums, sorted values for percentile computation).
- Replace the `_builtin_stats` cached_property on `Result` with `_preloaded_stats`, populated by `RunningStats` during the run or from stats.json on load.
- Add `snapshot()` method on `RunningStats` for live progress-bar display of p50/p90 TTFT, p50/p90 TTLT, median tokens/s, total tokens, and failure count, configurable via the `progress_bar_stats` parameter.
- Add `_compute_stats()` classmethod on `Result` as a fallback for manually constructed `Result` objects and for recomputation after `load_responses()`.
- Update tests for the new stats flow.
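Keeping a sorted list per metric lets `RunningStats` answer percentile queries for the live display without rescanning all responses. A minimal sketch, assuming `bisect.insort` for ordered insertion and linear-interpolation percentiles (the PR does not say which percentile definition is used); `IncrementalStatsSketch` is a hypothetical name.

```python
import bisect
import math
from collections import defaultdict
from typing import DefaultDict, Dict, List, Optional


class IncrementalStatsSketch:
    """Hypothetical reduction of RunningStats: a count plus per-metric sums and sorted values."""

    def __init__(self) -> None:
        self._count = 0  # responses seen, including ones with missing metrics
        self._sums: DefaultDict[str, float] = defaultdict(float)
        self._sorted: DefaultDict[str, List[float]] = defaultdict(list)

    def update(self, metrics: Dict[str, Optional[float]]) -> None:
        """Fold one response in; None values (e.g. from a failed call) are skipped."""
        self._count += 1
        for name, value in metrics.items():
            if value is None:
                continue
            self._sums[name] += value
            bisect.insort(self._sorted[name], value)  # keep values ordered

    def _values(self, name: str) -> List[float]:
        values = self._sorted.get(name)
        if not values:
            raise ValueError(f"no values recorded for {name!r}")
        return values

    def mean(self, name: str) -> float:
        return self._sums[name] / len(self._values(name))

    def percentile(self, name: str, p: float) -> float:
        """Linear interpolation between closest ranks (assumed definition)."""
        values = self._values(name)
        rank = p / 100 * (len(values) - 1)
        lo, hi = math.floor(rank), math.ceil(rank)
        return values[lo] + (values[hi] - values[lo]) * (rank - lo)
```

Ordered insertion costs O(n) per response but makes each percentile read O(1), which suits a display that refreshes several times per second. It also pairs naturally with `low_memory`, since the summary no longer depends on holding the full response list.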

Add run_duration parameter for time-bound test runs:
- New `run_duration` on `Runner`/`run()` and `LoadTest`: clients send requests continuously for a fixed duration instead of a fixed count.
- Dedicated `_invoke_for_duration` / `_invoke_duration_c` methods (separate from count-bound `_invoke_n` / `_invoke_n_c`).
- Time-based progress bar via the `_tick_time_bar` async task.
- Mutual exclusivity validation between `n_requests` and `run_duration`.

Add LiveStatsDisplay for readable live metrics:
- New `llmeter/live_display.py`: HTML table in Jupyter (grouped columns for Throughput, TTFT, TTLT, Tokens, Errors), ANSI multi-line output in terminals. Updates in place and shows placeholders before the first response.
- Replaces the single-line tqdm postfix with a separate stats row.

Improve throughput metric accuracy:
- `RunningStats.record_send()` tracks send-side timestamps.
- RPM and `output_tps` use the send window (first to last request sent) instead of response-side elapsed time, preventing taper-off as clients finish.
- `output_tps` (aggregate tokens/s) added to the default snapshot stats.

Fix StopIteration silently terminating invocation loops:
- Both `_invoke_n_no_wait` and `_invoke_for_duration` now use `while`/`next()` instead of `for ... in cycle()` to prevent StopIteration from streaming endpoints from killing the loop.

Add LoadTest support for new features:
- `run_duration`, `low_memory`, `progress_bar_stats` forwarded to each run.

Add example notebook and documentation:
- `examples/Time-bound runs with Bedrock OpenAI API.ipynb`: end-to-end demo using the bedrock-mantle endpoint with `LoadTest`, custom stats, low-memory mode, and comparison charts (RPM, TPS, TTFT, TTLT).
- `docs/user_guide/run_experiments.md`: new sections for time-bound runs, live progress-bar stats, and low-memory mode.

Add tests (51 new, 504 total):
- `test_running_stats.py`: `record_send`, `update`, `to_stats`, `snapshot` (placeholders, rpm, output_tps, send window, aggregations).
- `test_live_display.py`: `_classify`, `_group_stats`, `_in_notebook`, `LiveStatsDisplay` (disabled, terminal, overwrite, prefix).
- `test_experiments.py`: `LoadTest` with `run_duration`/`low_memory`/`progress_bar_stats` field storage and runner forwarding.
- `test_runner.py`: time-bound validation, `_invoke_for_duration`, full run with duration, output path, multiple clients.

acere added 2 commits April 1, 2026 11:38

acere wants to merge 2 commits into awslabs:main from acere:feature/time-bound-runs