feat: time-bound runs, live stats display, and send-window metrics #58

Open

acere wants to merge 2 commits into awslabs:main from acere:feature/time-bound-runs

acere commented Apr 2, 2026

Closes #57

What

Adds time-bound test runs, a live stats display, send-window-based throughput metrics, and fixes a StopIteration bug in invocation loops.

Changes

llmeter/runner.py

  • New run_duration parameter on _RunConfig/Runner/run(): clients send requests continuously for a fixed duration. Mutually exclusive with n_requests.
  • New _invoke_for_duration / _invoke_duration / _invoke_duration_c methods, kept separate from the count-bound _invoke_n / _invoke_n_c.
  • _tick_time_bar async task advances a time-based progress bar every 0.5s.
  • _run() dispatches to the right invocation path based on the _time_bound flag.
  • total_requests always derived from RunningStats._count (single source of truth).
  • Both _invoke_n_no_wait and _invoke_for_duration use while/next() instead of for-in-cycle() to prevent StopIteration from silently killing the loop.
  • record_send() called before each endpoint.invoke() for send-window timing.
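The duration-bound loop described above can be sketched roughly as follows. This is a minimal synchronous illustration, not the PR's implementation: `invoke_for_duration`, `invoke`, and `record_send` are hypothetical stand-ins for `_invoke_for_duration`, `endpoint.invoke()`, and `RunningStats.record_send()`. It shows the two points from the list: an explicit while/next() loop over a cycled payload iterator, and a send timestamp recorded before every invocation.

```python
import itertools
import time
from typing import Any, Callable, Iterable, List


def invoke_for_duration(
    invoke: Callable[[Any], Any],
    payloads: Iterable[Any],
    run_duration: float,
    record_send: Callable[[], None],
) -> List[Any]:
    """Send requests back-to-back until `run_duration` seconds have elapsed.

    Hypothetical sketch: `invoke` stands in for endpoint.invoke() and
    `record_send` for RunningStats.record_send().
    """
    payload_list = list(payloads)
    if not payload_list:
        raise ValueError("payloads must not be empty")
    payload_iter = itertools.cycle(payload_list)
    deadline = time.perf_counter() + run_duration
    responses: List[Any] = []
    # Explicit while/next() instead of `for payload in cycle(...)`:
    # the loop condition is the deadline alone.
    while time.perf_counter() < deadline:
        payload = next(payload_iter)
        record_send()  # timestamp the send before invoking
        responses.append(invoke(payload))
    return responses
```

A request started just before the deadline can still complete after it, so a run may overshoot `run_duration` by up to one request latency per client.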

llmeter/utils.py

  • RunningStats.record_send(): tracks _first_send_time / _last_send_time.
  • RPM in snapshot() uses the send window (first to last request sent) instead of response-side elapsed time.
  • New "output_tps" special spec: aggregate output tokens/s over the send window.
  • snapshot() returns placeholder values ("—") when _count == 0.
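A rough sketch of how send-window RPM and output tokens/s could be computed, with placeholders before the first response. `SendWindowStats` is a hypothetical class, not llmeter's `RunningStats`; the injectable timestamp on `record_send` and the placeholder for a zero-width window are assumptions made for illustration.

```python
import time
from typing import Dict, Optional, Union

PLACEHOLDER = "\u2014"  # the dash placeholder the PR describes for snapshot()


class SendWindowStats:
    """Hypothetical sketch of send-window throughput metrics.

    Counts responses and output tokens, but divides by the time between the
    first and last request *sent* rather than by response-side elapsed time.
    """

    def __init__(self) -> None:
        self._count = 0
        self._output_tokens = 0
        self._first_send_time: Optional[float] = None
        self._last_send_time: Optional[float] = None

    def record_send(self, t: Optional[float] = None) -> None:
        """Call just before each invoke; `t` is injectable for testing."""
        t = time.perf_counter() if t is None else t
        if self._first_send_time is None:
            self._first_send_time = t
        self._last_send_time = t

    def update(self, output_tokens: int) -> None:
        """Call once per completed response."""
        self._count += 1
        self._output_tokens += output_tokens

    def snapshot(self) -> Dict[str, Union[str, float]]:
        placeholders = {"rpm": PLACEHOLDER, "output_tps": PLACEHOLDER}
        if self._count == 0 or self._first_send_time is None:
            return placeholders
        window = self._last_send_time - self._first_send_time
        if window <= 0:
            # Assumption: a zero-width window (one send so far) also shows placeholders.
            return placeholders
        return {
            "rpm": self._count * 60.0 / window,
            "output_tps": self._output_tokens / window,
        }
```

With three responses of 100 output tokens each, sent at t = 0, 10 and 30 s, this reports 6 RPM and 10 output tokens/s. Because the denominator only grows while requests are still being sent, the rate does not decay while the last clients drain their responses.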

llmeter/live_display.py (new)

  • LiveStatsDisplay: HTML table in Jupyter (grouped columns), ANSI multi-line in terminals.
  • _classify / _group_stats: auto-groups stats by key patterns (Throughput, TTFT, TTLT, Tokens, Errors).
  • Updates in place and shows placeholders immediately, before the first response arrives.
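The key-pattern grouping done by _classify / _group_stats might look something like this. The PR names the groups but not the exact patterns, so the ordered substring rules below (and the "Other" fallback) are assumptions; `classify` and `group_stats` are hypothetical names.

```python
from typing import Dict, List, Tuple

# Hypothetical ordered rules: the first matching substring wins, so
# "output_tps" lands in Throughput before any "token" rule is checked.
_GROUP_RULES: List[Tuple[str, Tuple[str, ...]]] = [
    ("Throughput", ("rpm", "tps")),
    ("TTFT", ("ttft",)),
    ("TTLT", ("ttlt",)),
    ("Tokens", ("token",)),
    ("Errors", ("fail", "error")),
]


def classify(key: str) -> str:
    """Return the display group for a stat key (case-insensitive)."""
    lowered = key.lower()
    for group, patterns in _GROUP_RULES:
        if any(p in lowered for p in patterns):
            return group
    return "Other"


def group_stats(stats: Dict[str, object]) -> Dict[str, Dict[str, object]]:
    """Bucket a flat snapshot dict into {group: {key: value}}."""
    groups: Dict[str, Dict[str, object]] = {}
    for key, value in stats.items():
        groups.setdefault(classify(key), {})[key] = value
    return groups
```

Rule order matters because keys can match several patterns; checking throughput first keeps `output_tps` out of the token group.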

llmeter/experiments.py

  • LoadTest: new run_duration, low_memory, progress_bar_stats fields forwarded to each run.
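The low_memory flag forwarded here writes responses to disk instead of keeping them in memory. A minimal sketch of that idea, assuming one JSON object per line and plain-dict responses; `LowMemoryWriter`, `iter_responses`, and the file layout are hypothetical and not llmeter's on-disk format.

```python
import json
from pathlib import Path
from typing import Any, Dict, Iterator


class LowMemoryWriter:
    """Append each response to a JSON Lines file instead of a list in memory."""

    def __init__(self, path: Path) -> None:
        self.path = Path(path)
        self.path.parent.mkdir(parents=True, exist_ok=True)
        self._fh = self.path.open("a", encoding="utf-8")
        self.count = 0

    def write(self, response: Dict[str, Any]) -> None:
        self._fh.write(json.dumps(response) + "\n")
        self._fh.flush()  # keep the file current if the run is interrupted
        self.count += 1

    def close(self) -> None:
        self._fh.close()

    def __enter__(self) -> "LowMemoryWriter":
        return self

    def __exit__(self, *exc: object) -> None:
        self.close()


def iter_responses(path: Path) -> Iterator[Dict[str, Any]]:
    """Stream responses back one at a time, without loading the whole file."""
    with Path(path).open(encoding="utf-8") as fh:
        for line in fh:
            if line.strip():
                yield json.loads(line)
```

Memory use stays flat regardless of run length; summary statistics then have to be accumulated incrementally during the run rather than computed from an in-memory response list.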

docs/user_guide/run_experiments.md

  • New sections: Time-bound runs, Live progress-bar statistics, Low-memory mode.

examples/Time-bound runs with Bedrock OpenAI API.ipynb (new)

  • End-to-end notebook using the bedrock-mantle endpoint with LoadTest, custom stats, low-memory mode, and comparison charts (RPM, TPS, TTFT, TTLT vs clients).

Tests (51 new, 504 total)

  • test_running_stats.py: record_send, update, to_stats, snapshot (placeholders, rpm, output_tps, send window, aggregations).
  • test_live_display.py: _classify, _group_stats, _in_notebook, LiveStatsDisplay (disabled, terminal, overwrite, prefix).
  • test_experiments.py: LoadTest with run_duration/low_memory/progress_bar_stats.
  • test_runner.py: time-bound validation, _invoke_for_duration, full runs with duration.

Usage

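```python
from llmeter.experiments import LoadTest

# Time-bound run: `runner` is an existing llmeter Runner;
# `my_endpoint` and `sample_payload` are defined earlier.
# Top-level `await` works in Jupyter; in a script, wrap these calls in asyncio.run().
result = await runner.run(run_duration=60, clients=5)

# Time-bound LoadTest
load_test = LoadTest(
    endpoint=my_endpoint,
    payload=sample_payload,
    sequence_of_clients=[1, 5, 10, 20],
    run_duration=60,
    low_memory=True,
    output_path="outputs/load_test",
)
result = await load_test.run()
result.plot_results()
```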

acere added 2 commits April 1, 2026 11:38
- Add `low_memory` parameter to Runner/run() that writes responses to
  disk without keeping them in memory, for large-scale test runs.
- Introduce `RunningStats` class that accumulates metrics incrementally
  (counts, sums, sorted values for percentile computation).
- Replace `_builtin_stats` cached_property on Result with `_preloaded_stats`
  populated by RunningStats during the run or from stats.json on load.
- Add `snapshot()` method on RunningStats for live progress-bar display
  of p50/p90 TTFT, p50/p90 TTLT, median tokens/s, total tokens, and
  failure count — configurable via `progress_bar_stats` parameter.
- Add `_compute_stats()` classmethod on Result as fallback for manually
  constructed Result objects and post-load_responses() recomputation.
- Update tests for the new stats flow.
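A rough sketch of the "counts, sums, sorted values" accumulation described in this commit. `IncrementalStats` is a hypothetical class, and the linear-interpolation percentile is an assumption; the commit does not say which percentile definition RunningStats uses.

```python
import bisect
import math
from typing import Dict, List, Optional


class IncrementalStats:
    """Hypothetical sketch of RunningStats-style accumulation.

    Keeps a count, a sum, and a sorted list of values per metric so that
    mean and percentiles are available at any point during a run.
    """

    def __init__(self) -> None:
        self._count: Dict[str, int] = {}
        self._sum: Dict[str, float] = {}
        self._sorted: Dict[str, List[float]] = {}

    def update(self, metrics: Dict[str, Optional[float]]) -> None:
        for name, value in metrics.items():
            if value is None:  # e.g. a metric missing on a failed request
                continue
            self._count[name] = self._count.get(name, 0) + 1
            self._sum[name] = self._sum.get(name, 0.0) + value
            # insort keeps the list sorted as values arrive
            bisect.insort(self._sorted.setdefault(name, []), value)

    def mean(self, name: str) -> float:
        return self._sum[name] / self._count[name]

    def percentile(self, name: str, q: float) -> float:
        """Linear-interpolated percentile, q in [0, 100]."""
        values = self._sorted[name]
        pos = (len(values) - 1) * q / 100.0
        lo, hi = math.floor(pos), math.ceil(pos)
        return values[lo] + (values[hi] - values[lo]) * (pos - lo)
```

`bisect.insort` costs O(n) per insert but makes any percentile an O(1) lookup, which suits a progress bar that reads p50/p90 on every refresh.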

Add run_duration parameter for time-bound test runs:
- New run_duration on Runner/run() and LoadTest: clients send requests
  continuously for a fixed duration instead of a fixed count.
- Dedicated _invoke_for_duration / _invoke_duration_c methods (separate
  from count-bound _invoke_n / _invoke_n_c).
- Time-based progress bar via _tick_time_bar async task.
- Mutual exclusivity validation between n_requests and run_duration.
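The time-based progress bar driven by _tick_time_bar can be sketched as a small asyncio task. `tick_time_bar` is a hypothetical name; `update` receives the seconds elapsed since the previous tick, so the bound `update` method of a tqdm bar created with `total=run_duration` could be passed in. The 0.5 s default mirrors the interval stated in the PR description.

```python
import asyncio
import time
from typing import Callable


async def tick_time_bar(
    update: Callable[[float], object],
    run_duration: float,
    interval: float = 0.5,
) -> None:
    """Advance a time-based progress bar until run_duration has elapsed.

    Hypothetical sketch: each call to `update` passes the increment since
    the previous tick, capped so the total never exceeds run_duration.
    """
    start = time.perf_counter()
    shown = 0.0
    while True:
        elapsed = min(time.perf_counter() - start, run_duration)
        update(elapsed - shown)
        shown = elapsed
        if elapsed >= run_duration:
            break
        await asyncio.sleep(interval)
```

In a real run such a task would be started with `asyncio.create_task` alongside the client tasks and cancelled or awaited once they finish.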

Add LiveStatsDisplay for readable live metrics:
- New llmeter/live_display.py: HTML table in Jupyter (grouped columns
  for Throughput, TTFT, TTLT, Tokens, Errors), ANSI multi-line in
  terminals. Updates in-place, shows placeholders before first response.
- Replaces single-line tqdm postfix with a separate stats row.
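In a terminal, an in-place multi-line stats row can be produced with standard ANSI control sequences: move the cursor up over the previous block (`ESC[nA`), then clear (`ESC[2K`) and rewrite each line. This is a hypothetical helper, not LiveStatsDisplay itself; it returns the escape string plus the number of lines now occupied so the caller can pass that back on the next redraw.

```python
from typing import List, Tuple


def render_in_place(lines: List[str], prev_line_count: int) -> Tuple[str, int]:
    """Build an ANSI string that redraws a multi-line block in place.

    Hypothetical sketch: write the result to sys.stdout and flush, then pass
    the returned line count as prev_line_count on the next call.
    """
    parts = []
    if prev_line_count:
        parts.append(f"\x1b[{prev_line_count}A")  # cursor up over the old block
    for line in lines:
        parts.append("\r\x1b[2K" + line + "\n")  # clear the line, then write
    # Blank out leftover lines if the new block is shorter than the old one.
    for _ in range(prev_line_count - len(lines)):
        parts.append("\r\x1b[2K\n")
    return "".join(parts), max(len(lines), prev_line_count)
```

Clearing each line before writing avoids stale characters when a value shrinks (for example, a p90 dropping from 10.25 to 9.8).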

Improve throughput metric accuracy:
- RunningStats.record_send() tracks send-side timestamps.
- RPM and output_tps use send window (first-to-last request sent)
  instead of response-side elapsed time, preventing taper-off as
  clients finish.
- output_tps (aggregate tokens/s) added to default snapshot stats.

Fix StopIteration silently terminating invocation loops:
- Both _invoke_n_no_wait and _invoke_for_duration now use while/next()
  instead of for-in-cycle() so that a StopIteration raised by a streaming
  endpoint cannot kill the loop.

Add LoadTest support for new features:
- run_duration, low_memory, progress_bar_stats forwarded to each run.

Add example notebook and documentation:
- examples/Time-bound runs with Bedrock OpenAI API.ipynb: end-to-end
  demo using bedrock-mantle endpoint with LoadTest, custom stats,
  low-memory mode, and comparison charts (RPM, TPS, TTFT, TTLT).
- docs/user_guide/run_experiments.md: new sections for time-bound runs,
  live progress-bar stats, and low-memory mode.

Add tests (51 new, 504 total):
- test_running_stats.py: record_send, update, to_stats, snapshot
  (placeholders, rpm, output_tps, send window, aggregations).
- test_live_display.py: _classify, _group_stats, _in_notebook,
  LiveStatsDisplay (disabled, terminal, overwrite, prefix).
- test_experiments.py: LoadTest with run_duration/low_memory/
  progress_bar_stats field storage and runner forwarding.
- test_runner.py: time-bound validation, _invoke_for_duration,
  full run with duration, output path, multiple clients.
Successfully merging this pull request may close these issues.

feat: time-bound runs, live stats display, and send-window metrics
