feat: time-bound runs, live stats display, and send-window metrics #57
Summary
Currently, LLMeter runs are always count-bound — you specify a fixed number of requests per client. This makes it difficult to measure sustained throughput over a realistic time window, and the reported RPM/TPS metrics can be skewed by tail latency of the final responses.
This issue proposes time-bound runs and related improvements.
1. Time-bound runs (run_duration)
A new `run_duration` parameter on `Runner.run()` and `LoadTest` that runs each client for a fixed number of seconds instead of a fixed request count. It is mutually exclusive with `n_requests`.
```python
# Run for 60 seconds with 5 concurrent clients
result = await runner.run(run_duration=60, clients=5)
result.total_requests  # actual count completed
```
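For illustration, the core of a time-bound client loop could look like the sketch below. The helper name and the `invoke` callable are hypothetical stand-ins, not LLMeter's actual internals:

```python
import asyncio
import time

async def run_client_for(invoke, run_duration: float) -> int:
    """Sketch: keep invoking the endpoint until the time window
    elapses, then report how many requests actually completed.
    `invoke` stands in for one async endpoint call (hypothetical)."""
    deadline = time.monotonic() + run_duration
    completed = 0
    while time.monotonic() < deadline:
        await invoke()
        completed += 1
    return completed

async def main() -> int:
    # A no-op "endpoint", run for a tenth of a second
    return await run_client_for(lambda: asyncio.sleep(0), run_duration=0.1)

total = asyncio.run(main())
```

Note that the actual completed count varies with endpoint latency, which is why a time-bound result reports `total_requests` rather than assuming it.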
```python
# LoadTest with time-bound runs
load_test = LoadTest(
    endpoint=my_endpoint,
    payload=sample_payload,
    sequence_of_clients=[1, 5, 10, 20],
    run_duration=60,
)
result = await load_test.run()
```

2. Live stats display
A `LiveStatsDisplay` class that renders running statistics as a grouped HTML table in Jupyter notebooks (Throughput, TTFT, TTLT, Tokens, and Errors column groups) or as grouped multi-line text in terminals. It replaces the single-line `tqdm` postfix, which became unreadable with many metrics.
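As a rough illustration of the terminal fallback, grouped multi-line rendering could look like the following sketch; the function, group names, and metric names are made up for the example, not the class's real output format:

```python
def render_grouped_stats(stats: dict[str, dict[str, float]]) -> str:
    """Sketch: render running stats as grouped multi-line text,
    one line per metric group, instead of cramming everything into
    a single tqdm postfix string."""
    lines = []
    for group, metrics in stats.items():
        cells = "  ".join(f"{name}={value:.2f}" for name, value in metrics.items())
        lines.append(f"{group:<12}{cells}")
    return "\n".join(lines)

stats = {
    "Throughput": {"rpm": 118.40, "tps": 402.70},
    "TTFT": {"p50": 0.21, "p90": 0.48},
    "Errors": {"count": 0.00},
}
print(render_grouped_stats(stats))
```

One line per group keeps related metrics adjacent, which is the readability win over a single flat postfix line.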
3. Send-window metrics
RPM and aggregate output tokens/s are now computed over the send window (from the first request sent to the last request sent) instead of response-side elapsed time. This prevents the reported rates from tapering off as clients finish their assigned work.
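The computation amounts to something like the sketch below; the function name and the n − 1 inter-send-gaps convention are assumptions for illustration, not necessarily what the change implements:

```python
def send_window_rpm(send_times: list[float]) -> float:
    """Sketch: requests per minute over the send window, i.e. from
    the first request sent to the last request sent. Measuring on
    the send side means the rate is not dragged down by the tail
    latency of the final responses. Convention here (an assumption):
    n requests define n - 1 inter-send gaps spanning the window."""
    window = max(send_times) - min(send_times)
    if window <= 0:
        return float("nan")  # degenerate: all requests sent at once
    return (len(send_times) - 1) / window * 60.0

# 61 requests sent one second apart span a 60 s window -> 60 RPM
sends = [float(i) for i in range(61)]
rpm = send_window_rpm(sends)
```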
4. StopIteration fix
Streaming endpoints that raise `StopIteration` (e.g. from `next()` on an empty stream) no longer silently terminate the invocation loop. Both the count-bound and time-bound loops now drive iteration with `while`/`next()` instead of `for ... in cycle()`.
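A minimal sketch of the fixed driving pattern (names are illustrative, not LLMeter's actual internals): the payload cycle is advanced explicitly with `next()`, so any `StopIteration` seen in the loop must have escaped the endpoint call and surfaces as an error, rather than being conflated with exhaustion of the loop's own iterator.

```python
from itertools import cycle

def run_count_bound(payloads, invoke, n_requests: int) -> list:
    """Sketch of a count-bound invocation loop that advances the
    payload cycle explicitly instead of `for payload in cycle(...)`.
    cycle() over a non-empty list never raises StopIteration, so a
    StopIteration reaching this frame came from `invoke` and will
    propagate instead of silently ending the loop."""
    payload_iter = cycle(payloads)
    results = []
    while len(results) < n_requests:
        payload = next(payload_iter)
        results.append(invoke(payload))
    return results

# `invoke` stands in for one endpoint call
results = run_count_bound(["a", "b"], str.upper, 5)
# results == ["A", "B", "A", "B", "A"]
```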