27 commits
552375f
Add replay from trace strategy
VincentG1234 Feb 28, 2026
a957299
fix the CI mypy
VincentG1234 Mar 11, 2026
b557fe1
add e2e tests
VincentG1234 Mar 15, 2026
edc18ba
fix ruff error CI
VincentG1234 Mar 17, 2026
18433f5
Add trace replay documentation
VincentG1234 Mar 18, 2026
a8e6444
refactor: move trace_io to utils for cross-component sharing
VincentG1234 Apr 20, 2026
584f753
replace manual trace loading with datasets.load_dataset
VincentG1234 Apr 20, 2026
cde76f4
refactor benchmark.entrypoints: remove max_requests data truncation
VincentG1234 Apr 20, 2026
a229476
refactor benchmark.entrypoint: remove replay special case
VincentG1234 Apr 21, 2026
d43e920
fix replay profile dataset filtering semantics
VincentG1234 Apr 21, 2026
c0739e4
erase useless diffs
VincentG1234 Apr 22, 2026
7d76d5f
fix trace replay tests for multiprocessing context
VincentG1234 Apr 22, 2026
8a7adde
fix ruff issue
VincentG1234 Apr 22, 2026
a6126fb
refactor trace_synthetic and trace_io: remove max_rows; use data_samp…
VincentG1234 Apr 23, 2026
ba792eb
fix replay profile data sample handling
VincentG1234 Apr 25, 2026
fc524d2
test: restore e2e utils
VincentG1234 Apr 25, 2026
c0150d0
fix trace replay ordering alignment
VincentG1234 Apr 26, 2026
b532df9
Fix trace replay alignment and semantics across loading, scheduling, …
VincentG1234 Apr 26, 2026
e3f317d
Fix trace replay alignment and semantics across loading, scheduling, …
VincentG1234 Apr 26, 2026
abfa41c
refactor unit tests: strengthen trace replay unit coverage
VincentG1234 Apr 27, 2026
09dcb32
fix ci: fix mdformat pre-commit on datasets guide
VincentG1234 Apr 28, 2026
6d7eac5
docs: clarify trace replay timestamp semantics
VincentG1234 Apr 29, 2026
b6c56f3
docs: clarify trace replay dataset examples and explanations
VincentG1234 Apr 30, 2026
cfeecc5
docs: clarify trace io and profiles wording
VincentG1234 May 2, 2026
0a1c7eb
fix replay trace scheduling completion and optimize synthetic prompt …
VincentG1234 May 5, 2026
b981ce9
fix ci with precommit
VincentG1234 May 9, 2026
4c0b43b
enforce single trace data source in resolve_args
VincentG1234 May 12, 2026
29 changes: 29 additions & 0 deletions docs/getting-started/benchmark.md
@@ -65,6 +65,7 @@ GuideLLM offers a wide range of configuration options to customize your benchmar
| `--random-seed` | Random seed for reproducibility | `--random-seed 42` |
| `--max-seconds` | Duration for each benchmark in seconds | `--max-seconds 30` |
| `--max-requests` | Maximum number of requests for each benchmark | `--max-requests 1000` |
| `--data-samples` | Maximum number of dataset rows to load | `--data-samples 1000` |
| `--output-dir` | Directory path to save output files | `--output-dir results/` |
| `--outputs` | Output formats to generate | `--outputs json csv html` |

@@ -187,6 +188,34 @@ guidellm benchmark \

You can customize synthetic data generation with additional parameters such as standard deviation, minimum, and maximum values. See the [Datasets Synthetic data documentation](../guides/datasets.md#synthetic-data) for more details.

### Trace Replay Benchmarking (beta)

For realistic load testing, replay trace events using each row's timestamp and token lengths. Trace files must be JSONL and are loaded with the `trace_synthetic` data type. By default, each row uses `timestamp`, `input_length`, and `output_length` fields. Timestamps may be absolute or monotonic values; GuideLLM sorts them and converts them to offsets from the first event before scheduling:

```json
{"timestamp": 1234500.0, "input_length": 256, "output_length": 128}
{"timestamp": 1234500.5, "input_length": 512, "output_length": 64}
```

In this example, the second request is scheduled 0.5 seconds after the first request.
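
Conceptually, the conversion from raw timestamps to scheduling offsets works as in the sketch below. This is illustrative only; the function name is not part of the GuideLLM API:

```python
def to_relative_offsets(timestamps: list[float]) -> list[float]:
    """Sort raw trace timestamps and convert them to offsets from the first event."""
    ordered = sorted(timestamps)
    first = ordered[0]
    return [t - first for t in ordered]


# The two example rows above yield offsets [0.0, 0.5]:
# the second request fires 0.5 seconds after the first.
offsets = to_relative_offsets([1234500.5, 1234500.0])
```

Because only the differences between timestamps matter, absolute epoch times and monotonic clock values produce identical schedules.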

Run with the `replay` profile:

```bash
guidellm benchmark \
--target "http://localhost:8000" \
--data path/to/trace.jsonl \
--data-args type_=trace_synthetic \
--profile replay \
--rate 1.0
```

The `--rate` parameter acts as a time scale for the intervals between trace events, not requests per second: `1.0` preserves the original timing, `2.0` doubles the intervals and runs twice as long, and `0.5` halves the intervals and runs twice as fast.

GuideLLM orders trace rows by timestamp before scheduling and payload generation, so each scheduled event uses the token lengths from the same sorted row. Use `--data-samples` to limit how many trace rows are loaded and replayed. `--max-requests` remains a runtime completion constraint; it does not truncate the trace dataset.
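
The time-scale semantics can be sketched as follows (an illustrative helper, not a GuideLLM function):

```python
def schedule_times(
    start_time: float, relative_offsets: list[float], time_scale: float = 1.0
) -> list[float]:
    """Map relative trace offsets to absolute scheduling times.

    ``time_scale`` stretches or compresses the intervals between events:
    1.0 preserves the original timing, 2.0 doubles the intervals,
    0.5 halves them.
    """
    return [start_time + time_scale * offset for offset in relative_offsets]


# With offsets [0.0, 0.5] and --rate 2.0, the 0.5 s gap becomes 1.0 s.
times = schedule_times(100.0, [0.0, 0.5], time_scale=2.0)
```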

If your trace uses different column names, map them with `timestamp_column`, `prompt_tokens_column`, and `output_tokens_column` in `--data-args`.

### Working with Real Data

While synthetic data is convenient for quick tests, you can benchmark with real-world data:
47 changes: 47 additions & 0 deletions docs/guides/datasets.md
@@ -13,6 +13,10 @@ The following arguments can be used to configure datasets and their processing:
- `prompt_column`: Specifies the column name for the prompt. By default, GuideLLM will try the most common column names (e.g., `prompt`, `text`, `input`).
- `prompt_tokens_count_column`: Specifies the column name for the prompt token count. These are used to set the request prompt token count for counting metrics. By default, GuideLLM assumes no token count is provided.
- `output_tokens_count_column`: Specifies the column name for the output token count. These are used to set the request output token count for the request and counting metrics. By default, GuideLLM assumes no token count is provided.
- `type_`: Selects a specialized dataset deserializer, such as `trace_synthetic` for trace replay files.
- `timestamp_column`: Specifies the timestamp column for `trace_synthetic` data. The default is `timestamp`.
- `prompt_tokens_column`: Specifies the prompt token length column for `trace_synthetic` data. The default is `input_length`.
- `output_tokens_column`: Specifies the output token length column for `trace_synthetic` data. The default is `output_length`.
- `split`: Specifies the dataset split to use (e.g., `train`, `val`, `test`). By default, GuideLLM will try the most common split names (e.g., `train`, `validation`, `test`) if the dataset has splits, otherwise it will use the entire dataset.
- Any remaining arguments are passed directly into the dataset constructor as kwargs.
- `--data-sampler`: Specifies the sampling strategy for datasets. By default, no sampling is applied. When set to `random`, it enables random shuffling of the dataset, which can be useful for creating diverse batches during benchmarking.
@@ -116,22 +120,62 @@ GuideLLM supports various file formats for datasets, including text, CSV, JSON,
#### Supported Formats with Examples

- **Text files (`.txt`, `.text`)**: Where each line is a separate prompt to use.

```
Hello, how are you?
What is your name?
```

- **CSV files (`.csv`)**: Where each row is a separate dataset entry and the first row contains the column names. The columns should include `prompt` or other common names for the prompt which will be used as the prompt column. Additional columns can be included based on the previously mentioned aliases for the `--data-column-mapper` argument.

```csv
prompt,output_tokens_count,additional_column,additional_column2
Hello, how are you?,5,foo,bar
What is your name?,3,baz,qux
```

- **JSON Lines files (`.jsonl`)**: Where each line is a separate JSON object. The objects should include `prompt` or other common names for the prompt which will be used as the prompt column. Additional fields can be included based on the previously mentioned aliases for the `--data-args` argument.

```json
{"prompt": "Hello, how are you?", "output_tokens_count": 5, "additional_column": "foo", "additional_column2": "bar"}
{"prompt": "What is your name?", "output_tokens_count": 3, "additional_column": "baz", "additional_column2": "qux"}
```

- **Trace files (`.jsonl` with `trace_synthetic` type)**: Specialized JSONL files for replay benchmarking with `timestamp`, `input_length`, and `output_length` fields. Used with `--profile replay` to replay trace events using each row's timestamp and token lengths. Timestamps must be numeric values in seconds on a shared timeline; any consistent zero point works, because GuideLLM sorts them and converts them to offsets from the first event before scheduling. Date strings are not parsed yet, so provide timestamps as numbers. See [Trace Replay Benchmarking](../getting-started/benchmark.md#trace-replay-benchmarking).

```json
{"timestamp": 1234500.0, "input_length": 256, "output_length": 128}
{"timestamp": 1234500.5, "input_length": 512, "output_length": 64}
```

In this example, the second request is scheduled 0.5 seconds after the first request. Trace rows are ordered by timestamp before GuideLLM schedules requests and generates synthetic payloads. This keeps each scheduled event aligned with the prompt and output token lengths from the same row.

Use `--data-args type_=trace_synthetic` to enable trace loading:

```bash
guidellm benchmark \
--target http://localhost:8000 \
--profile replay \
--rate 1.0 \
--data path/to/trace.jsonl \
--data-args type_=trace_synthetic
```

If your trace uses different column names, configure them with `timestamp_column`, `prompt_tokens_column`, and `output_tokens_column`:

```bash
guidellm benchmark \
--target http://localhost:8000 \
--profile replay \
--rate 1.0 \
--data replay.jsonl \
--data-args type_=trace_synthetic,timestamp_column=timestamp,prompt_tokens_column=input_length,output_tokens_column=output_length
```

For replay, `--rate` is a time scale for the intervals between trace events rather than requests per second. Use `--data-samples` to limit how many trace rows are loaded and replayed. Use `--max-requests` only as a runtime completion constraint; it does not limit the trace rows loaded from the file.
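
The interaction between the two flags can be sketched as below. The helper name is hypothetical, not part of the GuideLLM API:

```python
def select_trace_rows(
    relative_offsets: list[float], data_samples: int = -1
) -> list[float]:
    """--data-samples truncates the loaded trace; -1 means load everything.

    --max-requests is enforced later, at runtime, and never shrinks this list.
    """
    if data_samples > 0:
        return relative_offsets[:data_samples]
    return relative_offsets


# Loading four trace rows with --data-samples 2 replays only the first two.
rows = select_trace_rows([0.0, 1.0, 2.0, 3.0], data_samples=2)
```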

- **JSON files (`.json`)**: Where the entire dataset is represented as a JSON array of objects nested under a specific key. To tell GuideLLM which key holds the array, pass a `--data-column-mapper` argument of `"field": "NAME"`. The objects should include `prompt` or other common names for the prompt which will be used as the prompt column. Additional fields can be included based on the previously mentioned aliases for the `--data-column-mapper` argument.

```json
{
"version": "1.0",
@@ -141,8 +185,11 @@
]
}
```

- **Parquet files (`.parquet`)**: A binary columnar storage format for efficient data processing. For more information on the supported formats, see the Hugging Face dataset documentation linked in the [Notes](#notes) section.

- **Arrow files (`.arrow`)**: A cross-language development platform for in-memory data. For more information on the supported formats, see the Hugging Face dataset documentation linked in the [Notes](#notes) section.

- **HDF5 files (`.hdf5`)**: A hierarchical data format for storing large amounts of data. For more information on the supported formats, see the Hugging Face dataset documentation linked in the [Notes](#notes) section.

#### Example Commands
9 changes: 9 additions & 0 deletions src/guidellm/benchmark/entrypoints.py
@@ -355,6 +355,8 @@ async def resolve_profile(
max_global_error_rate: float | None,
over_saturation: dict[str, Any] | None = None,
console: Console | None = None,
data: list[Any] | None = None,
**profile_kwargs: Any,
) -> Profile:
"""
Resolve and configure a benchmark profile with rate and constraint settings.
@@ -376,6 +378,8 @@ async def resolve_profile(
:param max_global_error_rate: Maximum global error rate threshold before stopping
:param over_saturation: Over-saturation detection configuration (dict)
:param console: Console instance for progress reporting, or None
:param data: Optional list of data sources.
:param profile_kwargs: Additional profile-specific arguments.
:return: Configured Profile instance ready for benchmarking
:raises ValueError: If constraints are provided with a pre-configured Profile
"""
@@ -403,6 +407,8 @@
random_seed=random_seed,
rampup_duration=rampup,
constraints={**constraints},
data=data,
**profile_kwargs,
)
elif constraints:
raise ValueError(
@@ -536,6 +542,9 @@
max_global_error_rate=args.max_global_error_rate,
over_saturation=args.over_saturation,
console=console,
data=args.data,
data_args=args.data_args,
data_samples=request_loader.info.get("data_samples", -1),
)
output_formats = await resolve_output_formats(
outputs=args.outputs, output_dir=args.output_dir, console=console
110 changes: 109 additions & 1 deletion src/guidellm/benchmark/profiles.py
@@ -13,6 +13,7 @@

from abc import ABC, abstractmethod
from collections.abc import Generator
from pathlib import Path
from typing import TYPE_CHECKING, Annotated, Any, ClassVar, Literal

import numpy as np
@@ -37,8 +38,10 @@
SchedulingStrategy,
SynchronousStrategy,
ThroughputStrategy,
TraceReplayStrategy,
)
from guidellm.schemas import PydanticClassRegistryMixin
from guidellm.utils.trace_io import load_relative_timestamps

if TYPE_CHECKING:
from guidellm.benchmark.schemas import Benchmark
@@ -48,13 +51,14 @@
"ConcurrentProfile",
"Profile",
"ProfileType",
"ReplayProfile",
"SweepProfile",
"SynchronousProfile",
"ThroughputProfile",
]

ProfileType = Annotated[
Literal["synchronous", "concurrent", "throughput", "async", "sweep"],
Literal["synchronous", "concurrent", "throughput", "async", "sweep", "replay"],
"Profile type identifiers for polymorphic deserialization",
]

@@ -328,6 +332,110 @@ def next_strategy(
return SynchronousStrategy()


@Profile.register("replay")
class ReplayProfile(Profile):
"""
Replay a trace file:
schedule each request at start_time + time_scale * relative_timestamp[i].

For this profile, the ``rate`` argument is interpreted as time_scale (scale factor
applied to relative timestamps), not as requests per second.

When ``data_samples`` is set, the replayed timestamps are truncated to match
the sampled dataset size.
"""

type_: Literal["replay"] = "replay" # type: ignore[assignment]
relative_timestamps: list[float] = Field(
description="Request start times relative to first event (first = 0)",
)
time_scale: float = Field(
default=1.0,
gt=0,
description="Scale factor applied to relative timestamps",
)

@classmethod
def resolve_args(
cls,
rate_type: str,
rate: list[float] | None,
random_seed: int,
**kwargs: Any,
) -> dict[str, Any]:
_ = (rate_type, random_seed) # unused
data = kwargs.get("data")
if not data:
raise ValueError("Replay profile requires data (path to trace file)")
if len(data) != 1:
raise ValueError(
f"ReplayProfile requires exactly one data source, received {len(data)}"
)
if not data[0]:
raise ValueError("Replay profile requires data (path to trace file)")
path = Path(data[0]) if isinstance(data[0], str) else data[0]
if not path.exists():
raise ValueError(f"Replay trace file not found: {path}")

# For replay profile, rate is interpreted as time_scale (not requests per
# second)
time_scale = rate[0] if rate and len(rate) > 0 else 1.0

# Honor a custom timestamp column when configured via --data-args so the
# replay profile and trace_synthetic deserializer use the same field.
data_args = kwargs.get("data_args") or []
first_args = data_args[0] if data_args else {}
timestamp_column = "timestamp"
if isinstance(first_args, dict):
raw_timestamp_column = first_args.get("timestamp_column")
if isinstance(raw_timestamp_column, str) and raw_timestamp_column.strip():
timestamp_column = raw_timestamp_column

relative_timestamps = load_relative_timestamps(
path, timestamp_column=timestamp_column
)
data_samples = kwargs.get("data_samples", -1)
if isinstance(data_samples, int) and data_samples > 0:
relative_timestamps = relative_timestamps[:data_samples]

if not relative_timestamps:
raise ValueError(
"No timestamps remain after applying data_samples. "
"The trace is empty or all events were filtered out."
)

constraints = dict(kwargs.get("constraints") or {})
if not any(
key in constraints
for key in ("max_number", "max_num", "max_requests", "max_req")
):
constraints["max_requests"] = len(relative_timestamps)

return {
"relative_timestamps": relative_timestamps,
"time_scale": time_scale,
"constraints": constraints,
}

@property
def strategy_types(self) -> list[str]:
return ["trace"]

def next_strategy(
self,
prev_strategy: SchedulingStrategy | None,
prev_benchmark: Benchmark | None,
) -> TraceReplayStrategy | None:
_ = prev_benchmark
# Replay has a single strategy; return it once, then None
if prev_strategy is not None:
return None
return TraceReplayStrategy(
relative_timestamps=self.relative_timestamps,
time_scale=self.time_scale,
)


@Profile.register("concurrent")
class ConcurrentProfile(Profile):
"""
2 changes: 2 additions & 0 deletions src/guidellm/data/deserializers/__init__.py
@@ -25,6 +25,7 @@
SyntheticTextDataset,
SyntheticTextDatasetDeserializer,
)
from .trace_synthetic import TraceSyntheticDatasetDeserializer

__all__ = [
"ArrowFileDatasetDeserializer",
@@ -46,4 +47,5 @@
"SyntheticTextDatasetDeserializer",
"TarFileDatasetDeserializer",
"TextFileDatasetDeserializer",
"TraceSyntheticDatasetDeserializer",
]