Option to re-display a benchmark file #185


Open · wants to merge 20 commits into main

2 changes: 1 addition & 1 deletion .gitignore
@@ -168,7 +168,7 @@ cython_debug/
# be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore
# and can be added to the global gitignore or merged into this file. For a more nuclear
# option (not recommended) you can uncomment the following to ignore the entire idea folder.
#.idea/
.idea/


# MacOS files
2 changes: 2 additions & 0 deletions .pre-commit-config.yaml
@@ -3,7 +3,9 @@ repos:
rev: v4.6.0
hooks:
- id: trailing-whitespace
exclude: ^tests/?.*/assets/.+
- id: end-of-file-fixer
exclude: ^tests/?.*/assets/.+
- repo: https://github.com/astral-sh/ruff-pre-commit
rev: v0.11.7
hooks:
8 changes: 4 additions & 4 deletions README.md
@@ -68,12 +68,12 @@ For information on starting other supported inference servers or platforms, see

#### 2. Run a GuideLLM Benchmark

To run a GuideLLM benchmark, use the `guidellm benchmark` command with the target set to an OpenAI-compatible server. For this example, the target is set to 'http://localhost:8000', assuming that vLLM is active and running on the same server. Otherwise, update it to the appropriate location. By default, GuideLLM automatically determines the model available on the server and uses it. To target a different model, pass the desired name with the `--model` argument. Additionally, the `--rate-type` is set to `sweep`, which automatically runs a range of benchmarks to determine the minimum and maximum rates that the server and model can support. Each benchmark run under the sweep will run for 30 seconds, as set by the `--max-seconds` argument. Finally, `--data` is set to a synthetic dataset with 256 prompt tokens and 128 output tokens per request. For more arguments, supported scenarios, and configurations, jump to the [Configurations Section](#configurations) or run `guidellm benchmark --help`.
To run a GuideLLM benchmark, use the `guidellm benchmark run` command with the target set to an OpenAI-compatible server. For this example, the target is set to 'http://localhost:8000', assuming that vLLM is active and running on the same server. Otherwise, update it to the appropriate location. By default, GuideLLM automatically determines the model available on the server and uses it. To target a different model, pass the desired name with the `--model` argument. Additionally, the `--rate-type` is set to `sweep`, which automatically runs a range of benchmarks to determine the minimum and maximum rates that the server and model can support. Each benchmark run under the sweep will run for 30 seconds, as set by the `--max-seconds` argument. Finally, `--data` is set to a synthetic dataset with 256 prompt tokens and 128 output tokens per request. For more arguments, supported scenarios, and configurations, jump to the [Configurations Section](#configurations) or run `guidellm benchmark run --help`.

Now, to start benchmarking, run the following command:

```bash
guidellm benchmark \
guidellm benchmark run \
--target "http://localhost:8000" \
--rate-type sweep \
--max-seconds 30 \
@@ -110,11 +110,11 @@ For further details on determining the optimal request rate and SLOs, refer to t

### Configurations

GuideLLM offers a range of configurations through both the benchmark CLI command and environment variables, which provide default values and more granular controls. The most common configurations are listed below. A complete list is easily accessible, though, by running `guidellm benchmark --help` or `guidellm config` respectively.
GuideLLM offers a range of configurations through both the benchmark CLI command and environment variables, which provide default values and more granular controls. The most common configurations are listed below. A complete list is easily accessible, though, by running `guidellm benchmark run --help` or `guidellm config` respectively.

#### Benchmark CLI

The `guidellm benchmark` command is used to run benchmarks against a generative AI backend/server. The command accepts a variety of arguments to customize the benchmark run. The most common arguments include:
The `guidellm benchmark run` command is used to run benchmarks against a generative AI backend/server. The command accepts a variety of arguments to customize the benchmark run. The most common arguments include:

- `--target`: Specifies the target path for the backend to run benchmarks against. For example, `http://localhost:8000`. This is required to define the server endpoint.

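As a complement to the configuration notes above, here is a brief, illustrative sketch of supplying an option through an environment variable instead of a CLI flag. It relies only on the `auto_envvar_prefix="GUIDELLM"` context setting visible in `src/guidellm/__main__.py` (under which Click maps `--target` to `GUIDELLM_TARGET`); the exact variable name is an inference from Click's documented behavior, not something stated in this diff.

```bash
# Hypothetical invocation: the target comes from Click's auto env var
# (GUIDELLM prefix + option name) rather than from --target.
export GUIDELLM_TARGET="http://localhost:8000"

guidellm benchmark run \
  --rate-type sweep \
  --max-seconds 30 \
  --data "prompt_tokens=256,output_tokens=128"
```
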
12 changes: 6 additions & 6 deletions docs/datasets.md
@@ -20,7 +20,7 @@ The following arguments can be used to configure datasets and their processing:
### Example Usage

```bash
guidellm benchmark \
guidellm benchmark run \
--target "http://localhost:8000" \
--rate-type "throughput" \
--max-requests 1000 \
@@ -49,7 +49,7 @@ For different use cases, here are the recommended dataset profiles to pass as ar
#### Example Commands

```bash
guidellm benchmark \
guidellm benchmark run \
--target "http://localhost:8000" \
--rate-type "throughput" \
--max-requests 1000 \
@@ -59,7 +59,7 @@ guidellm benchmark \
Or using a JSON string:

```bash
guidellm benchmark \
guidellm benchmark run \
--target "http://localhost:8000" \
--rate-type "throughput" \
--max-requests 1000 \
@@ -90,7 +90,7 @@ GuideLLM supports datasets from the Hugging Face Hub or local directories that f
#### Example Commands

```bash
guidellm benchmark \
guidellm benchmark run \
--target "http://localhost:8000" \
--rate-type "throughput" \
--max-requests 1000 \
@@ -100,7 +100,7 @@ guidellm benchmark \
Or using a local dataset:

```bash
guidellm benchmark \
guidellm benchmark run \
--target "http://localhost:8000" \
--rate-type "throughput" \
--max-requests 1000 \
@@ -152,7 +152,7 @@ GuideLLM supports various file formats for datasets, including text, CSV, JSON,
#### Example Commands

```bash
guidellm benchmark \
guidellm benchmark run \
--target "http://localhost:8000" \
--rate-type "throughput" \
--max-requests 1000 \
16 changes: 8 additions & 8 deletions docs/outputs.md
@@ -5,7 +5,7 @@ GuideLLM provides flexible options for outputting benchmark results, catering to
For all of the output formats, `--output-extras` can be used to include additional information. This could include tags, metadata, hardware details, and other relevant information that can be useful for analysis. This must be supplied as a JSON encoded string. For example:

```bash
guidellm benchmark \
guidellm benchmark run \
--target "http://localhost:8000" \
--rate-type sweep \
--max-seconds 30 \
@@ -26,21 +26,21 @@ By default, GuideLLM displays benchmark results and progress directly in the con

### Disabling Console Output

To disable the progress outputs to the console, use the `disable-progress` flag when running the `guidellm benchmark` command. For example:
To disable the progress outputs to the console, use the `--disable-progress` flag when running the `guidellm benchmark run` command. For example:

```bash
guidellm benchmark \
guidellm benchmark run \
--target "http://localhost:8000" \
--rate-type sweep \
--max-seconds 30 \
--data "prompt_tokens=256,output_tokens=128" \
--disable-progress
```

To disable console output, use the `--disable-console-outputs` flag when running the `guidellm benchmark` command. For example:
To disable console output, use the `--disable-console-outputs` flag when running the `guidellm benchmark run` command. For example:

```bash
guidellm benchmark \
guidellm benchmark run \
--target "http://localhost:8000" \
--rate-type sweep \
--max-seconds 30 \
@@ -50,10 +50,10 @@ guidellm benchmark \

### Enabling Extra Information

GuideLLM includes the option to display extra information during the benchmark runs to monitor the overheads and performance of the system. This can be enabled by using the `--display-scheduler-stats` flag when running the `guidellm benchmark` command. For example:
GuideLLM includes the option to display extra information during the benchmark runs to monitor the overheads and performance of the system. This can be enabled by using the `--display-scheduler-stats` flag when running the `guidellm benchmark run` command. For example:

```bash
guidellm benchmark \
guidellm benchmark run \
--target "http://localhost:8000" \
--rate-type sweep \
--max-seconds 30 \
@@ -81,7 +81,7 @@ GuideLLM supports saving benchmark results to files in various formats, includin
Example command to save results in YAML format:

```bash
guidellm benchmark \
guidellm benchmark run \
--target "http://localhost:8000" \
--rate-type sweep \
--max-seconds 30 \
51 changes: 47 additions & 4 deletions src/guidellm/__main__.py
@@ -7,12 +7,16 @@
from pydantic import ValidationError

from guidellm.backend import BackendType
from guidellm.benchmark import ProfileType
from guidellm.benchmark import (
ProfileType,
reimport_benchmarks_report,
)
from guidellm.benchmark.entrypoints import benchmark_with_scenario
from guidellm.benchmark.scenario import GenerativeTextScenario, get_builtin_scenarios
from guidellm.config import print_config
from guidellm.preprocess.dataset import ShortPromptStrategy, process_dataset
from guidellm.scheduler import StrategyType
from guidellm.utils import DefaultGroupHandler
from guidellm.utils import cli as cli_tools

STRATEGY_PROFILE_CHOICES = set(
@@ -25,7 +29,17 @@ def cli():
pass


@cli.command(
@cli.group(
    help="Commands to run a new benchmark or load a prior one.",
    cls=DefaultGroupHandler,
    default="run",
)
def benchmark():
    pass


@benchmark.command(
    "run",
    help="Run a benchmark against a generative model using the specified arguments.",
    context_settings={"auto_envvar_prefix": "GUIDELLM"},
)
@@ -230,7 +244,7 @@ def cli():
type=int,
help="The random seed to use for benchmarking to ensure reproducibility.",
)
def benchmark(
def run(
scenario,
target,
backend_type,
@@ -306,6 +320,34 @@ def benchmark(
)


@benchmark.command(help="Load a saved benchmark report.")
@click.argument(
    "path",
    type=click.Path(file_okay=True, dir_okay=False, exists=True),
    default=Path.cwd() / "benchmarks.json",
)
@click.option(
    "--output-path",
    type=click.Path(file_okay=True, dir_okay=True, exists=False),
    default=None,
    is_flag=False,
    flag_value=Path.cwd() / "benchmarks_reexported.json",
    help=(
        "Optional path for re-exporting the benchmarks to another format. "
        "If the path is a directory, benchmarks.json is saved under it; "
        "otherwise the output type (json, yaml, or csv) is inferred from "
        "the file extension. If the flag is omitted, the benchmarks are not "
        "re-exported. If the flag is given without a value, it defaults to "
        "`benchmarks_reexported.json` in the current directory."
    ),
)
def from_file(path, output_path):
    reimport_benchmarks_report(path, output_path)


def decode_escaped_str(_ctx, _param, value):
"""
Click auto adds characters. For example, when using --pad-char "\n",
@@ -321,10 +363,11 @@ def decode_escaped_str(_ctx, _param, value):


@cli.command(
short_help="Prints environment variable settings.",
help=(
"Print out the available configuration settings that can be set "
"through environment variables."
)
),
)
def config():
print_config()
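
To make the new subcommand concrete, here is a short, hypothetical usage sketch. It assumes Click's default name mangling exposes the `from_file` function as `from-file` (no explicit command name is passed to `@benchmark.command`), and that a saved `benchmarks.json` report already exists in the working directory.

```bash
# Re-display a previously saved report (hypothetical paths):
guidellm benchmark from-file benchmarks.json

# Re-display it and also re-export it as CSV via --output-path:
guidellm benchmark from-file benchmarks.json --output-path benchmarks.csv
```

Because the `benchmark` group uses `DefaultGroupHandler` with `default="run"`, existing invocations such as `guidellm benchmark --target ...` should still resolve to the `run` subcommand.
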
3 changes: 2 additions & 1 deletion src/guidellm/benchmark/__init__.py
@@ -12,7 +12,7 @@
StatusBreakdown,
)
from .benchmarker import Benchmarker, BenchmarkerResult, GenerativeBenchmarker
from .entrypoints import benchmark_generative_text
from .entrypoints import benchmark_generative_text, reimport_benchmarks_report
from .output import GenerativeBenchmarksConsole, GenerativeBenchmarksReport
from .profile import (
AsyncProfile,
@@ -63,4 +63,5 @@
"ThroughputProfile",
"benchmark_generative_text",
"create_profile",
"reimport_benchmarks_report",
]
24 changes: 18 additions & 6 deletions src/guidellm/benchmark/entrypoints.py
@@ -133,13 +133,8 @@ async def benchmark_generative_text(
)

if output_console:
orig_enabled = console.enabled
console.enabled = True
console.benchmarks = report.benchmarks
console.print_benchmarks_metadata()
console.print_benchmarks_info()
console.print_benchmarks_stats()
console.enabled = orig_enabled
console.print_full_report()

if output_path:
console.print_line("\nSaving benchmarks report...")
@@ -151,3 +146,20 @@
console.print_line("\nBenchmarking complete.")

return report, saved_path


def reimport_benchmarks_report(file: Path, output_path: Optional[Path]) -> None:
    """
    The command-line entry point for re-importing and displaying an
    existing benchmarks report. Can also re-export the report to
    `output_path` if one is provided. Assumes the file provided exists.
    """
    console = GenerativeBenchmarksConsole(enabled=True)
    report = GenerativeBenchmarksReport.load_file(file)
    console.benchmarks = report.benchmarks
    console.print_full_report()

    if output_path:
        console.print_line("\nSaving benchmarks report...")
        saved_path = report.save_file(output_path)
        console.print_line(f"Benchmarks report saved to {saved_path}")
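
For completeness, a minimal sketch of calling the new entry point programmatically, mirroring the `from-file` command above; the file paths are hypothetical, and the report file must already exist.

```python
from pathlib import Path

from guidellm.benchmark import reimport_benchmarks_report

# Print a previously saved report to the console and, optionally,
# re-export it; the extension of output_path (json, yaml, or csv)
# selects the re-export format.
reimport_benchmarks_report(
    file=Path("benchmarks.json"),
    output_path=Path("benchmarks.csv"),
)
```
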
22 changes: 21 additions & 1 deletion src/guidellm/benchmark/output.py
@@ -242,7 +242,10 @@ def _file_setup(
if path_suffix in [".csv"]:
return path, "csv"

raise ValueError(f"Unsupported file extension: {path_suffix} for {path}.")
raise ValueError(
f"Unsupported file extension: {path_suffix} for {path}; "
"expected json, yaml, or csv."
)

@staticmethod
def _benchmark_desc_headers_and_values(
@@ -944,3 +947,20 @@ def print_benchmarks_stats(self):
title="Benchmarks Stats",
sections=sections,
)

    def print_full_report(self):
        """
        Print out the benchmark metadata, info, and stats to the console.
        Temporarily enables the console if it's disabled.

        Format:
        - Metadata
        - Info
        - Stats
        """
        orig_enabled = self.enabled
        self.enabled = True
        self.print_benchmarks_metadata()
        self.print_benchmarks_info()
        self.print_benchmarks_stats()
        self.enabled = orig_enabled
2 changes: 2 additions & 0 deletions src/guidellm/utils/__init__.py
@@ -1,4 +1,5 @@
from .colors import Colors
from .default_group import DefaultGroupHandler
from .hf_datasets import (
SUPPORTED_TYPES,
save_dataset_to_file,
@@ -20,6 +21,7 @@
__all__ = [
"SUPPORTED_TYPES",
"Colors",
"DefaultGroupHandler",
"EndlessTextCreator",
"IntegerRangeSampler",
"check_load_processor",