How To Benchmark Torch‐TensorRT with TorchBench
We have added support for benchmarking Torch-TRT across IRs (torchscript, torch_compile, dynamo) in TorchBench, which features a set of key models and can easily be extended with additional ones.
First, it is key to set up a clean environment for benchmarking. We have two recommended ways to accomplish this.
- Set up a container based on the provided TorchBench Dockerfiles, then install `torch_tensorrt` in it.
- Set up a container based on the Torch-TRT Docker container, then install TorchBench in it.
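For example, the second approach might look roughly like the following. This is a minimal sketch, not an exact recipe: the image tag, Dockerfile path, and clone locations are illustrative assumptions that may need adjusting for your checkout.

```
# Sketch: build and enter a Torch-TRT container (image name/tag and Dockerfile path are illustrative)
git clone https://github.com/pytorch/TensorRT torch_tensorrt
docker build -t torch_tensorrt:benchmark -f torch_tensorrt/docker/Dockerfile torch_tensorrt
docker run --gpus all -it torch_tensorrt:benchmark bash

# Inside the container: clone TorchBench and install the benchmark model dependencies
git clone https://github.com/pytorch/benchmark
cd benchmark
python install.py
```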
With the environment set up, benchmarking Torch-TRT in TorchBench can be done in the following ways (from the root of the TorchBench clone).
```
# Prints metrics to stdout
python run.py {MODEL selected from TorchBench set} -d cuda -t eval --backend torch_trt --precision [fp32 OR fp16] [Torch-TRT specific options, see below]

# Saves metrics to .userbenchmark/torch_trt/metrics-*.json
python run_benchmark.py torch_trt --model {MODEL selected from TorchBench set} --precision [fp32 OR fp16] [Torch-TRT specific options, see below]
```

The Torch-TRT specific options are:

- `--truncate_long_and_double`: Whether to automatically truncate long and double operations
- `--min_block_size`: Minimum number of operations in an accelerated TRT block
- `--workspace_size`: Size of workspace allotted to TensorRT
- `--ir`: Which internal representation to use: `{"ts", "torch_compile", "dynamo", ...}`
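For instance, a single invocation could combine all four options. This is an illustrative sketch: the specific values, including treating `--workspace_size` as a byte count, are assumptions rather than recommended settings.

```
# Illustrative: one run.py invocation passing all four Torch-TRT specific options
# (the chosen values, including the workspace size in bytes, are assumptions)
python run.py resnet18 -d cuda -t eval --backend torch_trt --precision fp16 \
    --truncate_long_and_double --min_block_size 3 --workspace_size 1073741824 --ir dynamo
```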
Some example `run.py` invocations:

```
# Benchmarks ResNet18 with Torch-TRT, using FP32 precision, truncate_long_and_double=True, and compiling via the TorchScript path
python run.py resnet18 -d cuda -t eval --backend torch_trt --precision fp32 --truncate_long_and_double --ir torchscript

# Benchmarks VGG16 with Torch-TRT, using FP16 precision, Batch Size 32, and compiling via the dynamo path
python run.py vgg16 -d cuda -t eval --backend torch_trt --precision fp16 --ir dynamo --bs 32

# Benchmarks BERT with Torch-TRT, using FP16 precision, truncate_long_and_double=True, and compiling via the torch compile path
python run.py BERT_pytorch -d cuda -t eval --backend torch_trt --precision fp16 --truncate_long_and_double --ir torch_compile
```

In each of the cases below, the metrics will be saved to files at the path `.userbenchmark/torch_trt/metrics-*.json`, as per the TorchBench userbenchmark convention.
```
# Benchmarks ResNet18 with Torch-TRT, using FP32 precision, truncate_long_and_double=True, and compiling via the TorchScript path
python run_benchmark.py torch_trt --model resnet18 --precision fp32 --truncate_long_and_double --ir torchscript

# Benchmarks VGG16 with Torch-TRT, using FP16 precision, Batch Size 32, and compiling via the dynamo path
python run_benchmark.py torch_trt --model vgg16 --precision fp16 --ir dynamo --bs 32

# Benchmarks BERT with Torch-TRT, using FP16 precision, truncate_long_and_double=True, and compiling via the torch compile path
python run_benchmark.py torch_trt --model BERT_pytorch --precision fp16 --truncate_long_and_double --ir torch_compile
```
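After a `run_benchmark.py` run completes, the saved metrics can be inspected directly. A minimal sketch (the exact filename is timestamped, hence the glob):

```
# List and print the saved Torch-TRT metrics files (filenames are timestamped)
ls .userbenchmark/torch_trt/
cat .userbenchmark/torch_trt/metrics-*.json
```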
In the future, we hope to enable:
```
# Benchmarks all TorchBench models with Torch-TRT, compiling via the torch compile path
python run_benchmark.py torch_trt --precision fp16 --ir torch_compile
```

Currently, this is still in development, and the recommended method for benchmarking multiple models is to write a bash script which iterates over the set of desired models and runs each benchmark individually, as sketched below. See the discussion here for more details.
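Such a script might look roughly like the following. This is an illustrative sketch: the model list, precision, and IR are placeholders to adapt to the models and options you care about.

```
#!/bin/bash
# Illustrative sketch: iterate over a chosen set of TorchBench models and
# run the Torch-TRT userbenchmark for each one individually.
MODELS="resnet18 vgg16 BERT_pytorch"

for model in $MODELS; do
    echo "Benchmarking ${model} with Torch-TRT..."
    python run_benchmark.py torch_trt --model "${model}" --precision fp16 --ir dynamo
done
```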