Compiling Exported Programs with Torch-TensorRT#

Pytorch 2.1 introduced torch.export APIs which can export graphs from Pytorch programs into ExportedProgram objects. Torch-TensorRT dynamo frontend compiles these ExportedProgram objects and optimizes them using TensorRT. Here’s a simple usage of the dynamo frontend

import torch
import torch_tensorrt

model = MyModel().eval().cuda()
inputs = [torch.randn((1, 3, 224, 224), dtype=torch.float32).cuda()]
exp_program = torch.export.export(model, tuple(inputs))
trt_gm = torch_tensorrt.dynamo.compile(exp_program, inputs) # Output is a torch.fx.GraphModule
trt_gm(*inputs)

Note

torch_tensorrt.dynamo.compile is the main API for users to interact with Torch-TensorRT dynamo frontend. The input type of the model should be ExportedProgram (ideally the output of torch.export.export or torch_tensorrt.dynamo.trace (discussed in the section below)) and output type is a torch.fx.GraphModule object.

Customizable Settings#

There are lot of options for users to customize their settings for optimizing with TensorRT. Some of the frequently used options are as follows:

inputs - For static shapes, this can be a list of torch tensors or torch_tensorrt.Input objects. For dynamic shapes, this should be a list of torch_tensorrt.Input objects.
enable_autocast - For mixed precision, use enable_autocast=True with autocast_low_precision_type. Strong typing (respecting model/input dtypes) is always enabled.
truncate_long_and_double - Truncates long and double values to int and floats respectively.
torch_executed_ops - Operators which are forced to be executed by Torch.
min_block_size - Minimum number of consecutive operators required to be executed as a TensorRT segment.

The complete list of options can be found here

Under the hood#

Under the hood, torch_tensorrt.dynamo.compile performs the following on the graph.

Lowering - Applies lowering passes to add/remove operators for optimal conversion.
Partitioning - Partitions the graph into Pytorch and TensorRT segments based on the min_block_size and torch_executed_ops field.
Conversion - Pytorch ops get converted into TensorRT ops in this phase.
Optimization - Post conversion, we build the TensorRT engine and embed this inside the pytorch graph.

Tracing#

torch_tensorrt.dynamo.trace can be used to trace a Pytorch graphs and produce ExportedProgram. This internally performs some decompositions of operators for downstream optimization. The ExportedProgram can then be used with torch_tensorrt.dynamo.compile API. If you have dynamic input shapes in your model, you can use this torch_tensorrt.dynamo.trace to export the model with dynamic shapes. Alternatively, you can use torch.export with constraints directly as well.

import torch
import torch_tensorrt

inputs = [torch_tensorrt.Input(min_shape=(1, 3, 224, 224),
                              opt_shape=(4, 3, 224, 224),
                              max_shape=(8, 3, 224, 224),
                              dtype=torch.float32)]
model = MyModel().eval()
exp_program = torch_tensorrt.dynamo.trace(model, inputs)