Here we provide examples of Torch-TensorRT compilation of popular computer vision and language models.
Dependencies
Please install the following external dependencies (assuming you already have the correct torch, torch_tensorrt, and tensorrt libraries installed):
pip install -r requirements.txt
Model Zoo
Compiling ResNet with dynamic shapes using the torch.compile backend: Compiling a ResNet model using the Torch Compile Frontend for torch_tensorrt.compile
Compiling BERT using the torch.compile backend: Compiling a Transformer model using torch.compile
Compiling Stable Diffusion model using the torch.compile backend: Compiling a Stable Diffusion model using torch.compile
torch_compile_gpt2: Compiling a GPT2 model using torch.compile
torch_export_gpt2: Compiling a GPT2 model using the AOT workflow (ir=dynamo)
torch_export_llama2: Compiling a Llama2 model using the AOT workflow (ir=dynamo)
torch_export_sam2: Compiling a SAM2 model using the AOT workflow (ir=dynamo)
torch_export_flux_dev: Compiling a FLUX.1-dev model using the AOT workflow (ir=dynamo)
Debugging Torch-TensorRT Compilation
Cross runtime compilation for Windows (cross_runtime_compilation_for_windows.py)
Refitting Torch-TensorRT Programs with New Weights
Automatically Generate a Converter for a Custom Kernel
Automatically Generate a Plugin for a Custom Kernel
aot_plugin.py
Overloading Torch-TensorRT Converters with Custom Converters
Using Custom Kernels with NVRTC in TensorRT AOT Plugins
llama2_flashinfer_rmsnorm.py
Using Custom Kernels within TensorRT Engines with Torch-TensorRT