Torch-TensorRT

User Guide

Conceptual guides and how-tos for Torch-TensorRT.

  • Torch-TensorRT Explained
    • Dynamo Frontend
    • Legacy Frontends
    • torch_tensorrt.compile
  • Compilation
    • TensorRT Backend for torch.compile
    • Example: Torch Compile Advanced Usage
    • Compiling Exported Programs with Torch-TensorRT
    • CompilationSettings Reference
    • Dynamic shapes with Torch-TensorRT
    • Example: Compiling Models with Dynamic Input Shapes
    • Handling Unsupported Operators
  • Precision & Quantization
    • Compile Mixed Precision models with Torch-TensorRT
    • An example of using Torch-TensorRT Autocast
    • Quantization (INT8 / FP8 / FP4)
    • Deploy Quantized Models using Torch-TensorRT
  • Runtime & Serialization
    • Deploying Torch-TensorRT Programs
    • Runtime API
    • DLA
    • Saving models compiled with Torch-TensorRT
    • Extracting a Raw TensorRT Engine
    • AOTInductor Deployment
    • MutableTorchTensorRTModule
    • Example: Saving and Loading Models with Dynamic Shapes
    • Example: Saving Models with Dynamic Shapes - Both Methods
  • Performance Tuning Guide
    • Common Benchmarking Issues
    • Using the Right Precision
    • Tuning opt_shape
    • Optimization Level
    • TRT Coverage and Graph Breaks
    • CUDA Graphs
    • Engine Caching
    • Memory and Throughput Tradeoffs
    • Profiling with Nsight
    • Benchmarking Checklist


© Copyright 2024, NVIDIA Corporation.