User Guide

Conceptual guides and how-tos for Torch-TensorRT.

- Torch-TensorRT Explained
- Dynamo Frontend
- Legacy Frontends
- torch_tensorrt.compile

Compilation
- TensorRT Backend for torch.compile
- Example: Torch Compile Advanced Usage
- Compiling Exported Programs with Torch-TensorRT
- CompilationSettings Reference
- Dynamic Shapes with Torch-TensorRT
- Example: Compiling Models with Dynamic Input Shapes
- Handling Unsupported Operators

Precision & Quantization
- Compile Mixed Precision Models with Torch-TensorRT
- An Example of Using Torch-TensorRT Autocast
- Quantization (INT8 / FP8 / FP4)
- Deploy Quantized Models Using Torch-TensorRT

Runtime & Serialization
- Deploying Torch-TensorRT Programs
- Runtime API
- DLA
- Saving Models Compiled with Torch-TensorRT
- Extracting a Raw TensorRT Engine
- AOTInductor Deployment
- MutableTorchTensorRTModule
- Example: Saving and Loading Models with Dynamic Shapes
- Example: Saving Models with Dynamic Shapes - Both Methods

Performance Tuning Guide
- Common Benchmarking Issues
- Using the Right Precision
- Tuning opt_shape
- Optimization Level
- TRT Coverage and Graph Breaks
- CUDA Graphs
- Engine Caching
- Memory and Throughput Tradeoffs
- Profiling with Nsight
- Benchmarking Checklist
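
As a minimal orientation sketch of the most common entry point covered by these guides, the snippet below compiles a toy model with torch_tensorrt.compile using the Dynamo frontend. It assumes a CUDA-capable GPU and a recent Torch-TensorRT release; the Sequential model, the input shape, and the FP16 precision choice are illustrative assumptions, and the individual guides above cover the full set of options.

```python
import torch
import torch_tensorrt

# A small stand-in model; any traceable nn.Module works the same way.
model = torch.nn.Sequential(
    torch.nn.Conv2d(3, 16, kernel_size=3, padding=1),
    torch.nn.ReLU(),
    torch.nn.AdaptiveAvgPool2d(1),
    torch.nn.Flatten(),
    torch.nn.Linear(16, 10),
).eval().cuda()

# Example inputs define the shapes TensorRT optimizes for.
inputs = [torch.randn(1, 3, 224, 224, device="cuda")]

# Compile with the Dynamo frontend; allow FP16 kernels alongside FP32.
trt_model = torch_tensorrt.compile(
    model,
    ir="dynamo",
    inputs=inputs,
    enabled_precisions={torch.float16},
)

with torch.no_grad():
    out = trt_model(*inputs)
print(out.shape)  # torch.Size([1, 10])
```

For dynamic input shapes, mixed precision, serialization, and deployment workflows, see the corresponding guides listed above.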