
Torch-TensorRT

  • Installation
  • User Guide
  • Advanced Usage
  • Model Zoo
  • API Reference
    • Debugging
    • Contributing
    • Legacy Frontends
  • GitHub

Section Navigation

  • Torch-TensorRT Explained
  • Compilation
    • TensorRT Backend for torch.compile
    • Example: Torch Compile Advanced Usage
    • Compiling Exported Programs with Torch-TensorRT
    • CompilationSettings Reference
    • Dynamic shapes with Torch-TensorRT
    • Example: Compiling Models with Dynamic Input Shapes
    • Handling Unsupported Operators
  • Precision & Quantization
    • Compile Mixed Precision models with Torch-TensorRT
    • An example of using Torch-TensorRT Autocast
    • Quantization (INT8 / FP8 / FP4)
    • Deploy Quantized Models using Torch-TensorRT
  • Runtime & Serialization
    • Deploying Torch-TensorRT Programs
    • Runtime API
    • DLA
    • Saving models compiled with Torch-TensorRT
    • Extracting a Raw TensorRT Engine
    • AOTInductor Deployment
    • MutableTorchTensorRTModule
    • Example: Saving and Loading Models with Dynamic Shapes
    • Example: Saving Models with Dynamic Shapes - Both Methods
  • Performance Tuning Guide

Precision & Quantization#

Control numerical precision with FP16, BF16, and mixed-precision autocast, and reduce model size with INT8/FP8/FP4 quantization via ModelOpt.

  • Compile Mixed Precision models with Torch-TensorRT
  • An example of using Torch-TensorRT Autocast
  • Quantization (INT8 / FP8 / FP4)
  • Deploy Quantized Models using Torch-TensorRT


© Copyright 2024, NVIDIA Corporation.