
Torch-TensorRT

  • Installation
  • User Guide
  • Advanced Usage
  • Model Zoo
  • API Reference
    • Debugging
    • Contributing
    • Legacy Frontends
  • GitHub

Section Navigation

  • Torch-TensorRT Explained
  • Compilation
    • TensorRT Backend for torch.compile
    • Example: Torch Compile Advanced Usage
    • Compiling Exported Programs with Torch-TensorRT
    • CompilationSettings Reference
    • Dynamic shapes with Torch-TensorRT
    • Example: Compiling Models with Dynamic Input Shapes
    • Handling Unsupported Operators
  • Precision & Quantization
    • Compile Mixed Precision models with Torch-TensorRT
    • An example of using Torch-TensorRT Autocast
    • Quantization (INT8 / FP8 / FP4)
    • Deploy Quantized Models using Torch-TensorRT
  • Runtime & Serialization
    • Deploying Torch-TensorRT Programs
    • Runtime API
    • DLA
    • Saving models compiled with Torch-TensorRT
    • Extracting a Raw TensorRT Engine
    • AOTInductor Deployment
    • MutableTorchTensorRTModule
    • Example: Saving and Loading Models with Dynamic Shapes
    • Example: Saving Models with Dynamic Shapes - Both Methods
  • Performance Tuning Guide

Precision & Quantization#

Control numerical precision with FP16, BF16, and mixed-precision autocast, and reduce model size with INT8/FP8/FP4 quantization via ModelOpt.

  • Compile Mixed Precision models with Torch-TensorRT
  • An example of using Torch-TensorRT Autocast
  • Quantization (INT8 / FP8 / FP4)
  • Deploy Quantized Models using Torch-TensorRT


© Copyright 2024, NVIDIA Corporation.