Torch-TensorRT



Resource & Memory Management

Techniques for controlling GPU and CPU memory consumption during compilation and inference: engine caching, weight streaming, dynamic memory allocation, and low-CPU-memory compilation.

  • Resource Management
  • Engine Caching
  • Example: Engine Caching
  • Example: Engine Caching (BERT)
  • Example: Weight Streaming
  • Example: Dynamic Memory Allocation
  • Example: Low CPU Memory Compilation
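The engine-caching idea listed above can be sketched in plain Python. This is a conceptual sketch only, not the Torch-TensorRT API: the `EngineCache` class and `compile_with_cache` function here are hypothetical stand-ins that illustrate the mechanism (a built engine is stored on disk keyed by a hash of the graph, so a later compilation of the same graph reuses it instead of rebuilding).

```python
import hashlib
import tempfile
from pathlib import Path

class EngineCache:
    """Hypothetical sketch: persist 'built engines' on disk,
    keyed by a hash of the model graph."""

    def __init__(self, cache_dir):
        self.cache_dir = Path(cache_dir)
        self.cache_dir.mkdir(parents=True, exist_ok=True)

    def _path(self, graph_repr: str) -> Path:
        digest = hashlib.sha256(graph_repr.encode()).hexdigest()
        return self.cache_dir / f"{digest}.engine"

    def load(self, graph_repr: str):
        path = self._path(graph_repr)
        return path.read_bytes() if path.exists() else None

    def save(self, graph_repr: str, engine: bytes) -> None:
        self._path(graph_repr).write_bytes(engine)

def compile_with_cache(graph_repr: str, cache: EngineCache) -> bytes:
    cached = cache.load(graph_repr)
    if cached is not None:
        return cached  # cache hit: skip the expensive build step
    # Stand-in for an expensive TensorRT engine build.
    engine = f"engine-for:{graph_repr}".encode()
    cache.save(graph_repr, engine)
    return engine

cache = EngineCache(tempfile.mkdtemp())
first = compile_with_cache("conv-relu-graph", cache)   # builds and stores
second = compile_with_cache("conv-relu-graph", cache)  # served from cache
assert first == second
```

The real implementation is covered in the Engine Caching guide and examples linked above; this sketch only shows why recompiling an identical graph becomes cheap once its engine is cached.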


© Copyright 2024, NVIDIA Corporation.