
Torch-TensorRT



Plugins

Register custom CUDA and Triton kernels as TensorRT plugins — from auto-generated Python plugins to AOT-compiled C++ plugins for use in serialized engines.

  • Plugin System
  • Example: Auto-generate a Plugin for a Custom Kernel
  • Example: Using Custom Kernels within TensorRT Engines
  • Automatically Generate a TensorRT AOT Plugin
    • Step 1: Define the Triton Kernel
    • Step 2: Register the PyTorch op
    • Step 3: Register the QDP Shape Descriptor
    • Step 4: Register the AOT Implementation
    • Step 5: Generate the Converter
    • Step 6: Compile and Run
  • Example: Custom Kernels with NVRTC in TensorRT AOT Plugins


© Copyright 2024, NVIDIA Corporation.