Plugin System#
Torch-TensorRT’s plugin system lets you run custom kernels inside a TensorRT engine, avoiding graph breaks and their associated overhead. There are three main approaches depending on your kernel language and performance requirements:
| Approach | Kernel language | Execution | Example |
|---|---|---|---|
| QDP auto-generate (JIT) | Triton | JIT callback into Python at runtime | auto_generate_plugins |
| QDP auto-generate (AOT) | Triton | Pre-compiled PTX embedded in engine | aot_plugin |
| QDP auto-generate (AOT) | CUDA C++ via NVRTC | Pre-compiled PTX embedded in engine | nvrtc_aot_plugin |
| Manual (legacy) | Triton / any | JIT callback into Python at runtime | custom_kernel_plugins |
The QDP (Quick Deployable Plugin) path (TensorRT ≥ 10.7) is the recommended
approach. It uses `torch.library` to register your custom op and
`_generate_plugin_converter` to automatically create the Torch-TensorRT converter.
The manual path requires writing both the TRT plugin and the converter by hand,
and is retained for compatibility with older workflows.
The flow for the QDP path is:

1. Register a custom op with `torch.library` and implement it as a TRT QDP plugin.
2. Call `_generate_plugin_converter` to automatically create a Torch-TensorRT converter that bridges the two.
3. Use the custom op in a PyTorch model and compile normally with `torch_tensorrt.dynamo.compile`.
Prerequisites#
- TensorRT ≥ 10.7 (the `tensorrt.plugin` module must be importable).
- A registered QDP plugin in `tensorrt.plugin`'s `QDP_REGISTRY`.
- A corresponding `torch.ops` custom op.
Registering a Plugin Converter#
_generate_plugin_converter creates and registers a converter for your custom op
automatically:
```python
from torch_tensorrt.dynamo.conversion.plugins import _generate_plugin_converter

_generate_plugin_converter(
    namespace="mylib",
    op_name="my_custom_op",
    overload=None,               # None → "default" overload
    supports_dynamic_shapes=True,
    use_aot_if_available=True,   # prefer AOT plugin if registered
)
```
This registers a converter for `torch.ops.mylib.my_custom_op.default` in
`DYNAMO_CONVERTERS`. The generated converter:

- Looks up the QDP plugin object via `trtp.op.<namespace>.<op_name>`.
- Converts all tensor inputs to `trt.ITensor` using `get_trt_tensor`.
- Passes non-tensor arguments (scalars, booleans, etc.) as plugin attributes, preserving the order from the op's Torch schema.
- Adds the plugin layer to `ctx.net` and returns its output ITensors.
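The schema-order split between tensor inputs and attribute arguments can be illustrated in plain Python (a simplified, hypothetical sketch: `is_tensor` stands in for the real `trt.ITensor` check and `get_trt_tensor` conversion):

```python
# Simplified sketch of how the generated converter partitions op arguments:
# tensors become plugin inputs, everything else becomes plugin attributes,
# both kept in the order given by the op's Torch schema.
def split_plugin_args(schema_args, is_tensor):
    tensor_inputs = []
    attributes = []
    for name, value in schema_args:  # iteration preserves schema order
        if is_tensor(value):
            tensor_inputs.append((name, value))
        else:
            attributes.append((name, value))
    return tensor_inputs, attributes

# Stand-in "tensors" are tagged tuples here; real code checks trt.ITensor.
args = [("x", ("tensor", 1)), ("y", ("tensor", 2)), ("scale", 2.0), ("flag", True)]
inputs, attrs = split_plugin_args(
    args, lambda v: isinstance(v, tuple) and v[0] == "tensor"
)
print([n for n, _ in inputs])  # ['x', 'y']
print([n for n, _ in attrs])   # ['scale', 'flag']
```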
Parameters#
- `namespace` / `op_name`: The Torch Library namespace and operator name. The plugin must be registered in TRT's registry as `{namespace}::{op_name}`.
- `overload`: The overload string (e.g., `"Tensor"`) or `None` for the `default` overload.
- `capability_validator`: Optional `(Node, CompilationSettings) -> bool` function. Same semantics as the standard `@dynamo_tensorrt_converter` decorator.
- `priority`: `ConverterPriority.STANDARD` or `HIGH`. Use `HIGH` to override an existing converter.
- `supports_dynamic_shapes`: Set `True` if the QDP plugin supports symbolic input dimensions.
- `requires_output_allocator`: Set `True` if the plugin produces data-dependent output shapes.
- `use_aot_if_available`: If `True` (default), use the plugin's ahead-of-time (AOT) compiled implementation when one is registered (`desc.aot_impl_func is not None`). Falls back to the JIT plugin if the AOT impl is absent.
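The `use_aot_if_available` fallback amounts to the following selection logic (a pure-Python sketch; `PluginDesc` is a hypothetical stand-in for the QDP plugin descriptor, not the real class):

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class PluginDesc:
    # Hypothetical stand-in for a QDP plugin descriptor.
    impl_func: Callable                        # JIT (Python callback) impl
    aot_impl_func: Optional[Callable] = None   # AOT (pre-compiled) impl

def select_impl(desc: PluginDesc, use_aot_if_available: bool = True) -> Callable:
    # Prefer the AOT implementation when requested and registered;
    # otherwise fall back to the JIT implementation.
    if use_aot_if_available and desc.aot_impl_func is not None:
        return desc.aot_impl_func
    return desc.impl_func

jit_only = PluginDesc(impl_func=lambda: "jit")
both = PluginDesc(impl_func=lambda: "jit", aot_impl_func=lambda: "aot")

print(select_impl(both)())                              # aot
print(select_impl(jit_only)())                          # jit (no AOT registered)
print(select_impl(both, use_aot_if_available=False)())  # jit (disabled)
```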
For complete end-to-end examples see:
- `auto_generate_plugins`: Triton kernel, QDP JIT plugin
- `aot_plugin`: Triton kernel, QDP AOT plugin (pre-compiled PTX, no Python overhead at runtime)
- `nvrtc_aot_plugin`: CUDA C++ kernel compiled with NVRTC, QDP AOT plugin
- `custom_kernel_plugins`: manual plugin + converter registration (legacy approach)
Debugging Plugin Converters#
If the converter is not being selected (op falls back to PyTorch):
1. Verify the plugin is in the QDP registry:

   ```python
   import tensorrt.plugin as trtp
   from tensorrt.plugin._lib import QDP_REGISTRY

   print("mylib::scaled_add" in QDP_REGISTRY)
   ```

2. Verify the converter was registered:

   ```python
   import torch
   from torch_tensorrt.dynamo.conversion._ConverterRegistry import DYNAMO_CONVERTERS

   print(torch.ops.mylib.scaled_add.default in DYNAMO_CONVERTERS)
   ```

3. Check the capability validator (if you supplied one) against the actual node in a dryrun report (see Dryrun Mode).