Torch-TensorRT for RTX#
Torch-TensorRT supports TensorRT for RTX, which builds on the proven performance of the NVIDIA TensorRT inference library, and simplifies the deployment of AI models on NVIDIA RTX GPUs across desktops, laptops, and workstations.
TensorRT for RTX is a drop-in replacement for NVIDIA TensorRT in applications targeting NVIDIA RTX GPUs (Turing or newer). It introduces a Just-In-Time (JIT) optimizer in the runtime that compiles improved inference engines directly on the end-user’s RTX-accelerated PC in under 30 seconds. This eliminates the need for lengthy pre-compilation steps and enables rapid engine generation, improved application portability, and cutting-edge inference performance.
TensorRT-RTX support in Torch-TensorRT is currently experimental; by default, Torch-TensorRT builds and runs against standard TensorRT. For detailed information about TensorRT-RTX itself, see the TensorRT-RTX documentation.
Precompiled Binaries#
Dependencies#
You need to have CUDA, PyTorch, and an NVIDIA RTX GPU (Turing or newer) to use Torch-TensorRT for RTX.
Installing Torch-TensorRT for RTX#
You can install the Python package with:
python -m pip install torch torch-tensorrt-rtx
Packages are uploaded for Linux on x86 and Windows.
Installing Nightly Builds#
Torch-TensorRT for RTX distributes nightlies targeting the PyTorch nightly. These can be installed from the PyTorch nightly package index (separated by CUDA version):
python -m pip install --pre torch torch_tensorrt_rtx --extra-index-url https://download.pytorch.org/whl/nightly/cu130
Import Test#
After installation, verify the import succeeds:
python -c "import torch_tensorrt; print(torch_tensorrt.__version__)"
Note
The Python import path is torch_tensorrt regardless of which flavor of
TensorRT (tensorrt or tensorrt_rtx) is installed. The flavor is determined
by the wheel name: torch-tensorrt vs. torch-tensorrt-rtx.
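Because the import path is the same for both flavors, one way to tell them apart is to look at the installed distribution names. The following is a minimal sketch using only the standard library; the helper name installed_flavors is hypothetical, and it assumes the distribution names match the wheel names given in the note above.

```python
from importlib import metadata

# Distribution names taken from the wheel names mentioned in the note above.
FLAVORS = {
    "torch-tensorrt-rtx": "tensorrt_rtx",
    "torch-tensorrt": "tensorrt",
}


def installed_flavors():
    """Return the TensorRT flavors whose Torch-TensorRT wheel is installed."""
    found = []
    for dist_name, flavor in FLAVORS.items():
        try:
            metadata.version(dist_name)  # raises if the distribution is absent
        except metadata.PackageNotFoundError:
            continue
        found.append(flavor)
    return found


print(installed_flavors())
```

An empty list means neither wheel was found in the active environment; two entries suggest both flavors are installed side by side, which is worth cleaning up before debugging anything else.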
Example: RTX-Only Features#
The following minimal example compiles a toy convolutional model with dynamic
input shapes and applies three TensorRT-RTX-only runtime knobs after
compile, via mod.runtime_settings:
runtime_cache: path to an on-disk cache of JIT-compiled kernels. The cache is populated on first use and reloaded on subsequent runs, so repeated invocations of the same compiled module skip JIT compilation. The default is a per-user file under the system temp directory; set this to a persistent path (e.g. somewhere under your project) to share the cache across runs.
dynamic_shapes_kernel_specialization_strategy: controls how TensorRT-RTX specializes kernels for the current runtime shape. Accepts "lazy" (default; use a generic fallback kernel while a shape-specialized kernel compiles asynchronously), "eager" (block on the current shape until the specialized kernel is ready), or "none" (always use the generic kernel).
cuda_graph_strategy: whether TensorRT-RTX captures and replays the engine internally as a CUDA graph. Accepts "disabled" (default) or "whole_graph_capture" (capture the engine’s forward and replay on subsequent calls; this lowers per-call CPU overhead but requires fixed input shapes).
All three fields are no-ops on standard-TensorRT builds; they only take
effect when the torch_tensorrt_rtx wheel is installed.
import torch
import torch.nn as nn
import torch_tensorrt
from torch_tensorrt.runtime import RuntimeSettings

class ToyConv(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 16, kernel_size=3, padding=1)
        self.relu = nn.ReLU()

    def forward(self, x):
        return self.relu(self.conv(x))

model = ToyConv().eval().cuda()
inputs = [
    torch_tensorrt.Input(
        min_shape=(1, 3, 224, 224),
        opt_shape=(4, 3, 224, 224),
        max_shape=(8, 3, 224, 224),
        dtype=torch.float32,
    )
]
compiled = torch_tensorrt.compile(
    model,
    ir="dynamo",
    inputs=inputs,
    enabled_precisions={torch.float32},
    use_python_runtime=True,
)
# RTX-only: persist JIT-compiled kernels across runs, eagerly specialize
# for the current input shape, and let TensorRT-RTX capture+replay the
# engine internally as a CUDA graph. Apply before first execute so the
# engine's IExecutionContext picks the settings up on its single, lazy
# create.
compiled.runtime_settings = RuntimeSettings(
    runtime_cache="/tmp/my_rtx_cache.bin",
    dynamic_shapes_kernel_specialization_strategy="eager",
    cuda_graph_strategy="whole_graph_capture",
)
out = compiled(torch.randn(4, 3, 224, 224).cuda())
print(out.shape)
Note
To both set cuda_graph_strategy and wrap the module for outer
torch.cuda.CUDAGraph capture in a single context manager, prefer
with torch_tensorrt.runtime.enable_cudagraphs(compiled, cuda_graph_strategy="whole_graph_capture") as wrapped:
See Runtime Settings (TensorRT-RTX) for the full pattern.
For temporary (scoped) overrides, shared caches across multiple modules,
combining a cuda-graph capture strategy with enable_cudagraphs, and other
advanced patterns, see Runtime Settings (TensorRT-RTX).
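The advice above to point runtime_cache at a persistent path, together with the accepted strategy values, can be sketched as a small helper. This is an illustration only: build_rtx_settings_kwargs, the cache file name, and the directory layout are hypothetical choices, and the accepted values are copied from the knob descriptions in this section rather than queried from the library.

```python
import tempfile
from pathlib import Path

# Accepted values as listed in the knob descriptions in this section.
SPECIALIZATION_STRATEGIES = {"lazy", "eager", "none"}
CUDA_GRAPH_STRATEGIES = {"disabled", "whole_graph_capture"}


def build_rtx_settings_kwargs(cache_dir, specialization="lazy", cuda_graph="disabled"):
    """Hypothetical helper: validate values and return RuntimeSettings kwargs."""
    if specialization not in SPECIALIZATION_STRATEGIES:
        raise ValueError(f"unknown specialization strategy: {specialization!r}")
    if cuda_graph not in CUDA_GRAPH_STRATEGIES:
        raise ValueError(f"unknown CUDA graph strategy: {cuda_graph!r}")
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)  # make sure the cache directory exists
    return {
        "runtime_cache": str(cache_dir / "rtx_runtime_cache.bin"),  # assumed file name
        "dynamic_shapes_kernel_specialization_strategy": specialization,
        "cuda_graph_strategy": cuda_graph,
    }


# Demo against a throwaway directory; use a project-local path in practice.
with tempfile.TemporaryDirectory() as tmp:
    print(build_rtx_settings_kwargs(tmp, specialization="eager"))
```

With the RTX wheel installed, the returned dictionary could be passed along as compiled.runtime_settings = RuntimeSettings(**kwargs), mirroring the keyword names used in the example above.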
Compiling From Source#
The standard build prerequisites (Bazel, CUDA, Python, PyTorch nightly) are unchanged for the TensorRT-RTX build; see Dependencies in the main installation guide for those. Only the RTX-specific differences are listed below.
RTX-Specific Dependencies#
Download the TensorRT-RTX tarball from https://developer.nvidia.com/tensorrt-rtx. Torch-TensorRT currently uses TensorRT-RTX version 1.5.0.114.
Once downloaded:
On Linux, add the tarball lib directory to LD_LIBRARY_PATH:
# If TensorRT-RTX is extracted in /your_local_download_path/TensorRT-RTX-1.5.0.114
export LD_LIBRARY_PATH=/your_local_download_path/TensorRT-RTX-1.5.0.114/lib:$LD_LIBRARY_PATH
On Windows, add the tarball lib directory to the system PATH:
REM If TensorRT-RTX is extracted in C:\your_local_download_path\TensorRT-RTX-1.5.0.114
set "PATH=%PATH%;C:\your_local_download_path\TensorRT-RTX-1.5.0.114\lib"
echo %PATH% | findstr TensorRT-RTX
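The same check can be done from Python on either platform. The sketch below assumes the relevant variable is PATH on Windows and LD_LIBRARY_PATH elsewhere, as described above; dir_on_search_path is a hypothetical helper name, and the directory passed in is whatever path you extracted TensorRT-RTX to.

```python
import os


def dir_on_search_path(lib_dir, env=None):
    """Return True if lib_dir is listed in the library search path variable."""
    env = os.environ if env is None else env
    var = "PATH" if os.name == "nt" else "LD_LIBRARY_PATH"
    entries = [e for e in env.get(var, "").split(os.pathsep) if e]
    # Normalize both sides so trailing slashes and case (on Windows) do not matter.
    target = os.path.normcase(os.path.normpath(lib_dir))
    return any(os.path.normcase(os.path.normpath(e)) == target for e in entries)


# Placeholder path from the instructions above; replace with your own.
print(dir_on_search_path("/your_local_download_path/TensorRT-RTX-1.5.0.114/lib"))
```

A False result after exporting the variable usually means the export happened in a different shell session than the one running Python.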
Install TensorRT-RTX Wheel#
The tensorrt_rtx wheel is published on PyPI and is installed automatically as a dependency when you install the torch_tensorrt_rtx wheel.
Build Torch-TensorRT with TensorRT-RTX#
Build Locally with TensorRT-RTX#
Before building, ensure you have completed all the prerequisite steps above, including:
Cloning the repository
Installing Python dependencies (setuptools, torch, pyyaml, numpy)
Setting CUDA_HOME environment variable
Installing the correct CUDA toolkit version
Installing Python development headers
Installing Bazel
Next, clean any previous build artifacts:
python setup.py clean
bazel clean --expunge
rm -rf build/*
Then build and install the wheel:
USE_TRT_RTX=true python -m pip wheel . --no-deps -w dist/
# Note: the wheel filename uses underscores, not hyphens, and contains 'rtx'.
python -m pip install dist/torch_tensorrt_rtx-*.whl
Troubleshooting#
Common Issues#
Missing distutils module
If you encounter ModuleNotFoundError: No module named 'distutils', install
setuptools:
pip install setuptools
Missing CUDA_HOME environment variable
If you encounter OSError: CUDA_HOME environment variable is not set, set
the CUDA_HOME path:
export CUDA_HOME=/usr/local/cuda
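Before starting a long build, it can help to confirm that CUDA_HOME is set and points at a real directory. The following is a minimal sketch of such a check using only the standard library; check_cuda_home is a hypothetical helper and is not part of the Torch-TensorRT build scripts.

```python
import os
from pathlib import Path


def check_cuda_home(env=None):
    """Return a problem description, or None if CUDA_HOME looks usable."""
    env = os.environ if env is None else env
    cuda_home = env.get("CUDA_HOME")
    if not cuda_home:
        return "CUDA_HOME is not set"
    if not Path(cuda_home).is_dir():
        return f"CUDA_HOME points to a missing directory: {cuda_home}"
    return None  # no problem found


problem = check_cuda_home()
print(problem or "CUDA_HOME looks fine")
```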
CUDA version mismatch
If you encounter errors about CUDA paths not existing (e.g.,
/usr/local/cuda-X.Y/ does not exist), ensure you have the correct CUDA
version installed. Check the required version in
MODULE.bazel.
You may need to:
Update your NVIDIA drivers
Download and install the specific CUDA toolkit version required by MODULE.bazel
Clean and rebuild after installing the correct version
PyTorch version mismatch
If you encounter an error like
ERROR: No matching distribution found for torch<X.Y.Z,>=X.Y.Z.dev (for
example, torch<2.11.0,>=2.10.0.dev), install a compatible PyTorch nightly.
First check the exact version constraint in
pyproject.toml,
then install with that constraint:
# Example: if pyproject.toml requires torch>=2.12.0.dev,<2.13.0
# and MODULE.bazel specifies CUDA 13.0 (cu130):
pip install --pre "torch>=2.12.0.dev,<2.13.0" torchvision --index-url https://download.pytorch.org/whl/nightly/cu130
Replace the version constraint and CUDA version (cuXXX) according to your project’s requirements.
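To avoid copying the constraint by hand, it can be pulled out of pyproject.toml with a short script. This is a rough sketch that assumes the constraint appears as a quoted requirement string beginning with torch followed by a version operator; the sample text below is made up for illustration and reuses the example constraint from the comment above, so check it against the real file.

```python
import re

# Matches quoted requirement strings such as "torch>=2.12.0.dev,<2.13.0".
# Requiring a version operator right after "torch" skips "torchvision".
TORCH_REQ = re.compile(r"""["'](torch\s*[<>=!~][^"']*)["']""")


def torch_constraints(pyproject_text):
    """Return every torch version requirement found in the given text."""
    return [match.strip() for match in TORCH_REQ.findall(pyproject_text)]


# Illustrative sample only; read the real pyproject.toml in practice.
sample = '''
[project]
dependencies = [
    "torch>=2.12.0.dev,<2.13.0",
    "torchvision",
    "numpy",
]
'''
print(torch_constraints(sample))  # ['torch>=2.12.0.dev,<2.13.0']
```

For the real file, something like torch_constraints(open("pyproject.toml").read()) would return the strings to paste into the pip install command.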
Missing Python development headers
If you encounter fatal error: Python.h: No such file or directory, install
the Python development package:
# For Python 3.12 (adjust version based on your Python)
sudo apt install python3.12-dev
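Whether the headers are visible to the interpreter you are building with can be checked before running the build. The sketch below uses the standard sysconfig module to locate the include directory; python_header_path is a hypothetical helper name.

```python
import sysconfig
from pathlib import Path


def python_header_path():
    """Return the path where this interpreter expects Python.h to live."""
    return Path(sysconfig.get_paths()["include"]) / "Python.h"


header = python_header_path()
print(f"{header}: {'found' if header.is_file() else 'missing'}")
```

Run it with the same interpreter (and virtual environment) used for the build, since each Python version has its own include directory.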
Verifying TensorRT-RTX Linkage#
If you encounter load or link errors, check that tensorrt_rtx is linked
correctly. If not, clean the environment and rebuild.
Linux:
# Ensure only tensorrt_rtx is installed (no standard tensorrt wheels)
python -m pip list | grep tensorrt
# Check if libtorchtrt.so links to the correct tensorrt_rtx shared object
# The RTX wheel is installed under the distribution name torch-tensorrt-rtx
trt_install_path=$(python -m pip show torch-tensorrt-rtx | grep "^Location" | awk '{print $2}')/torch_tensorrt
# Verify libtensorrt_rtx.so.1 is linked, and libnvinfer.so.10 is NOT
ldd "$trt_install_path/lib/libtorchtrt.so"
Windows:
# Check if tensorrt_rtx_1_0.dll is linked, and libnvinfer.dll is NOT
cd py/torch_tensorrt
dumpbin /DEPENDENTS torchtrt.dll
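The Linux check above can be automated by scanning the ldd output for the two library names. The following sketch takes the text printed by ldd and applies the rule stated in the comments (libtensorrt_rtx present, libnvinfer absent); check_linkage is a hypothetical helper, and the sample output is fabricated purely to exercise the function.

```python
def check_linkage(ldd_output):
    """Return True if libtensorrt_rtx is linked and libnvinfer is not."""
    # The first token on each ldd line is the library name.
    libs = [line.split()[0] for line in ldd_output.splitlines() if line.strip()]
    has_rtx = any(name.startswith("libtensorrt_rtx.so") for name in libs)
    has_std = any(name.startswith("libnvinfer.so") for name in libs)
    return has_rtx and not has_std


# Fabricated sample for illustration; real ldd output lists many more libraries.
sample_ldd = """
    linux-vdso.so.1 (0x00007ffc00000000)
    libtensorrt_rtx.so.1 => /opt/rtx/lib/libtensorrt_rtx.so.1 (0x00007f0000000000)
    libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f0000100000)
"""
print(check_linkage(sample_ldd))  # True
```

In practice the input could come from subprocess.run(["ldd", path], capture_output=True, text=True).stdout, with path pointing at libtorchtrt.so.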