Torch-TensorRT for RTX#

Torch-TensorRT supports TensorRT for RTX, which builds on the proven performance of the NVIDIA TensorRT inference library and simplifies the deployment of AI models on NVIDIA RTX GPUs across desktops, laptops, and workstations.

TensorRT for RTX is a drop-in replacement for NVIDIA TensorRT in applications targeting NVIDIA RTX GPUs (Turing or newer). It introduces a Just-In-Time (JIT) optimizer in the runtime that compiles improved inference engines directly on the end-user’s RTX-accelerated PC in under 30 seconds. This eliminates the need for lengthy pre-compilation steps and enables rapid engine generation, improved application portability, and cutting-edge inference performance.

Torch-TensorRT support for TensorRT-RTX is currently experimental; by default, Torch-TensorRT builds and runs against standard TensorRT. For detailed information about TensorRT-RTX itself, see the TensorRT-RTX documentation.

Precompiled Binaries#

Dependencies#

You need to have CUDA, PyTorch, and an NVIDIA RTX GPU (Turing or newer) to use Torch-TensorRT for RTX.

Installing Torch-TensorRT for RTX#

Install the Python package with pip:

python -m pip install torch torch-tensorrt-rtx

Packages are uploaded for Linux on x86 and Windows.

Installing Nightly Builds#

Torch-TensorRT for RTX publishes nightly builds that target the PyTorch nightly. Install them from the PyTorch nightly package index, which is split by CUDA version:

python -m pip install --pre torch torch_tensorrt_rtx --extra-index-url https://download.pytorch.org/whl/nightly/cu130

Import Test#

After installation, verify the import succeeds:

python -c "import torch_tensorrt; print(torch_tensorrt.__version__)"

Note

The Python import path is torch_tensorrt regardless of which flavor of TensorRT (tensorrt or tensorrt_rtx) is installed. The flavor is determined by the wheel name: torch-tensorrt vs. torch-tensorrt-rtx.
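Because both wheels expose the same torch_tensorrt import path, the import test alone does not reveal which flavor is installed. The following is a minimal sketch using only the standard library's importlib.metadata. It assumes the installed distribution names match the wheel names given above (torch-tensorrt and torch-tensorrt-rtx); the helper name installed_flavors is hypothetical and not part of Torch-TensorRT.

```python
from importlib import metadata


def installed_flavors(candidates=("torch-tensorrt", "torch-tensorrt-rtx")):
    """Return {distribution name: version} for each candidate that is installed."""
    found = {}
    for name in candidates:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # This distribution is not installed in the current environment.
            continue
    return found


if __name__ == "__main__":
    flavors = installed_flavors()
    if not flavors:
        print("Neither Torch-TensorRT wheel is installed")
    for name, version in flavors.items():
        print(f"{name} {version}")
```

If both names are reported, the environment likely mixes the two flavors, which is worth cleaning up before debugging anything else.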

Example: RTX-Only Features#

The following minimal example compiles a toy convolutional model with dynamic input shapes and applies three TensorRT-RTX-only runtime knobs after compile, via mod.runtime_settings:

  • runtime_cache — path to an on-disk cache of JIT-compiled kernels. The cache is populated on first use and reloaded on subsequent runs, so repeated invocations of the same compiled module skip JIT compilation. The default is a per-user file under the system temp directory; set this to a persistent path (e.g. somewhere under your project) to share the cache across runs.

  • dynamic_shapes_kernel_specialization_strategy — controls how TensorRT-RTX specializes kernels for the current runtime shape. Accepts "lazy" (default; use a generic fallback kernel while a shape-specialized kernel compiles asynchronously), "eager" (block on the current shape until the specialized kernel is ready), or "none" (always use the generic kernel).

  • cuda_graph_strategy — whether TensorRT-RTX captures and replays the engine internally as a CUDA graph. Accepts "disabled" (default) or "whole_graph_capture" (capture the engine’s forward and replay on subsequent calls — lower per-call CPU overhead, fixed input shapes required).

All three fields are no-ops on standard-TensorRT builds; they only take effect when the torch_tensorrt_rtx wheel is installed.

import torch
import torch.nn as nn
import torch_tensorrt
from torch_tensorrt.runtime import RuntimeSettings


class ToyConv(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 16, kernel_size=3, padding=1)
        self.relu = nn.ReLU()

    def forward(self, x):
        return self.relu(self.conv(x))


model = ToyConv().eval().cuda()

inputs = [
    torch_tensorrt.Input(
        min_shape=(1, 3, 224, 224),
        opt_shape=(4, 3, 224, 224),
        max_shape=(8, 3, 224, 224),
        dtype=torch.float32,
    )
]

compiled = torch_tensorrt.compile(
    model,
    ir="dynamo",
    inputs=inputs,
    enabled_precisions={torch.float32},
    use_python_runtime=True,
)

# RTX-only: persist JIT-compiled kernels across runs, eagerly specialize
# for the current input shape, and let TensorRT-RTX capture and replay the
# engine internally as a CUDA graph. Apply these before the first call so
# the engine's IExecutionContext, which is created lazily and only once,
# picks them up.
compiled.runtime_settings = RuntimeSettings(
    runtime_cache="/tmp/my_rtx_cache.bin",
    dynamic_shapes_kernel_specialization_strategy="eager",
    cuda_graph_strategy="whole_graph_capture",
)

out = compiled(torch.randn(4, 3, 224, 224).cuda())
print(out.shape)
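The accepted strategy names and the advice to use a persistent cache path can be checked before the settings object is built. The following is a sketch under stated assumptions: the helper rtx_runtime_kwargs and the cache file name rtx_runtime_cache.bin are hypothetical, and creating the cache directory up front is a precaution in case the runtime does not create missing directories itself (not confirmed here). It uses only the standard library, so it runs without the RTX wheel.

```python
from pathlib import Path

# Accepted values, as listed in the option descriptions above.
SPECIALIZATION_STRATEGIES = ("lazy", "eager", "none")
CUDA_GRAPH_STRATEGIES = ("disabled", "whole_graph_capture")


def rtx_runtime_kwargs(cache_dir, specialization="lazy", cuda_graph="disabled"):
    """Validate the strategy names and return keyword arguments for RuntimeSettings."""
    if specialization not in SPECIALIZATION_STRATEGIES:
        raise ValueError(f"unknown specialization strategy: {specialization!r}")
    if cuda_graph not in CUDA_GRAPH_STRATEGIES:
        raise ValueError(f"unknown CUDA graph strategy: {cuda_graph!r}")

    # Assumption: create the directory ourselves rather than rely on the runtime.
    cache_dir = Path(cache_dir).expanduser()
    cache_dir.mkdir(parents=True, exist_ok=True)

    return {
        # Hypothetical file name; any persistent path works.
        "runtime_cache": str(cache_dir / "rtx_runtime_cache.bin"),
        "dynamic_shapes_kernel_specialization_strategy": specialization,
        "cuda_graph_strategy": cuda_graph,
    }
```

With the RTX wheel installed, the result would be applied as compiled.runtime_settings = RuntimeSettings(**rtx_runtime_kwargs(".rtx_cache", specialization="eager")), using the same keyword names as the example above.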

Note

To both flip cuda_graph_strategy and wrap the module for outer torch.cuda.CUDAGraph capture in a single context manager, prefer with torch_tensorrt.runtime.enable_cudagraphs(compiled, cuda_graph_strategy="whole_graph_capture") as wrapped: — see Runtime Settings (TensorRT-RTX) for the full pattern.

For temporary (scoped) overrides, shared caches across multiple modules, combining a cuda-graph capture strategy with enable_cudagraphs, and other advanced patterns, see Runtime Settings (TensorRT-RTX).

Compiling From Source#

The standard build prerequisites (Bazel, CUDA, Python, PyTorch nightly) are unchanged for the TensorRT-RTX build — see Dependencies in the main installation guide for those. Only the RTX-specific deltas are listed below.

RTX-Specific Dependencies#

Download the TensorRT-RTX tarball from https://developer.nvidia.com/tensorrt-rtx. Torch-TensorRT currently uses TensorRT-RTX version 1.5.0.114.

Once downloaded:

On Linux, add the lib directory of the extracted tarball to LD_LIBRARY_PATH:

# If TensorRT-RTX is extracted in /your_local_download_path/TensorRT-RTX-1.5.0.114
export LD_LIBRARY_PATH=/your_local_download_path/TensorRT-RTX-1.5.0.114/lib:$LD_LIBRARY_PATH

On Windows, add the lib directory of the extracted archive to PATH (the set command below affects only the current Command Prompt session):

REM If TensorRT-RTX is extracted in C:\your_local_download_path\TensorRT-RTX-1.5.0.114
set "PATH=%PATH%;C:\your_local_download_path\TensorRT-RTX-1.5.0.114\lib"
echo %PATH% | findstr TensorRT-RTX
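The same check can be done from Python, which is convenient when the build is driven by a script. The following is a minimal sketch, assuming the library directory is looked up through LD_LIBRARY_PATH on Linux and PATH on Windows, as in the commands above; the helper name on_search_path is hypothetical.

```python
import os
import sys


def on_search_path(lib_dir, env=None, var=None):
    """Return True if lib_dir is listed in the library search-path variable."""
    env = os.environ if env is None else env
    if var is None:
        # This guide uses PATH on Windows and LD_LIBRARY_PATH on Linux.
        var = "PATH" if sys.platform == "win32" else "LD_LIBRARY_PATH"
    wanted = os.path.normcase(os.path.normpath(lib_dir))
    entries = env.get(var, "").split(os.pathsep)
    # Normalize each entry so trailing slashes do not cause false negatives.
    return any(
        os.path.normcase(os.path.normpath(entry)) == wanted
        for entry in entries
        if entry
    )
```

For example, on_search_path("/your_local_download_path/TensorRT-RTX-1.5.0.114/lib") should return True in a shell where the export above has been run.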

Install TensorRT-RTX Wheel#

The tensorrt_rtx wheel is published on PyPI and is installed automatically as a dependency when you install the torch_tensorrt_rtx wheel.

Build Torch-TensorRT with TensorRT-RTX#

Build Locally with TensorRT-RTX#

Before building, ensure you have completed all the prerequisite steps above, including:

  • Cloning the repository

  • Installing Python dependencies (setuptools, torch, pyyaml, numpy)

  • Setting CUDA_HOME environment variable

  • Installing the correct CUDA toolkit version

  • Installing Python development headers

  • Installing Bazel

Next, clean any previous build artifacts:

python setup.py clean
bazel clean --expunge
rm -rf build/*

Then build and install the wheel:

USE_TRT_RTX=true python -m pip wheel . --no-deps -w dist/

# Note: the wheel filename uses underscores, not hyphens, and contains 'rtx'.
python -m pip install dist/torch_tensorrt_rtx-*.whl
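Before installing, it can help to confirm that the build produced an RTX-flavored wheel. The following is a small sketch that looks for the filename pattern used in the install command above. The helper name find_rtx_wheels is hypothetical, and the idea that a build without USE_TRT_RTX=true yields a torch_tensorrt-*.whl file instead is an assumption based on the wheel names mentioned earlier.

```python
from pathlib import Path


def find_rtx_wheels(dist_dir="dist"):
    """Return wheels in dist_dir whose filename starts with torch_tensorrt_rtx-."""
    return sorted(Path(dist_dir).glob("torch_tensorrt_rtx-*.whl"))


if __name__ == "__main__":
    wheels = find_rtx_wheels()
    if wheels:
        for wheel in wheels:
            print(f"found {wheel}")
    else:
        print("no torch_tensorrt_rtx wheel in dist/; was USE_TRT_RTX=true set?")
```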

Troubleshooting#

Common Issues#

Missing distutils module

If you encounter ModuleNotFoundError: No module named 'distutils', install setuptools:

pip install setuptools

Missing CUDA_HOME environment variable

If you encounter OSError: CUDA_HOME environment variable is not set, set the CUDA_HOME path:

export CUDA_HOME=/usr/local/cuda

CUDA version mismatch

If you encounter errors about CUDA paths not existing (e.g., /usr/local/cuda-X.Y/ does not exist), ensure you have the correct CUDA version installed. Check the required version in MODULE.bazel. You may need to:

  1. Update your NVIDIA drivers

  2. Download and install the specific CUDA toolkit version required by MODULE.bazel

  3. Clean and rebuild after installing the correct version

PyTorch version mismatch

If you encounter an error like ERROR: No matching distribution found for torch<X.Y.Z,>=X.Y.Z.dev (for example, torch<2.11.0,>=2.10.0.dev), install a compatible PyTorch nightly. First check the exact version constraint in pyproject.toml, then install with that constraint:

# Example: if pyproject.toml requires torch>=2.12.0.dev,<2.13.0
# and MODULE.bazel specifies CUDA 13.0 (cu130):
pip install --pre "torch>=2.12.0.dev,<2.13.0" torchvision --index-url https://download.pytorch.org/whl/nightly/cu130

Replace the version constraint and CUDA version (cuXXX) according to your project’s requirements.
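To find the constraint without reading pyproject.toml by hand, a plain-text scan is usually enough. The following is a rough sketch, not a TOML parser: it assumes the constraint is written as a quoted requirement string such as "torch>=2.12.0.dev,<2.13.0", and the helper name torch_constraints is hypothetical.

```python
import re

# Matches quoted requirement strings such as "torch>=2.12.0.dev,<2.13.0".
# Requiring an operator right after "torch" keeps "torchvision" and similar names out.
TORCH_REQ = re.compile(r"""["'](torch\s*[<>=!~][^"']*)["']""")


def torch_constraints(pyproject_text):
    """Return every quoted torch requirement string found in the text."""
    return TORCH_REQ.findall(pyproject_text)
```

A typical call is torch_constraints(open("pyproject.toml").read()); each returned string can be passed to pip in quotes, as in the command above.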

Missing Python development headers

If you encounter fatal error: Python.h: No such file or directory, install the Python development package:

# For Python 3.12 (adjust version based on your Python)
sudo apt install python3.12-dev
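Several of the issues above can be detected before starting a build. The following is a minimal pre-flight sketch using only the standard library. The function name preflight and the check labels are hypothetical, and it covers only the setuptools, CUDA_HOME, and Python.h checks; the CUDA and PyTorch version checks still need MODULE.bazel and pyproject.toml.

```python
import importlib.util
import os
import sysconfig
from pathlib import Path


def preflight():
    """Return {check name: bool} for build prerequisites described above."""
    cuda_home = os.environ.get("CUDA_HOME", "")
    include_dir = sysconfig.get_paths()["include"]
    return {
        # Installing setuptools is the fix given above for the distutils error.
        "setuptools importable": importlib.util.find_spec("setuptools") is not None,
        "CUDA_HOME set": bool(cuda_home),
        "CUDA_HOME exists": bool(cuda_home) and Path(cuda_home).is_dir(),
        # Python.h is provided by the Python development package.
        "Python.h present": (Path(include_dir) / "Python.h").is_file(),
    }


if __name__ == "__main__":
    for name, ok in preflight().items():
        print(f"{'ok  ' if ok else 'FAIL'} {name}")
```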

Verifying TensorRT-RTX Linkage#

If you encounter load or link errors, check that tensorrt_rtx is linked correctly. If not, clean the environment and rebuild.

Linux:

# Ensure only tensorrt_rtx is installed (no standard tensorrt wheels)
python -m pip list | grep tensorrt

# Check if libtorchtrt.so links to the correct tensorrt_rtx shared object
trt_install_path=$(python -m pip show torch-tensorrt-rtx | grep "Location" | awk '{print $2}')/torch_tensorrt

# Verify libtensorrt_rtx.so.1 is linked, and libnvinfer.so.10 is NOT
ldd "$trt_install_path/lib/libtorchtrt.so"
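The linkage rule can also be checked mechanically: a tensorrt_rtx library must be listed and libnvinfer must not be. The following is a sketch that parses ldd output, assuming the usual "name => path (address)" line layout; the function names are hypothetical, and the ldd wrapper is Linux-only.

```python
import subprocess


def linked_libraries(ldd_output):
    """Return the library names from the left-hand column of ldd output."""
    names = []
    for line in ldd_output.splitlines():
        parts = line.split()
        if parts:
            names.append(parts[0])
    return names


def check_rtx_linkage(ldd_output):
    """True if a tensorrt_rtx library is linked and no libnvinfer library is."""
    names = linked_libraries(ldd_output)
    has_rtx = any(n.startswith("libtensorrt_rtx.so") for n in names)
    has_nvinfer = any(n.startswith("libnvinfer.so") for n in names)
    return has_rtx and not has_nvinfer


def ldd(path):
    """Run ldd on a shared object and return its textual output (Linux only)."""
    return subprocess.run(
        ["ldd", path], capture_output=True, text=True, check=True
    ).stdout
```

Calling check_rtx_linkage(ldd(path_to_libtorchtrt)) with the library path computed above returns False when the build picked up standard TensorRT, in which case a clean rebuild is needed.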

Windows:

# Check if tensorrt_rtx_1_0.dll is linked, and libnvinfer.dll is NOT
cd py/torch_tensorrt
dumpbin /DEPENDENTS torchtrt.dll