.. _aot_inductor:

AOTInductor Deployment
======================

**AOTInductor** (Ahead-of-Time Inductor) compiles a PyTorch model into a
self-contained ``.pt2`` package at build time. That package can be loaded and
executed in Python or C++ without a Torch-TensorRT dependency at runtime.

Torch-TensorRT integrates with AOTInductor: TRT-convertible subgraphs become TRT
engines embedded in the package, and the remaining ops (PyTorch fallback subgraphs)
are compiled by AOTInductor into native CUDA kernels. The result is a single
``.pt2`` file that runs end-to-end with no Python interpreter required.

**When to use AOTInductor**

* Deploying to C++ servers without a Python environment.
* Shipping a single self-contained artifact that bundles both TRT engines and
  PyTorch ops.
* Running inference without a Torch-TensorRT dependency.

.. note::

   AOTInductor packaging is currently **Linux-only**.

----

Compile and Save
-----------------

The workflow is identical to the standard ``ir="dynamo"`` path, with two extra
arguments to ``torch_tensorrt.save``:

* ``output_format="aot_inductor"``: selects the ``.pt2`` packager.
* ``retrace=True``: re-exports the compiled graph through ``torch.export`` before
  passing it to ``torch._inductor.aoti_compile_and_package``. Required when the
  compiled module contains TRT engine subgraphs.

.. code-block:: python

    import torch
    import torch_tensorrt

    # MyModel is a placeholder for your own nn.Module; the dynamic_shapes key
    # below assumes its forward() takes a single argument named "x".
    model = MyModel().eval().cuda()
    example_inputs = (torch.randn(8, 10, device="cuda"),)

    # Step 1: export with optional dynamic shapes
    batch_dim = torch.export.Dim("batch", min=1, max=1024)
    exported = torch.export.export(
        model, example_inputs, dynamic_shapes={"x": {0: batch_dim}}
    )

    # Step 2: compile with Torch-TensorRT
    trt_gm = torch_tensorrt.dynamo.compile(
        exported,
        inputs=[
            torch_tensorrt.Input(
                min_shape=(1, 10),
                opt_shape=(8, 10),
                max_shape=(1024, 10),
                dtype=torch.float32,
            )
        ],
        use_explicit_typing=True,
        min_block_size=1,
    )

    # Step 3: package into a .pt2 file
    torch_tensorrt.save(
        trt_gm,
        "model.pt2",
        output_format="aot_inductor",
        retrace=True,
        arg_inputs=example_inputs,
    )

The ``.pt2`` file embeds both the TRT engine(s) and AOTInductor-compiled kernels for
any ops that fell back to PyTorch.

----

Python Inference
-----------------

Load the package with ``torch._inductor.aoti_load_package``. No Torch-TensorRT
import is needed at inference time:

.. code-block:: python

    import torch

    model = torch._inductor.aoti_load_package("model.pt2")

    # Works with any batch size within the compiled range (1 to 1024 here)
    output = model(torch.randn(4, 10, device="cuda"))
    output = model(torch.randn(16, 10, device="cuda"))

----

C++ Inference
--------------

The same ``.pt2`` package runs in C++ via ``AOTIModelPackageLoader``, with no Python
or Torch-TensorRT dependency:

.. code-block:: cpp

    #include <vector>

    #include "torch/torch.h"
    #include "torch/csrc/inductor/aoti_package/model_package_loader.h"

    int main() {
        c10::InferenceMode mode;

        torch::inductor::AOTIModelPackageLoader loader("model.pt2");

        // Batch size 8
        std::vector<torch::Tensor> inputs = {torch::randn({8, 10}, at::kCUDA)};
        auto outputs = loader.run(inputs);

        // Dynamic batch: a different size works within the compiled min/max range
        outputs = loader.run({torch::randn({1, 10}, at::kCUDA)});

        return 0;
    }

.. note::

   At runtime, ``libtorchtrt_runtime.so`` is not needed. Ensure your link flags
   exclude it (or use ``--as-needed``) to avoid a spurious dependency.

----

Comparison: PT2 vs ExportedProgram
------------------------------------

.. list-table::
   :widths: 30 35 35
   :header-rows: 1

   * - Feature
     - ``.ep`` (ExportedProgram)
     - ``.pt2`` (AOTInductor)
   * - Python load
     - ``torch_tensorrt.load("trt.ep").module()``
     - ``torch._inductor.aoti_load_package("trt.pt2")``
   * - C++ load
     - Not supported
     - ``AOTIModelPackageLoader``
   * - Torch-TensorRT at runtime
     - Required
     - Not required
   * - Non-TRT ops
     - Run via PyTorch eager
     - Compiled by AOTInductor (native CUDA)
   * - Platform
     - Linux + Windows
     - Linux only

----

See ``examples/torchtrt_aoti_example/`` for a complete end-to-end runnable example
(``model.py`` for compilation, ``inference.py`` for loading).
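
The inference examples above only work for inputs inside the shape range fixed at
compile time (``min_shape`` to ``max_shape``). The following is a minimal sketch of a
guard a caller could run before invoking the loaded package. The helper name
``shape_in_range`` and the constants are illustrative assumptions, not part of the
PyTorch or Torch-TensorRT API, and the bounds simply mirror the
``torch_tensorrt.Input`` shown in the compile step:

.. code-block:: python

    from typing import Sequence


    def shape_in_range(
        shape: Sequence[int],
        min_shape: Sequence[int],
        max_shape: Sequence[int],
    ) -> bool:
        """Return True if every dimension of ``shape`` lies within [min, max]."""
        # A rank mismatch can never satisfy the compiled profile.
        if not (len(shape) == len(min_shape) == len(max_shape)):
            return False
        return all(
            lo <= dim <= hi for dim, lo, hi in zip(shape, min_shape, max_shape)
        )


    # Hypothetical constants copied by hand from the compile-time Input spec.
    MIN_SHAPE = (1, 10)
    MAX_SHAPE = (1024, 10)

    # Batch size 4 is inside the range; 2048 exceeds the compiled maximum.
    in_range = shape_in_range((4, 10), MIN_SHAPE, MAX_SHAPE)
    too_large = shape_in_range((2048, 10), MIN_SHAPE, MAX_SHAPE)

With a real tensor ``x``, the first argument would be ``tuple(x.shape)``. Rejecting
out-of-range inputs before the call may give a clearer error than a failure raised
from inside the compiled package.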