.. _trace:

Tracing Models with ``torch_tensorrt.dynamo.trace``
=====================================================

:func:`torch_tensorrt.dynamo.trace` is a thin wrapper around ``torch.export.export``
that automatically injects Torch-TensorRT's operator decompositions at export time,
producing an ``ExportedProgram`` that is better suited for TRT compilation than a
vanilla export.

----

Why Use ``trace`` Instead of ``torch.export.export`` Directly?
---------------------------------------------------------------

Both paths produce a ``torch.export.ExportedProgram``. The difference is in the
decompositions applied:

* ``torch.export.export``: applies a default set of decompositions, some of which
  produce composite ops (e.g. ``aten.linear``, ``aten.embedding``) that Torch-TensorRT
  must decompose again in its own lowering pass.
* ``torch_tensorrt.dynamo.trace``: applies Torch-TensorRT's curated ATen decomposition
  set upfront, producing a graph closer to the Core ATen opset that the converter
  library expects. This can reduce lowering time and increase TRT coverage for certain
  model architectures.

For most models the difference is minor. Use ``trace`` when:

* You are already using ``torch_tensorrt`` at the export step and want a single API.
* Your model has ops that Torch-TensorRT's lowering pass handles better when
  decomposed at export time.
* You want the tracing step to validate Torch-TensorRT compatibility before
  compilation.

----

Basic Usage
-----------

.. code-block:: python

    import torch
    import torch_tensorrt

    model = MyModel().eval().cuda()
    inputs = [torch.randn(1, 3, 224, 224).cuda()]

    exp_program = torch_tensorrt.dynamo.trace(model, arg_inputs=inputs)
    trt_gm = torch_tensorrt.dynamo.compile(exp_program, arg_inputs=inputs)

----

Dynamic Shapes
--------------

Pass ``torch_tensorrt.Input`` objects to specify dynamic dimensions.
``trace`` extracts ``torch.export.Dim`` constraints from the ``min/opt/max`` shapes and
passes them to ``torch.export.export`` automatically:

.. code-block:: python

    from torch_tensorrt import Input

    dyn_inputs = [
        Input(
            min_shape=(1, 3, 224, 224),
            opt_shape=(8, 3, 224, 224),
            max_shape=(16, 3, 224, 224),
            dtype=torch.float32,
            name="x",
        )
    ]

    exp_program = torch_tensorrt.dynamo.trace(model, arg_inputs=dyn_inputs)
    trt_gm = torch_tensorrt.dynamo.compile(exp_program, arg_inputs=dyn_inputs)

Under the hood, ``trace`` calls ``get_dynamic_shapes_args`` /
``get_dynamic_shapes_kwargs`` to build the ``dynamic_shapes`` dict required by
``torch.export.export``, using the ``Input``'s min/max range to construct a
``torch.export.Dim`` for each dynamic axis. Axes where min == max are treated as
static.

----

Keyword Inputs
--------------

.. code-block:: python

    # Model with a kwarg: def forward(self, x, *, mask=None)
    exp_program = torch_tensorrt.dynamo.trace(
        model,
        arg_inputs=(torch.randn(1, 512).cuda(),),
        kwarg_inputs={"mask": torch.ones(1, 512, dtype=torch.bool).cuda()},
    )

Both ``arg_inputs`` and ``kwarg_inputs`` accept ``torch.Tensor`` (static) or
``torch_tensorrt.Input`` (dynamic) values.

----

Non-Strict Tracing
------------------

``trace`` uses ``strict=False`` by default when calling ``torch.export.export``.
Non-strict mode allows data-dependent control flow and Python-side tensor operations
that strict export would reject. If you need strict export semantics, pass
``strict=True``:

.. code-block:: python

    exp_program = torch_tensorrt.dynamo.trace(model, arg_inputs=inputs, strict=True)

----

``trace`` vs ``compile`` Entry Points
---------------------------------------

.. list-table::
   :widths: 30 35 35
   :header-rows: 1

   * -
     - ``torch_tensorrt.dynamo.trace`` then ``compile``
     - ``torch_tensorrt.dynamo.compile`` directly (``nn.Module`` input)
   * - ExportedProgram reuse
     - Yes: export once, compile multiple times with different settings
     - No: re-exports on every compile call
   * - Decompositions at export
     - Torch-TRT curated set
     - Applied during compile's lowering pass
   * - Inspect pre-compilation graph
     - Yes: inspect ``exp_program.graph_module``
     - No: graph only visible inside compile
   * - ``torch.compile`` JIT path
     - N/A
     - Supported via ``backend="tensorrt"``
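
The "ExportedProgram reuse" row can be illustrated with a small helper that loops over
several settings dicts while holding on to a single exported program. This is a minimal
sketch under stated assumptions, not part of Torch-TensorRT: ``compile_variants`` and
its ``compile_fn`` parameter are hypothetical names introduced here, and the compile
function is passed in so the pattern can be read (and exercised) without a GPU.

.. code-block:: python

    from typing import Any, Callable, Dict, Sequence


    def compile_variants(
        exp_program: Any,
        inputs: Sequence[Any],
        settings_by_name: Dict[str, Dict[str, Any]],
        compile_fn: Callable[..., Any],
    ) -> Dict[str, Any]:
        """Compile one already-exported program once per settings dict.

        Hypothetical helper, not a Torch-TensorRT API. ``compile_fn`` is assumed
        to accept ``(exp_program, arg_inputs=..., **settings)``.
        """
        results: Dict[str, Any] = {}
        for name, settings in settings_by_name.items():
            # The same exp_program object is reused; nothing is re-exported here.
            results[name] = compile_fn(exp_program, arg_inputs=inputs, **settings)
        return results

With a real model, one would presumably pass ``torch_tensorrt.dynamo.compile`` as
``compile_fn`` together with the ``exp_program`` returned by ``trace``. The keys inside
each settings dict (for example ``min_block_size``) should be checked against the
installed Torch-TensorRT version.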