.. _converter_registry: Converter Registry Internals ============================== This page covers the internals of ``DYNAMO_CONVERTERS`` — the global registry that maps every ``torch.ops`` target to its TensorRT converter — and explains the supporting types and lookup algorithm. For a guide on *writing* converters see :ref:`dynamo_converters`. ---- Key Types --------- ConverterSupport ^^^^^^^^^^^^^^^^ Every converter registered via ``@dynamo_tensorrt_converter`` is stored as a ``ConverterSupport`` frozen dataclass: .. code-block:: python @dataclass(frozen=True) class ConverterSupport: converter_implementation: ConverterImplSignature capability_validator: Callable[[Node, CompilationSettings], bool] = lambda node, settings: True supports_dynamic_shapes: bool = False requires_output_allocator: bool = False ``converter_implementation`` The actual converter function. See :ref:`dynamo_converters` for the expected signature. ``capability_validator`` Called at *partition* time (before conversion) with the live ``torch.fx.Node`` and active ``CompilationSettings``. Must return ``True`` if this converter can handle this specific node. The default always returns ``True`` (unconditional support). See :ref:`partitioning` for how validators interact with the partitioner. ``supports_dynamic_shapes`` If ``False`` (default), the registry will not select this converter when the node has symbolic (dynamic) input dimensions — unless ``assume_dynamic_shape_support=True`` in settings. ``requires_output_allocator`` Marks converters whose TRT implementation produces data-dependent output shapes (e.g. ``nonzero``, ``unique``). The runtime will use TensorRT's output allocator for these ops rather than pre-allocating fixed buffers. ConverterPriority ^^^^^^^^^^^^^^^^^ .. code-block:: python class ConverterPriority(Enum): STANDARD = auto() HIGH = auto() ``HIGH`` priority converters are inserted at the **front** of the candidate list for their target. When the registry iterates candidates for a node it picks the first one whose ``capability_validator`` returns ``True`` — so ``HIGH`` converters are checked before ``STANDARD`` ones. Use ``HIGH`` to override a built-in converter: .. code-block:: python @dynamo_tensorrt_converter( torch.ops.aten.gelu.default, priority=ConverterPriority.HIGH, capability_validator=lambda node, settings: node.kwargs.get("approximate") == "tanh", ) def my_fast_gelu(ctx, target, args, kwargs, name): ... CallingConvention ^^^^^^^^^^^^^^^^^ .. code-block:: python class CallingConvention(Enum): LEGACY = auto() # Old-style FX converters: (net, target, args, kwargs, name) CTX = auto() # Dynamo converters: (ctx, target, args, kwargs, name) All newly written converters use ``CTX``. ``LEGACY`` is retained only for backward compatibility with old FX converter dictionaries. ---- ``@dynamo_tensorrt_converter`` Decorator ------------------------------------------ .. code-block:: python @dynamo_tensorrt_converter( key, *, enabled=True, capability_validator=None, priority=ConverterPriority.STANDARD, supports_dynamic_shapes=False, requires_output_allocator=False, ) ``key`` (``Target``) The ``torch.ops`` overload to register for. Must be an ``OpOverload`` (e.g. ``torch.ops.aten.relu.default``), not an ``OpOverloadPacket`` (e.g. ``torch.ops.aten.relu``), unless the packet has only one or two overloads (``"default"`` + ``"out"``). ``enabled`` (``bool``, default ``True``) If ``False``, the decorator is a no-op — the function is returned unchanged and **not** registered. Useful for gating experimental converters behind a flag: .. code-block:: python @dynamo_tensorrt_converter(torch.ops.mylib.op.default, enabled=EXPERIMENTAL) def my_converter(...): ... ``capability_validator`` (``Callable[[Node, CompilationSettings], bool]``, default ``None``) Validated at partition time. ``None`` is equivalent to ``lambda node, settings: True``. Must be a **pure function** that does not modify the node or graph. ``priority`` (``ConverterPriority``, default ``STANDARD``) Determines insertion position in the candidate list for this target. ``HIGH`` converters are tried first. ``supports_dynamic_shapes`` (``bool``, default ``False``) Set to ``True`` only after verifying the converter handles symbolic dimensions correctly (e.g., using ``ctx.net.add_shape`` rather than hardcoded sizes). ``requires_output_allocator`` (``bool``, default ``False``) Set to ``True`` for converters implementing data-dependent-shape ops. ---- The ``DYNAMO_CONVERTERS`` Registry Object ------------------------------------------ ``DYNAMO_CONVERTERS`` is the singleton ``ConverterRegistry`` instance the interpreter queries for every ``call_function`` node: .. code-block:: python from torch_tensorrt.dynamo.conversion._ConverterRegistry import DYNAMO_CONVERTERS The registry wraps a list of converter dictionaries (currently one — ``DYNAMO_ATEN_CONVERTERS``) and provides these access patterns: ``DYNAMO_CONVERTERS[node]`` — validated lookup Pass a ``torch.fx.Node``. Returns ``(converter_impl, CallingConvention, flags_dict)`` where ``flags_dict`` has keys ``"supports_dynamic_shapes"`` and ``"requires_output_allocator"``. Raises ``KeyError`` if no validated converter is found. This is the path the interpreter uses. ``DYNAMO_CONVERTERS.get(node, default=None)`` Same as ``__getitem__`` but returns ``default`` instead of raising on a miss. ``target in DYNAMO_CONVERTERS`` Checks for a *validated* entry. Pass a ``Node`` for validated check or a ``Target`` for unvalidated existence check. Registry Inspection ^^^^^^^^^^^^^^^^^^^^^ .. code-block:: python from torch_tensorrt.dynamo.conversion._ConverterRegistry import DYNAMO_CONVERTERS # Print a full table of all registered targets and their source registries print(DYNAMO_CONVERTERS.display_all_available_converters()) # Get the table as a dict: {qualified_op_name: {registry_name: count}} support_info = DYNAMO_CONVERTERS.get_converter_support_info() # Check whether a specific op has a validated converter for a node: from torch_tensorrt.dynamo.partitioning import get_graph_converter_support n_supported, n_total = get_graph_converter_support(gm, torch_executed_ops=set()) print(f"{n_supported}/{n_total} ops have TRT converters") # List all unique registered targets: all_targets = DYNAMO_CONVERTERS.unique_targets() # Get all converters (including lower-priority ones) for a target: impls, registry_info = DYNAMO_CONVERTERS.get_all_converters_with_target( torch.ops.aten.relu.default, return_registry_info=True ) Lookup Algorithm ^^^^^^^^^^^^^^^^^ When the interpreter calls ``DYNAMO_CONVERTERS[node]``: 1. Check if ``node.target`` is in ``disallowed_targets`` (i.e., ``torch_executed_ops``). If yes, raise ``KeyError`` — node falls back to PyTorch. 2. Iterate registries in order (currently only ``DYNAMO_ATEN_CONVERTERS``). 3. For each registry containing the target, iterate its ``ConverterSupport`` list in order (HIGH-priority entries first). 4. For each candidate, evaluate: * ``capability_validator(node, compilation_settings)`` → must be ``True`` * If node has symbolic dims: ``assume_dynamic_shape_support`` must be ``True`` OR ``candidate.supports_dynamic_shapes`` must be ``True`` 5. Return the first passing candidate together with its ``CallingConvention`` and flags. 6. If no candidate passes, raise ``KeyError`` — node falls back to PyTorch. ---- ConversionContext ----------------- Every converter receives a ``ConversionContext`` as its first argument (``ctx``): .. code-block:: python @dataclass class ConversionContext: net: trt.INetworkDefinition # The TRT network being built compilation_settings: CompilationSettings requires_output_allocator: bool # Set True if any converter in the graph needs it weight_refit_map: dict[str, torch.Tensor] # name → weight for refit cpu_weights_reference_holder: list[torch.Tensor] ``ctx.net`` The ``trt.INetworkDefinition``. Converters call methods like ``ctx.net.add_elementwise()``, ``ctx.net.add_activation()``, etc. to add TRT layers. ``ctx.compilation_settings`` Full ``CompilationSettings`` object. Use this to read user preferences (e.g., check ``ctx.compilation_settings.enabled_precisions``) inside a converter. ``ctx.record_weight(name, weight)`` Call this from a converter whenever you add a constant tensor to the TRT network. It populates ``weight_refit_map`` (used by :func:`~torch_tensorrt.dynamo.refit_module_weights`) so the weight can be updated without recompilation. The name must match the TRT layer's weight name as it will appear in the engine. .. code-block:: python weight_tensor = get_trt_tensor(ctx, weight, f"{name}_weight") ctx.record_weight(f"{name}_weight", weight) # register for refit ``ctx.clear_cpu_weights_reference_holder()`` Called automatically after engine serialization. Releases the CPU-side references to weight tensors that were held alive during the build phase.