.. _dryrun:

Dryrun Mode
===========

Dryrun mode runs the full Torch-TensorRT partitioning pipeline — lowering, capability
checking, graph splitting — but stops **before** building any TRT engines. It prints a
detailed report describing exactly how the graph will be partitioned and which operators
will run in TRT vs. PyTorch.

Use dryrun to:

* Understand TRT operator coverage for a new model without waiting for compilation.
* Tune ``min_block_size`` before committing to a full compile.
* Debug why an op is falling back to PyTorch.
* Compare the partitioning effect of different ``CompilationSettings``.

----

Enabling Dryrun
---------------

Set ``dryrun=True`` to print the report to stdout:

.. code-block:: python

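    import torch
    import torch_tensorrt

    # Assumes ``model`` is an eval-mode nn.Module on the GPU and ``inputs`` is a
    # list of example tensors on the same device.
    exp_program = torch.export.export(model, tuple(inputs))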
    torch_tensorrt.dynamo.compile(
        exp_program,
        arg_inputs=inputs,
        dryrun=True,
    )

Set ``dryrun`` to a file path to also save the report:

.. code-block:: python

    torch_tensorrt.dynamo.compile(
        exp_program,
        arg_inputs=inputs,
        dryrun="/tmp/partition_report.txt",
    )

``dryrun`` is a :class:`~torch_tensorrt.dynamo.CompilationSettings` parameter and can
also be passed via ``torch.compile``:

.. code-block:: python

    trt_model = torch.compile(model, backend="tensorrt", options={"dryrun": True})
    trt_model(*inputs)  # report printed on first forward pass

----

Reading the Report
------------------

A typical dryrun report looks like this::

    ++++++++++++++++++++++++++ Dry-Run Results for Graph ++++++++++++++++++++++++++

    The graph consists of 142 Total Operators, of which 138 operators are supported, 97.18% coverage

    The following ops are currently unsupported or excluded from conversion, and are listed with their op-count in the graph:
     torch.ops.aten.embedding.default: 1
     torch.ops.aten.index.Tensor: 3

    Compiled with: CompilationSettings(enabled_precisions={dtype.f32}, min_block_size=5, ...)

      Graph Structure:

       Inputs: List[Tensor: (1, 512)@int64]
        ...
          TRT Engine #1 - Submodule name: _run_on_acc_0
           Engine Inputs: List[Tensor: (1, 512, 768)@float32]
           Number of Operators in Engine: 135
           Engine Outputs: Tensor: (1, 512, 30522)@float32
        ...
       Outputs: List[Tensor: (1, 512, 30522)@float32]

      ------------------------- Aggregate Stats -------------------------

       Average Number of Operators per TRT Engine: 135.0
       Most Operators in a TRT Engine: 135

       ********** Recommendations **********

       - For minimal graph segmentation, select min_block_size=135 which would generate 1 TRT engine(s)
       - The current level of graph segmentation is equivalent to selecting min_block_size=5 which generates 1 TRT engine(s)

**Sections explained:**

Coverage summary
    Total operators, TRT-supported operators, and coverage percentage. "Supported" here
    means the operator has a converter registered **and** its capability validator
    passes for this specific node.

Unsupported ops
    Operators that will fall back to PyTorch, with their occurrence count. Check these
    against the converter registry or your ``torch_executed_ops`` setting.

Nodes set to run in Torch
    Specific nodes excluded from TRT blocks. A node can appear here even though it has
    a converter if the partitioner did not place it in any TRT block (for example,
    because its block was smaller than ``min_block_size``). This section is not shown
    in the sample above.

Graph structure
    ASCII schematic of input tensors → TRT engine blocks → output tensors. Each TRT
    engine block shows its input/output shapes and operator count. Use this to see
    where PyTorch↔TRT transitions occur.

Aggregate stats
    Average and largest operator counts per TRT engine. Many small engines mean more
    PyTorch↔TRT transitions, and each transition adds overhead.

Recommendations
    Suggested ``min_block_size`` values and the resulting engine counts:

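    * **Minimal segmentation**: the ``min_block_size`` at which only the largest
      block(s) are kept, which generates the fewest engines. In the sample it equals
      the largest engine's operator count (135).
    * **Current setting**: the engine count your current ``min_block_size`` produces.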

    For models where TRT coverage is close to 100%, a single large engine is usually
    optimal. For mixed models, the recommendation helps you balance coverage vs. overhead.
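
When comparing several saved reports, it can help to pull the headline numbers out of
the text. The following is a minimal sketch, assuming the report lines are worded like
the sample above. ``parse_coverage`` and its regular expressions are illustrative
helpers, not part of Torch-TensorRT, and may need adjusting if the report format
differs between versions.

.. code-block:: python

    import re

    # Hypothetical helper (not a Torch-TensorRT API). The patterns assume the
    # wording of the sample report shown above.
    SUMMARY_RE = re.compile(
        r"(\d+) Total Operators, of which (\d+) operators are supported"
    )
    UNSUPPORTED_RE = re.compile(
        r"^[ \t]*(torch\.ops\.[\w.]+): (\d+)[ \t]*$", re.MULTILINE
    )


    def parse_coverage(report: str) -> dict:
        """Extract operator totals and unsupported-op counts from report text."""
        match = SUMMARY_RE.search(report)
        if match is None:
            raise ValueError("coverage summary line not found")
        total, supported = int(match.group(1)), int(match.group(2))
        unsupported = {op: int(n) for op, n in UNSUPPORTED_RE.findall(report)}
        return {
            "total": total,
            "supported": supported,
            # Recompute the percentage instead of trusting the printed value.
            "coverage_pct": round(100.0 * supported / total, 2) if total else 0.0,
            "unsupported": unsupported,
        }


    sample = """\
    The graph consists of 142 Total Operators, of which 138 operators are supported, 97.18% coverage

    The following ops are currently unsupported or excluded from conversion, and are listed with their op-count in the graph:
     torch.ops.aten.embedding.default: 1
     torch.ops.aten.index.Tensor: 3
    """

    stats = parse_coverage(sample)
    print(stats["coverage_pct"], stats["unsupported"])

The unsupported-op counts should add up to the gap between total and supported
operators (here 1 + 3 = 142 - 138), which is a quick sanity check on a parsed report.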

----

Workflow: Tuning min_block_size
-------------------------------

.. code-block:: python

    # Step 1: run dryrun with a loose min_block_size to see the full partition map
    torch_tensorrt.dynamo.compile(
        exp_program, arg_inputs=inputs,
        dryrun="/tmp/report_loose.txt",
        min_block_size=1,
    )

    # Step 2: read the recommendations in the report and pick a value
    # Step 3: compile for real with the chosen value (10 is only a placeholder)
    trt_gm = torch_tensorrt.dynamo.compile(
        exp_program, arg_inputs=inputs,
        min_block_size=10,
    )
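
Step 2 can be partly automated by reading the recommendation lines from the saved
report. This is a sketch only, assuming the recommendation wording matches the sample
report earlier on this page; ``parse_recommendations`` is a hypothetical helper, not a
Torch-TensorRT function.

.. code-block:: python

    import re
    from typing import List, Tuple

    # Assumes lines of the form
    # "... min_block_size=N which would generate M TRT engine(s)" or
    # "... min_block_size=N which generates M TRT engine(s)".
    REC_RE = re.compile(
        r"min_block_size=(\d+) which (?:would generate|generates) (\d+) TRT engine"
    )


    def parse_recommendations(report: str) -> List[Tuple[int, int]]:
        """Return (min_block_size, engine_count) pairs in report order."""
        return [(int(size), int(engines)) for size, engines in REC_RE.findall(report)]


    sample = """\
     - For minimal graph segmentation, select min_block_size=135 which would generate 1 TRT engine(s)
     - The current level of graph segmentation is equivalent to selecting min_block_size=5 which generates 1 TRT engine(s)
    """

    recommendations = parse_recommendations(sample)
    for size, engines in recommendations:
        print(f"min_block_size={size} -> {engines} engine(s)")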

----

Debugging Fallback Ops
----------------------

If an op you expect to be TRT-supported appears in the unsupported list:

1. Check that a converter is registered:

   .. code-block:: python

       import torch
       from torch_tensorrt.dynamo.conversion._ConverterRegistry import DYNAMO_CONVERTERS

       # True if a converter is registered for this op target
       print(torch.ops.aten.embedding.default in DYNAMO_CONVERTERS)

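2. Check how many nodes in the graph pass the converter support check, which takes each
   converter's capability validator into account:

   .. code-block:: python

       from torch_tensorrt.dynamo.partitioning import get_graph_converter_support

       # ``gm`` is the torch.fx.GraphModule to inspect, e.g. exp_program.module()
       n_supported, n_total = get_graph_converter_support(gm, torch_executed_ops=set())
       print(f"{n_supported}/{n_total} ops supported")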

3. Check ``torch_executed_ops`` — the op may be explicitly forced to PyTorch.

4. Check ``min_block_size`` — the block containing the op may have been merged back into
   PyTorch because it had too few operators. Reduce ``min_block_size`` in dryrun to
   confirm.
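
The effect described in step 4 can be illustrated with a toy model. This is only a
sketch of the thresholding idea, assuming a candidate block is kept as a TRT engine
when it contains at least ``min_block_size`` operators. The real partitioner also
accounts for operator support and graph dependencies, and ``engines_for`` and the block
sizes below are made up for illustration.

.. code-block:: python

    from typing import List


    def engines_for(block_sizes: List[int], min_block_size: int) -> List[int]:
        """Return the sizes of candidate blocks that would be kept as TRT engines."""
        return [n for n in block_sizes if n >= min_block_size]


    # Made-up operator counts for three candidate TRT blocks.
    candidate_blocks = [135, 4, 2]

    for threshold in (1, 5, 135):
        kept = engines_for(candidate_blocks, threshold)
        dropped = len(candidate_blocks) - len(kept)
        print(
            f"min_block_size={threshold}: {len(kept)} engine(s), "
            f"{dropped} block(s) run in PyTorch"
        )

In this toy example, an op that sits in the 4-operator block runs in TRT only when the
threshold is 4 or lower, which is the situation a dryrun with a reduced
``min_block_size`` would reveal.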

----

Saving the Report
-----------------

Pass a string path to ``dryrun`` to persist the report:

.. code-block:: python

    torch_tensorrt.dynamo.compile(
        exp_program, arg_inputs=inputs, dryrun="/tmp/report.txt"
    )

If the file already exists, a warning is logged and the file is **not** overwritten.
Remove the old file manually before rerunning.
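
Because an existing file is left untouched, a rerun can silently leave a stale report
on disk. One way to avoid that, sketched here with only the standard library, is to
delete the old file before compiling. ``fresh_report_path`` is a hypothetical helper,
and the temporary directory stands in for wherever reports are kept.

.. code-block:: python

    import tempfile
    from pathlib import Path


    def fresh_report_path(path: str) -> str:
        """Delete any existing file at ``path`` and return the path unchanged."""
        Path(path).unlink(missing_ok=True)  # missing_ok requires Python 3.8+
        return path


    # Demonstration with a stale file in a temporary directory.
    with tempfile.TemporaryDirectory() as tmp:
        report = Path(tmp) / "partition_report.txt"
        report.write_text("stale report")
        target = fresh_report_path(str(report))
        exists_after = Path(target).exists()

    print(exists_after)

The returned path could then be passed straight through, for example
``dryrun=fresh_report_path("/tmp/report.txt")``.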

The dryrun output is also available at ``DEBUG`` log level even when ``dryrun=False``:

.. code-block:: python

    import logging
    logging.getLogger("torch_tensorrt").setLevel(logging.DEBUG)
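
At ``DEBUG`` level the report is mixed in with other debug messages. The sketch below
routes the ``torch_tensorrt`` logger to its own handler using only the standard
``logging`` module. The in-memory buffer and the example message are placeholders; in
practice a ``logging.FileHandler`` with a path of your choosing could be attached
instead.

.. code-block:: python

    import io
    import logging

    logger = logging.getLogger("torch_tensorrt")
    logger.setLevel(logging.DEBUG)

    # Capture messages in memory; a logging.FileHandler would persist them instead.
    buffer = io.StringIO()
    handler = logging.StreamHandler(buffer)
    handler.setLevel(logging.DEBUG)
    logger.addHandler(handler)

    # Placeholder message standing in for output emitted during compilation.
    logger.debug("example debug message")

    # Detach the handler once the compile call has finished.
    logger.removeHandler(handler)
    captured = buffer.getvalue()
    print(captured)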