Dryrun Mode#
Dryrun mode runs the full Torch-TensorRT partitioning pipeline — lowering, capability checking, graph splitting — but stops before building any TRT engines. It prints a detailed report describing exactly how the graph will be partitioned and which operators will run in TRT vs. PyTorch.
Use dryrun to:
Understand TRT operator coverage for a new model without waiting for compilation.
Tune min_block_size before committing to a full compile.
Debug why an op is falling back to PyTorch.
Compare the partitioning effect of different CompilationSettings.
Enabling Dryrun#
Set dryrun=True to print the report to stdout:
import torch
import torch_tensorrt
exp_program = torch.export.export(model, tuple(inputs))
torch_tensorrt.dynamo.compile(
exp_program,
arg_inputs=inputs,
dryrun=True,
)
Set dryrun to a file path to also save the report:
torch_tensorrt.dynamo.compile(
exp_program,
arg_inputs=inputs,
dryrun="/tmp/partition_report.txt",
)
dryrun is a torch_tensorrt.dynamo.CompilationSettings parameter and can
also be passed via torch.compile:
trt_model = torch.compile(model, backend="tensorrt", options={"dryrun": True})
trt_model(*inputs) # report printed on first forward pass
Reading the Report#
A typical dryrun report looks like this:
++++++++++++++++++++++++++ Dry-Run Results for Graph ++++++++++++++++++++++++++
The graph consists of 142 Total Operators, of which 138 operators are supported, 97.18% coverage
The following ops are currently unsupported or excluded from conversion, and are listed with their op-count in the graph:
torch.ops.aten.embedding.default: 1
torch.ops.aten.index.Tensor: 3
Compiled with: CompilationSettings(enabled_precisions={dtype.f32}, min_block_size=5, ...)
Graph Structure:
Inputs: List[Tensor: (1, 512)@int64]
...
TRT Engine #1 - Submodule name: _run_on_acc_0
Engine Inputs: List[Tensor: (1, 512, 768)@float32]
Number of Operators in Engine: 135
Engine Outputs: Tensor: (1, 512, 30522)@float32
...
Outputs: List[Tensor: (1, 512, 30522)@float32]
------------------------- Aggregate Stats -------------------------
Average Number of Operators per TRT Engine: 135.0
Most Operators in a TRT Engine: 135
********** Recommendations **********
- For minimal graph segmentation, select min_block_size=135 which would generate 1 TRT engine(s)
- The current level of graph segmentation is equivalent to selecting min_block_size=5 which generates 1 TRT engine(s)
Sections explained:
- Coverage summary
Total operators, TRT-supported operators, and coverage percentage. “Supported” here means the operator has a converter registered and its capability validator passes for this specific node.
- Unsupported ops
Operators that will fall back to PyTorch, with their occurrence count. Check these against the converter registry or your torch_executed_ops setting.
- Nodes set to run in Torch
Specific nodes excluded from TRT blocks. A node may appear here even if it has a converter, if the partitioner did not include it in any TRT block (e.g., its block had fewer operators than min_block_size).
- Graph structure
ASCII schematic of input tensors → TRT engine blocks → output tensors. Each TRT engine block shows its input/output shapes and operator count. Use this to see where PyTorch↔TRT transitions occur.
- Aggregate stats
Average and maximum operator counts per engine. More engines with fewer operators each means more PyTorch↔TRT transitions at runtime, and therefore more overhead.
- Recommendations
Suggested min_block_size values and the resulting engine counts:
Minimal segmentation: the largest block absorbs the most operators; generates the fewest engines.
Current setting: what your current min_block_size produces.
For models where TRT coverage is close to 100%, a single large engine is usually optimal. For mixed models, the recommendation helps you balance coverage vs. overhead.
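When comparing several dryrun runs, it can help to pull the coverage numbers and the unsupported-op counts out of the report text. The following is a minimal sketch, assuming the report keeps the wording shown in the sample above; the regular expressions and the `summarize_report` helper are illustrative and not part of Torch-TensorRT, and the report format may differ between versions.

```python
import re

# Sample lines copied from the report shown above; in practice, read the text
# from the file path you passed to dryrun.
REPORT = """\
The graph consists of 142 Total Operators, of which 138 operators are supported, 97.18% coverage
The following ops are currently unsupported or excluded from conversion, and are listed with their op-count in the graph:
 torch.ops.aten.embedding.default: 1
 torch.ops.aten.index.Tensor: 3
Compiled with: CompilationSettings(enabled_precisions={dtype.f32}, min_block_size=5, ...)
"""

# Assumed wording of the coverage line (taken from the sample report).
COVERAGE_RE = re.compile(
    r"consists of (\d+) Total Operators, of which (\d+) operators are supported"
)
# An op line looks like "torch.ops.aten.index.Tensor: 3".
OP_RE = re.compile(r"^[ \t]*(torch\.ops\.[\w.]+):[ \t]*(\d+)[ \t]*$", re.MULTILINE)


def summarize_report(text):
    """Return (total_ops, supported_ops, {unsupported_op: count})."""
    match = COVERAGE_RE.search(text)
    if match is None:
        raise ValueError("coverage line not found; report format may differ")
    total, supported = int(match.group(1)), int(match.group(2))
    unsupported = {name: int(count) for name, count in OP_RE.findall(text)}
    return total, supported, unsupported


total, supported, unsupported = summarize_report(REPORT)
print(f"{supported}/{total} supported ({100 * supported / total:.2f}%)")
for name, count in sorted(unsupported.items(), key=lambda kv: -kv[1]):
    print(f"  {name}: {count}")
```

Running the same helper over reports produced with different settings gives a quick way to see which ops move in or out of the unsupported list.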
Workflow: Tuning min_block_size#
# Step 1: run dryrun with a loose min_block_size to see the full partition map
torch_tensorrt.dynamo.compile(
exp_program, arg_inputs=inputs,
dryrun="/tmp/report_loose.txt",
min_block_size=1,
)
# Step 2: read recommendations in the report, pick an appropriate value
# Step 3: compile for real
trt_gm = torch_tensorrt.dynamo.compile(
exp_program, arg_inputs=inputs,
min_block_size=10,
)
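To build intuition for step 2, the trade-off can be sketched with a deliberately simplified model: each TRT-capable block becomes an engine only if it holds at least min_block_size operators, and smaller blocks stay in PyTorch. This is not the actual partitioner, and the block sizes below are hypothetical; take the real numbers from your own dryrun report.

```python
def engines_for(block_sizes, min_block_size):
    """Return (engine_count, ops_in_trt) for a candidate min_block_size.

    Simplified model: a candidate block becomes a TRT engine only if it
    holds at least min_block_size operators; smaller blocks stay in PyTorch.
    """
    kept = [n for n in block_sizes if n >= min_block_size]
    return len(kept), sum(kept)


# Hypothetical operator counts of the TRT-capable blocks in a mixed model.
candidate_blocks = [135, 12, 4, 2]

for mbs in (1, 5, 10, 135):
    engines, trt_ops = engines_for(candidate_blocks, mbs)
    print(f"min_block_size={mbs:>3}: {engines} engine(s), {trt_ops} ops in TRT")
```

Raising min_block_size trades a few operators of TRT coverage for fewer engines, which is the balance the Recommendations section of the report is meant to inform.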
Debugging Fallback Ops#
If an op you expect to be TRT-supported appears in the unsupported list:
Check that a converter is registered:
from torch_tensorrt.dynamo.conversion._ConverterRegistry import DYNAMO_CONVERTERS
print(torch.ops.aten.embedding.default in DYNAMO_CONVERTERS)
Check the converter’s capability validator:
from torch_tensorrt.dynamo.partitioning import get_graph_converter_support
# gm is the torch.fx.GraphModule being checked
n_supported, n_total = get_graph_converter_support(gm, torch_executed_ops=set())
print(f"{n_supported}/{n_total} ops supported")
Check torch_executed_ops: the op may be explicitly forced to PyTorch.
Check min_block_size: the block containing the op may have been merged back into PyTorch because it had too few operators. Reduce min_block_size in dryrun to confirm.
Saving the Report#
Pass a string path to dryrun to persist the report:
torch_tensorrt.dynamo.compile(
exp_program, arg_inputs=inputs, dryrun="/tmp/report.txt"
)
If the file already exists, a warning is logged and the file is not overwritten. Remove the old file manually before rerunning.
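Because an existing report is not overwritten, repeated runs need the old file cleared first. The sketch below shows one way to do that with the standard library; `fresh_report_path` is a hypothetical helper, not a Torch-TensorRT function, and the temporary directory only stands in for wherever you keep reports.

```python
from pathlib import Path
import tempfile


def fresh_report_path(path):
    """Delete any existing report at `path` and return the path as a string."""
    report = Path(path)
    report.unlink(missing_ok=True)  # no error if the file is absent (Python 3.8+)
    return str(report)


# Demonstration with a temporary directory standing in for /tmp.
with tempfile.TemporaryDirectory() as tmp:
    stale = Path(tmp) / "report.txt"
    stale.write_text("old report")
    path = fresh_report_path(stale)
    stale_removed = not stale.exists()
    print(path, "is clear:", stale_removed)
```

The returned string can then be passed as the dryrun argument, for example dryrun=fresh_report_path("/tmp/report.txt").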
The dryrun output is also available at DEBUG log level even when dryrun=False:
import logging
logging.getLogger("torch_tensorrt").setLevel(logging.DEBUG)
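If you want to keep those DEBUG records rather than only see them on the console, a handler can be attached to the same logger. This is a minimal sketch using only the standard logging module; the in-memory buffer and the stand-in message are illustrative, and what Torch-TensorRT actually emits at DEBUG level depends on the installed version.

```python
import io
import logging

logger = logging.getLogger("torch_tensorrt")
logger.setLevel(logging.DEBUG)

# Collect records in memory; swap in logging.FileHandler("some.log") to
# write them to disk instead.
buffer = io.StringIO()
handler = logging.StreamHandler(buffer)
handler.setLevel(logging.DEBUG)
logger.addHandler(handler)

# Stand-in message; during a real compile, torch_tensorrt emits the records.
logger.debug("example debug record")

handler.flush()
logger.removeHandler(handler)  # detach so later runs are not captured twice
captured = buffer.getvalue()
print(captured, end="")
```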