## tlparse / TORCH_TRACE
tlparse and TORCH_TRACE are a pair of tools that together produce compilation reports for your torch.compile runs.
Traces are fairly straightforward to collect. To collect a trace, run your model like so:

TORCH_TRACE="/tmp/tracedir" python foo.py

Then install tlparse and run it on the trace directory:

pip install tlparse
tlparse /tmp/tracedir
This approach works even if you are running a distributed job, producing a trace for each rank. Running tlparse will generate an HTML report and open it in your browser.
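For example, foo.py could be a script as small as the following sketch (the function and shapes here are hypothetical; any torch.compile workload will do):

```python
# foo.py -- minimal torch.compile workload; running it with TORCH_TRACE
# set populates /tmp/tracedir with the logs that tlparse consumes.
import torch

@torch.compile
def f(x):
    return torch.sin(x) + torch.cos(x)

f(torch.randn(8))
```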
If you are filing a bug report for a complicated problem that you don't have a standalone reproduction for, you can still greatly assist PyTorch developers by attaching the trace log generated in /tmp/tracedir.
Warning: The trace log contains all of your model code. Do not share the trace log if the model you are working on is sensitive. The trace log does NOT contain weights.
The output of tlparse is primarily aimed at PyTorch developers, and the log format is easy to upload and share on GitHub. However, as a non-PyTorch developer, you can still extract useful information from it. We recommend starting with the inline help text in the report, which explains its contents. Here are some insights you can gain from a tlparse report:
- What model code was compiled? Look at the stack trie. This is especially useful if you're not familiar with the codebase being compiled!
- How many graph breaks / distinct compilation regions are there? Each distinct compile is its own color-coded block, like [0/0], and frames that are potentially graph-broken are light green, like [2/4]. A large number of frames is suspicious: it suggests you had some catastrophic graph breaks, or that your code isn't a good match for torch.compile. (See the sketch after this list for code that produces these effects.)
- How many times did I recompile a particular frame? Something that recompiled a lot will look like [10/0] [10/1] [10/2]. If something is being recompiled a lot, that is very suspicious and worth looking into, even if it isn't the root cause of your problem.
- Was there a compilation error? Frames that errored will look like [0/1].
- What intermediate compiler products did I generate for a given frame? For example, you can look at the high-level generated FX graph or the generated Triton code.
- Is there relevant information for a particular frame? You can find it in compilation_metrics.
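The following is a minimal sketch (a hypothetical script; any torch.compile workload behaves similarly) of code that produces both a graph break and a recompile, so you can see how they show up in a tlparse report:

```python
import torch

@torch.compile
def f(x):
    # Dynamo cannot trace through print(), so it graph-breaks here and
    # splits this function into multiple compiled frames.
    print("side effect")
    return x * 2

f(torch.randn(4))
# Calling again with a different input shape can fail a shape guard and
# trigger a recompile, which shows up as an extra [i/1]-style entry.
f(torch.randn(8))
```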
## TORCH_LOGS
You can use the TORCH_LOGS environment variable to selectively enable logging for parts of the torch.compile stack. TORCH_LOGS is in fact the source of the logs that tlparse consumes. The format of the TORCH_LOGS environment variable looks like this:
TORCH_LOGS="<option1>,<option2>,..." python foo.py
You can also set logging options programmatically using torch._logging.set_logs:
import torch
import logging

torch._logging.set_logs(graph_breaks=True, dynamic=logging.DEBUG)
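For reference, the environment-variable equivalent of that set_logs call should be the following; prefixing a component name with + raises its log level to DEBUG:

```
TORCH_LOGS="graph_breaks,+dynamic" python foo.py
```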
The most useful options are:

- graph_breaks: logs locations of graph breaks in user code and the reason for the graph break
- guards: logs guards that are generated
- recompiles: logs which function recompiled and the guards that failed, leading to the recompilation
- dynamic: logs related to dynamic shapes
- output_code: logs the code generated by Inductor
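For example, to see where graph breaks occur and why functions recompile in a single run, you could combine options like so:

```
TORCH_LOGS="graph_breaks,recompiles" python foo.py
```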
Some more helpful TORCH_LOGS options include:
| Option | Description |
|---|---|
| +all | Output debug logs from all components |
| +dynamo | Output debug logs from TorchDynamo |
| +aot | Output debug logs from AOTAutograd |
| +inductor | Output debug logs from TorchInductor |
| dynamic | Output logs from dynamic shapes |
| graph_code | Output the Python code for the FX graph that Dynamo generated |
| graph_sizes | Output the tensor sizes of the FX graph that Dynamo generated |
| trace_bytecode | Output the bytecode instructions that Dynamo is tracing through and the symbolic interpreter stack Dynamo is keeping track of |
| trace_source | Output the line of code in the original source that Dynamo is currently tracing through |
| bytecode | Output Dynamo-generated bytecode |
| guards | Output generated guards |
| recompiles | Output recompilation reasons (only the first guard check that fails) |
| recompiles_verbose | Output all guard checks that fail when a recompilation occurs |
| aot_graphs | Output the graphs generated by AOTAutograd |
| aot_joint_graphs | Output the joint forward-backward graph generated by AOTAutograd |
| output_code | Output code generated by Inductor |
| kernel_code | Output code generated by Inductor on a per-kernel basis |
| schedule | Output Inductor scheduling logs |
| perf_hints | Output Inductor perf hint logs |
| fusion | Output Inductor fusion logs |
For the full list of options, see torch._logging and torch._logging.set_logs.
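These options can be combined in one comma-separated list; for instance, to get TorchDynamo debug logs together with the Inductor-generated code, you might run:

```
TORCH_LOGS="+dynamo,output_code" python foo.py
```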
## tlparse vs. TORCH_LOGS
Generally, we suggest using tlparse first when encountering issues. tlparse is ideal for debugging large models and gaining a high-level overview of how your model was compiled. TORCH_LOGS, on the other hand, is preferred for small examples and fine-grained debugging detail, when you already have an idea of which torch.compile component is causing the problem.