tlparse / TORCH_TRACE

tlparse and TORCH_TRACE are a pair of tools for producing compilation reports: TORCH_TRACE collects a structured trace while your model runs, and tlparse renders it into an HTML report.

Traces are straightforward to collect. Install tlparse, run your model with TORCH_TRACE set, then point tlparse at the trace directory:

pip install tlparse
TORCH_TRACE="/tmp/tracedir" python foo.py
tlparse /tmp/tracedir --latest

The --latest flag processes the latest log in the directory. You can also process a specific log file with tlparse <log_file>.

By default, the output is stored in a tl_out folder. You can also specify an output folder with -o my_folder.
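
For example, combining these flags (the log file name below is illustrative; the actual names in /tmp/tracedir will differ):

tlparse /tmp/tracedir --latest -o my_folder   # newest log in the directory
tlparse /tmp/tracedir/rank_0_trace.log -o my_folder   # one specific log file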

This approach works even if you are running a distributed job, producing a trace for each rank. tlparse will open your browser with the generated HTML report. If you are filing a bug report for a complicated problem that you don't have a standalone reproduction for, you can still greatly assist PyTorch developers by

  1. attaching the trace log generated in /tmp/tracedir, or

  2. attaching a zip containing all of the tlparse output (e.g. all files in tl_out). Please do not attach only the index.html file: it contains a catalog of the output files, not the outputs themselves.
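
One simple way to bundle the full output for a bug report, assuming the default tl_out output folder:

zip -r tlparse_output.zip tl_out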

Warning

The trace log contains all of your model code. Do not share the trace log if the model you are working on is sensitive. The trace log does NOT contain weights.

The output of tlparse is aimed primarily at PyTorch developers, and the log format is easy to upload and share on GitHub. However, even as a non-PyTorch developer, you can extract useful information from it. We recommend starting with the inline help text in the report, which explains its contents. Here are some insights you can gain from a tlparse report:

  • What model code was compiled? Look at the stack trie to find out. This is especially useful if you're not familiar with the codebase being compiled!

  • How many graph breaks / distinct compilation regions are there? Each distinct compile is its own color-coded block, like [0/0]; frames that are potentially graph-broken are light green, like [2/4]. A large number of frames is suspicious: it suggests you hit some catastrophic graph breaks, or that your code isn't a good match for torch.compile.

  • How many times did I recompile a particular frame? Something that recompiled a lot will look like [10/0], [10/1], [10/2]. Frequent recompilation is very suspicious and worth looking into, even if it isn't the root cause of your problem. (The sketch after this list triggers both a graph break and recompiles.)

  • Was there a compilation error? Frames that errored will look like [0/1].

  • What intermediate compiler products did I generate for a given frame? For example, you can look at the high-level generated FX graph or the generated Triton code.

  • Is there other relevant information for a particular frame? You can find it in compilation_metrics.
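
To see a few of these signals at once, here is a minimal, hypothetical foo.py that deliberately triggers a graph break (print is not traceable) and is likely to trigger recompiles (the int argument changes across calls); run it under TORCH_TRACE as shown earlier:

import torch

@torch.compile
def f(x, n):
    y = x * 2
    print("side effect")  # Dynamo cannot trace print(), so it inserts a graph break here
    return y + n

x = torch.randn(4)
for n in range(1, 4):
    f(x, n)  # a changing int argument can show up as recompiles: [k/0], [k/1], ...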

Below are some of the file names you may see in the report, with descriptions. Depending on your specific program, you may not see all of these files.

  • dynamo_output_graph: Output graph from Dynamo front-end graph capture

  • before_pre_grad_graph: The FX graph before any pre-autograd graph passes run

  • after_pre_grad_graph: The FX graph after all pre-autograd graph passes run

  • aot_autograd_cache_miss / aot_autograd_cache_hit: The cache key for aot_autograd_cache, and whether we had a cache hit or miss

  • aot_inference_graph: The FX graph after decomposition, when autograd is not required (e.g. none of the tensors require gradients)

  • aot_joint_graph: The joint forward-backward graph after autograd and decomposition

  • aot_forward_graph: The forward graph after partitioning the aot_joint_graph

  • aot_backward_graph: The backward graph after partitioning the aot_joint_graph

  • before_joint_graph: The FX graph before any joint graph pass runs

  • after_joint_graph: The FX graph after all joint graph passes run

  • before_post_grad_graph: The FX graph before any post-autograd graph pass runs

  • inductor_post_grad_graph: The FX graph after all post-autograd graph passes run

  • fx_graph_runnable: Mostly the same graph as before_post_grad_graph, but packaged as a runnable Python script; it also contains the torch configs and some wrapper code so you can run the graph with dummy inputs

  • inductor_output_code: The code generated by Inductor

  • fx_graph_cache_miss / fx_graph_cache_hit: The cache key for the FX graph cache, and whether we had a cache hit or miss

  • dynamo_cpp_guards_str: The guard information from Dynamo
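
Since fx_graph_runnable dumps are standalone Python scripts, you can run one directly after locating it in the tlparse output; the path below is purely illustrative:

python tl_out/<path_to>/fx_graph_runnable.py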

TORCH_LOGS

You can use the TORCH_LOGS environment variable to selectively enable logging in parts of the torch.compile stack; TORCH_LOGS is, in fact, the source of the logs that tlparse consumes. The TORCH_LOGS environment variable has the following format:

TORCH_LOGS="<option1>,<option2>,..." python foo.py

You can also programmatically set logging options using torch._logging.set_logs:

import logging
torch._logging.set_logs(graph_breaks=True, dynamic=logging.DEBUG)

The most useful options are:

  • graph_breaks: logs locations of graph breaks in user code and the reason for the graph break

  • guards: logs guards that are generated

  • recompiles: logs which function recompiled and the guards that failed, leading to the recompilation

  • dynamic: logs related to dynamic shapes

  • output_code: logs the code generated by Inductor

Some more helpful TORCH_LOGS options are listed below (a + prefix raises that component's log level to DEBUG):

  • +all: Output debug logs from all torch.compile components

  • +dynamo: Output debug logs from TorchDynamo

  • +aot: Output debug logs from AOTAutograd

  • +inductor: Output debug logs from TorchInductor

  • dynamic: Output logs from dynamic shapes

  • graph_code: Output the Python code for the FX graph that Dynamo generated

  • graph_sizes: Output the tensor sizes of the FX graph that Dynamo generated

  • trace_bytecode: Output the bytecode instructions that Dynamo is tracing through, along with the symbolic interpreter stack Dynamo is keeping track of

  • trace_source: Output the line of original source code that Dynamo is currently tracing through

  • bytecode: Output Dynamo-generated bytecode

  • guards: Output generated guards

  • recompiles: Output recompilation reasons (only the first guard check that fails)

  • recompiles_verbose: Output all guard checks that fail when a recompilation occurs

  • aot_graphs: Output the graphs generated by AOTAutograd

  • aot_joint_graphs: Output the joint forward-backward graph generated by AOTAutograd

  • output_code: Output the code generated by Inductor

  • kernel_code: Output the code generated by Inductor on a per-kernel basis

  • schedule: Output Inductor scheduling logs

  • perf_hints: Output Inductor perf hint logs

  • fusion: Output Inductor fusion logs

For the full list of options, see torch._logging and torch._logging.set_logs.
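
These options can also be set programmatically. A short sketch, assuming the torch._logging.set_logs keyword names mirror the TORCH_LOGS option names above (component options take a log level, artifact options take a boolean):

import logging
import torch

# roughly equivalent to TORCH_LOGS="+dynamo,graph_code,recompiles_verbose"
torch._logging.set_logs(
    dynamo=logging.DEBUG,     # component option: takes a log level
    graph_code=True,          # artifact option: takes a boolean
    recompiles_verbose=True,
)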

tlparse vs. TORCH_LOGS

Generally, we suggest reaching for tlparse first when you encounter an issue. tlparse is ideal for debugging large models and for gaining a high-level overview of how your model was compiled. TORCH_LOGS, on the other hand, is preferred for small examples and fine-grained debugging detail, once you already have an idea of which torch.compile component is causing the problem.
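
A typical workflow combining the two, with the file and directory names used in the examples above:

# 1. get the high-level picture
TORCH_TRACE="/tmp/tracedir" python foo.py
tlparse /tmp/tracedir --latest

# 2. once you suspect a component, zoom in on it
TORCH_LOGS="recompiles,graph_breaks" python foo.py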