tlparse / TORCH_TRACE

tlparse and TORCH_TRACE are a pair of tools for producing compilation reports: TORCH_TRACE collects a structured trace while your model runs, and tlparse renders it into an HTML report.

Traces are straightforward to collect. Install tlparse, run your model with TORCH_TRACE set, then point tlparse at the trace directory:

pip install tlparse
TORCH_TRACE="/tmp/tracedir" python foo.py
tlparse /tmp/tracedir --latest

The --latest flag processes the latest log in the directory. You can also process a specific log file with tlparse <log_file>.

By default, the output is stored in a tl_out folder. You can also specify an output folder with -o my_folder.
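
For example, combining these flags (the log file name below is illustrative; the actual names in /tmp/tracedir will differ):

tlparse /tmp/tracedir --latest -o my_folder   # newest log in the directory
tlparse /tmp/tracedir/rank_0_trace.log -o my_folder   # one specific log file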

This approach works even if you are running a distributed job, producing a trace for each rank. tlparse will open your browser with the generated HTML report. If you are filing a bug report for a complicated problem that you don't have a standalone reproduction for, you can still greatly assist PyTorch developers by

  1. attaching the trace log generated in /tmp/tracedir, or

  2. attaching a zip containing all of the tlparse output (e.g. all files in tl_out). Please do not attach only the index.html file: it contains a catalog of the output files, not the outputs themselves.
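
One simple way to bundle the full output for a bug report, assuming the default tl_out output folder:

zip -r tlparse_output.zip tl_out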

Warning

The trace log contains all of your model code. Do not share the trace log if the model you are working on is sensitive. The trace log does NOT contain weights.

The output of tlparse is aimed primarily at PyTorch developers, and the log format is easy to upload and share on GitHub. However, even as a non-PyTorch developer, you can extract useful information from it. We recommend starting with the inline help text in the report, which explains its contents. Here are some insights you can gain from a tlparse report:

  • What model code was compiled? Look at the stack trie to find out. This is especially useful if you're not familiar with the codebase being compiled!

  • How many graph breaks / distinct compilation regions are there? Each distinct compile is its own color-coded block, like [0/0]; frames that are potentially graph-broken are light green, like [2/4]. A large number of frames is suspicious: it suggests you hit some catastrophic graph breaks, or that your code isn't a good match for torch.compile.

  • How many times did I recompile a particular frame? Something that recompiled a lot will look like [10/0], [10/1], [10/2]. Frequent recompilation is very suspicious and worth looking into, even if it isn't the root cause of your problem. (The sketch after this list triggers both a graph break and recompiles.)

  • Was there a compilation error? Frames that errored will look like [0/1].

  • What intermediate compiler products did I generate for a given frame? For example, you can look at the high-level generated FX graph or the generated Triton code.

  • Is there other relevant information for a particular frame? You can find it in compilation_metrics.
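
To see a few of these signals at once, here is a minimal, hypothetical foo.py that deliberately triggers a graph break (print is not traceable) and is likely to trigger recompiles (the int argument changes across calls); run it under TORCH_TRACE as shown earlier:

import torch

@torch.compile
def f(x, n):
    y = x * 2
    print("side effect")  # Dynamo cannot trace print(), so it inserts a graph break here
    return y + n

x = torch.randn(4)
for n in range(1, 4):
    f(x, n)  # a changing int argument can show up as recompiles: [k/0], [k/1], ...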

Below are some of the file names you may see in the report, with descriptions. Depending on your specific program, you may not see all of these files.

  • dynamo_output_graph: Output graph from Dynamo front-end graph capture

  • before_pre_grad_graph: The FX graph before any pre-autograd graph passes run

  • after_pre_grad_graph: The FX graph after all pre-autograd graph passes run

  • aot_autograd_cache_miss / aot_autograd_cache_hit: The cache key for aot_autograd_cache, and whether we had a cache hit or miss

  • aot_inference_graph: The FX graph after decomposition, when autograd is not required (e.g. none of the tensors require gradients)

  • aot_joint_graph: The joint forward-backward graph after autograd and decomposition

  • aot_forward_graph: The forward graph after partitioning the aot_joint_graph

  • aot_backward_graph: The backward graph after partitioning the aot_joint_graph

  • before_joint_graph: The FX graph before any joint graph pass runs

  • after_joint_graph: The FX graph after all joint graph passes run

  • before_post_grad_graph: The FX graph before any post-autograd graph pass runs

  • inductor_post_grad_graph: The FX graph after all post-autograd graph passes run

  • fx_graph_runnable: Mostly the same graph as before_post_grad_graph, but packaged as a runnable Python script; it also contains the torch configs and some wrapper code so you can run the graph with dummy inputs

  • inductor_output_code: The code generated by Inductor

  • fx_graph_cache_miss / fx_graph_cache_hit: The cache key for the FX graph cache, and whether we had a cache hit or miss

  • dynamo_cpp_guards_str: The guard information from Dynamo
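
Since fx_graph_runnable dumps are standalone Python scripts, you can run one directly after locating it in the tlparse output; the path below is purely illustrative:

python tl_out/<path_to>/fx_graph_runnable.py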

TORCH_LOGS

You can use the TORCH_LOGS environment variable to selectively enable logging in parts of the torch.compile stack; TORCH_LOGS is, in fact, the source of the logs that tlparse consumes. The TORCH_LOGS environment variable has the following format:

TORCH_LOGS="<option1>,<option2>,..." python foo.py

You can also programmatically set logging options using torch._logging.set_logs:

import logging
torch._logging.set_logs(graph_breaks=True, dynamic=logging.DEBUG)

The most useful options are:

  • graph_breaks: logs locations of graph breaks in user code and the reason for the graph break

  • guards: logs guards that are generated

  • recompiles: logs which function recompiled and the guards that failed, leading to the recompilation

  • dynamic: logs related to dynamic shapes

  • output_code: logs the code generated by Inductor

Some more helpful TORCH_LOGS options are listed below (a + prefix raises that component's log level to DEBUG):

  • +all: Output debug logs from all torch.compile components

  • +dynamo: Output debug logs from TorchDynamo

  • +aot: Output debug logs from AOTAutograd

  • +inductor: Output debug logs from TorchInductor

  • dynamic: Output logs from dynamic shapes

  • graph_code: Output the Python code for the FX graph that Dynamo generated

  • graph_sizes: Output the tensor sizes of the FX graph that Dynamo generated

  • trace_bytecode: Output the bytecode instructions that Dynamo is tracing through, along with the symbolic interpreter stack Dynamo is keeping track of

  • trace_source: Output the line of original source code that Dynamo is currently tracing through

  • bytecode: Output Dynamo-generated bytecode

  • guards: Output generated guards

  • recompiles: Output recompilation reasons (only the first guard check that fails)

  • recompiles_verbose: Output all guard checks that fail when a recompilation occurs

  • aot_graphs: Output the graphs generated by AOTAutograd

  • aot_joint_graphs: Output the joint forward-backward graph generated by AOTAutograd

  • output_code: Output the code generated by Inductor

  • kernel_code: Output the code generated by Inductor on a per-kernel basis

  • schedule: Output Inductor scheduling logs

  • perf_hints: Output Inductor perf hint logs

  • fusion: Output Inductor fusion logs

For the full list of options, see torch._logging and torch._logging.set_logs.
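
These options can also be set programmatically. A short sketch, assuming the torch._logging.set_logs keyword names mirror the TORCH_LOGS option names above (component options take a log level, artifact options take a boolean):

import logging
import torch

# roughly equivalent to TORCH_LOGS="+dynamo,graph_code,recompiles_verbose"
torch._logging.set_logs(
    dynamo=logging.DEBUG,     # component option: takes a log level
    graph_code=True,          # artifact option: takes a boolean
    recompiles_verbose=True,
)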

tlparse vs. TORCH_LOGS

Generally, we suggest reaching for tlparse first when you encounter an issue. tlparse is ideal for debugging large models and for gaining a high-level overview of how your model was compiled. TORCH_LOGS, on the other hand, is preferred for small examples and fine-grained debugging detail, once you already have an idea of which torch.compile component is causing the problem.
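
A typical workflow combining the two, with the file and directory names used in the examples above:

# 1. get the high-level picture
TORCH_TRACE="/tmp/tracedir" python foo.py
tlparse /tmp/tracedir --latest

# 2. once you suspect a component, zoom in on it
TORCH_LOGS="recompiles,graph_breaks" python foo.py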