Dynamo

Recent

Introducing debug-graph-breaks: A Skill for Torch Compile Debugging

Arsh Zahed (@azahed98) · May 15, 2026
dynamo · torch.compile · graph-breaks · skills

I’m excited to share debug-graph-breaks, a new skill for debugging Torch Compile graph breaks, now available in the meta-pytorch/skills repository. Torch Compile graph breaks prevent full graph capture and hurt performance. This skill helps you identify root causes of graph breaks, understand why operations break compilation, get actionable fixes with specific code changes, and learn best practices for Torch Compile-friendly code. The skill is grounded in the Graph Break Website as its knowledge base, so improvements to the website directly improve the skill’s quality. It was evaluated on the OSS Model Graph Break Corpus, a collection of real-world graph break scenarios from open-source models. Evaluation …

Continue reading →

Toward Agent-Friendly Dynamo: Mirroring CPython Semantics

Animesh Jain (@anijain2305) · May 13, 2026
dynamo · cpython · llm-agents · graph-breaks · tp-slots

TL;DR – Dynamo’s ad-hoc CPython support creates fragmented graph breaks that are hard to fix, even for LLM agents. By refactoring Dynamo to mirror CPython’s tp_* slot semantics, we make the system systematically auditable and agent-friendly, already lifting CPython test pass rates from 38% to 45% and proactively eliminating classes of graph breaks in frontier models.

Working with frontier training frameworks has surfaced some fundamental issues in Dynamo. The issues broadly fell into four categories:
- CPython language gaps: For example, Dynamo supported calling a functools.partial object but did not support hashing it.
- Insufficient exception messages: One frontier framework had an unusual …

Continue reading →
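To make the functools.partial example from the excerpt above concrete, here is a minimal sketch of the plain CPython behavior that a tracer would need to mirror. It runs in ordinary Python and does not involve Dynamo; the names `add` and `add_one` are hypothetical and chosen only for illustration.

```python
import functools


def add(a, b):
    """Hypothetical helper used only for this illustration."""
    return a + b


add_one = functools.partial(add, 1)

# Calling a partial dispatches through the type's call slot
# (tp_call in C, exposed in Python as __call__).
result = add_one(41)

# Hashing dispatches through a different slot (tp_hash, exposed as __hash__).
# In CPython a partial object is hashable, so it can serve as a dict key.
partial_hash = hash(add_one)
cache = {add_one: "cached"}
lookup = cache[add_one]

print(result, lookup)
```

Because calling and hashing go through separate slots, supporting one operation on a type says nothing about the other, which is roughly the kind of gap the excerpt describes.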

Nested Graph Breaks: May 2026 Update

William Wen (@williamwen42) · May 13, 2026
dynamo · torch.compile · graph-breaks

torch._dynamo.config.nested_graph_breaks = True has been enabled on all Dynamo and Inductor unit tests (~250 test files). A sweep of the OSS benchmark models with graph breaks shows 81/82 passing with nested graph breaks (NGB) enabled (the single regression is a pre-existing unstable model), with graph break reductions of up to 67% and graph merging in models with complex nested call structures (GNNs, detection models). Dynamo tracing time is neutral or improved for most models, and models with significant graph merging see up to 15% runtime speedup (8% geomean). The remaining goal is to set nested_graph_breaks to True by default. The nested graph break problem in torch.compile refers to the Dynamo limitation of only …

Continue reading →

Printing and Inspecting Tensors Inside torch.compile

Xiao Fu (@fxdawnn), Shangdi Yu (@yushangdi) · May 6, 2026
torch.compile · debugging · printing · logging · PT2

TL;DR – A complete toolkit for inspecting tensors inside torch.compile — print forward activations, inspect backward gradients, all without graph breaks. Debugging numerical issues inside torch.compile has historically been painful. Any attempt to insert print() or logging calls would trigger graph breaks, defeating the purpose of compilation. Users needed a way to inspect tensor values (shapes, norms, gradients) in both the forward and backward pass without sacrificing compiler guarantees. In a previous post, we introduced torch._higher_order_ops.print as a graph-break-free printing primitive. Since then, we’ve expanded it into a full toolkit covering forward activations, backward …

Continue reading →

Dynamo Isolate Recompiles for torch.compile

Xiao Fu (@fxdawnn), William Wen (@williamwen42), Animesh Jain (@anijain2305), Laith Sakka (@laithsakka) · May 4, 2026
dynamo · torch.compile · caching · recompilation

TL;DR – We introduce isolate_recompiles=True for torch.compile, which gives each invocation its own isolated cache bucket — solving recompile limit collisions in factory patterns and dynamic shapes dispatch by refactoring Dynamo’s cache from per code-object to per torch.compile() invocation. Multiple torch.compile(fn, …) wrappers can share the same underlying code object (fn). In Python, a code object is created once per def statement as opposed to once per function invocation. This means that in factory patterns, or when compiling the same function with different compile options, every torch.compile(fn, …) invocation targeting functions from the same fn produces cache entries that land in …

Continue reading →
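The point in the excerpt above about code objects can be checked in plain Python, without torch.compile. The sketch below uses a hypothetical factory, `make_scaler`, to show that every function it returns is a distinct object yet shares a single code object, because the inner `def` is compiled only once.

```python
def make_scaler(factor):
    """Hypothetical factory: returns a new closure on every call."""
    def scale(x):
        return x * factor
    return scale


double = make_scaler(2)
triple = make_scaler(3)

# Each factory call creates a distinct function object with its own closure...
distinct_functions = double is not triple

# ...but both functions point at the same code object, created once for the
# inner `def scale` statement when the enclosing source was compiled.
shares_code = double.__code__ is triple.__code__

print(distinct_functions, shares_code)
```

Any cache keyed on the code object alone would therefore treat `double` and `triple` as the same entry point, which appears to be the collision scenario that the per-invocation cache described in the post is meant to avoid.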
