Troubleshooting GuardOnDataDependentSymNode Errors
When working with PyTorch models that have data-dependent control flow (using functions like item(), tolist(), or nonzero()), you may encounter GuardOnDataDependentSymNode errors. This section explains what these errors are and how to fix them.
Common Error Pattern
The following output shows the common error pattern for GuardOnDataDependentSymNode errors:
torch.fx.experimental.symbolic_shapes.GuardOnDataDependentSymNode: Could not guard on data-dependent expression Eq(u2, -1) (unhinted: Eq(u2, -1)). (Size-like symbols: none)
Potential framework code culprit (scroll up for full backtrace):
File "/data/users/ezyang/a/pytorch/torch/_prims_common/__init__.py", line 855, in infer_size
if d == -1:
For more information, run with TORCH_LOGS="dynamic"
For extended logs when we create symbols, also add TORCHDYNAMO_EXTENDED_DEBUG_CREATE_SYMBOL="u2"
If you suspect the guard was triggered from C++, add TORCHDYNAMO_EXTENDED_DEBUG_CPP=1
For more debugging help, see https://docs.google.com/document/d/1HSuTTVvYH1pTew89Rtpeu84Ht3nQEFTYhAX3Ypa_xJs/edit?usp=sharing
Root Cause
These errors occur when PyTorch tries to convert a symbolic quantity (for example, u2 == -1) into a concrete value (such as False) to make branching decisions. In a typical scenario, where data-dependent sizes are not involved, PyTorch can determine the concrete value at compile time and install a guard to ensure the compilation result remains valid. However, with data-dependent quantities, the true value is unknown at compile time, resulting in errors. You can often rewrite your model by adding torch._check or torch._check_is_size to bypass these issues. This document aims to teach you how.
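For illustration, here is a minimal sketch of the kind of code that triggers this error. It assumes torch._dynamo.config.capture_scalar_outputs = True so that item() is captured into the graph rather than causing a graph break:

import torch

torch._dynamo.config.capture_scalar_outputs = True

@torch.compile(fullgraph=True)
def f(x):
    u2 = x.sum().item()  # u2 becomes an unbacked SymInt
    if u2 == -1:         # branching requires a concrete answer for Eq(u2, -1)
        return x * 2
    return x + 1

f(torch.ones(3, dtype=torch.int64))  # raises GuardOnDataDependentSymNode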
Debugging Tools
Here is a list of some of the debugging tools available in PyTorch that you can use to troubleshoot these errors:

- TORCH_LOGS="dynamic" - Shows detailed logs about symbolic operations
- TORCHDYNAMO_EXTENDED_DEBUG_CREATE_SYMBOL="u2" - Provides extended logs for specific symbols
- TORCHDYNAMO_EXTENDED_DEBUG_CPP=1 - Helps when guards are triggered from C++
Error Variations
Here is a list of error variations that you might encounter:

| Error Variation | Description |
|---|---|
| “Could not guard on data-dependent expression” | Occurs when trying to extract a concrete boolean from expressions like u0 == 0 or u0 > 10. |
| “Could not extract specialized integer from data-dependent expression” | Occurs when trying to extract a concrete integer value. |
How to Diagnose Your Problem
Step 1: Examine the Potential Framework Culprit (Python Backtrace)
The exception provides a backtrace, which often indicates the problem. Given that PT2 backtraces can be lengthy, the error message will also suggest a potential framework culprit. For example:
Potential framework code culprit (scroll up for full backtrace):
File "/data/users/ezyang/a/pytorch/torch/_prims_common/__init__.py", line 855, in infer_size
if d == -1:
Consider the following:

- Does it make sense that this condition is triggering a guard on a data-dependent symbol?
- Should we know if the quantity in question is size-like? (The exception lists size-like symbols; if a symbol is not listed, it might be an arbitrary integer.)
- If the equation involves two distinct symbols, should we know they are actually equal?
- If all symbols are size-like but the equation involves 0 or 1, are we missing a guard_size_oblivious wrapper? (Remember, for guard_size_oblivious between two size tuples, use sym_eq instead of regular equality.)
In the example above, testing if d (a data-dependent value) is -1 suggests that d should be non-negative if it were a size. This indicates a missing torch._check_is_size. If d is already size-like but numel() == 0 fails, consider wrapping it in guard_size_oblivious.
Using TORCH_LOGS=dynamic
and examining the user stack trace is crucial for
understanding how to fix the problem, as they guide you on how to modify the
user program.
[INFO] create_unbacked_symint u0 [-9223372036854775808, 9223372036854775807] (w.py:40 in custom_op_meta)
This log message indicates where (w.py:40) the unbacked SymInt was allocated. An unbacked SymInt may be allocated multiple times, so track their equalities:
[INFO] set_replacement u1 = u0 (trivial_lhs) ValueRanges(lower=0, upper=9223372036854775807, is_bool=False)
Step 2: Examine the C++ Backtrace
If the framework code culprit is uninformative, the guard might be in C++. You can force a C++ backtrace by running with TORCHDYNAMO_EXTENDED_DEBUG_CPP=1. This provides a detailed C++ backtrace with Python, CPython, and C10/ATen/libtorch frames interspersed. Look for symbols in the at:: or c10:: namespace that resemble kernel-specific code, likely related to the kernel executed per the Python backtrace. If you are using a non-debug build of PyTorch, inlining may cause missing frames, requiring source code investigation to locate the issue. For an example, see https://github.com/pytorch/pytorch/pull/118579.
Here is an example C++ backtrace from a debugging session:
[2024-02-08 08:20:45,259] torch.fx.experimental.symbolic_shapes: [INFO] File "../__gen_aten__/out/RegisterCompositeImplicitAutograd.cpp", line 2025, in at::(anonymous namespace)::(anonymous namespace)::wrapper_CompositeImplicitAutograd_Tensor_narrow(at::Tensor const&, long, at::Tensor const&, c10::SymInt)
[2024-02-08 08:20:45,259] torch.fx.experimental.symbolic_shapes: [INFO] File "../aten/src/ATen/native/TensorShape.cpp", line 1410, in at::native::narrow_tensor_symint(at::Tensor const&, long, at::Tensor const&, c10::SymInt)
[2024-02-08 08:20:45,259] torch.fx.experimental.symbolic_shapes: [INFO] File "../__gen_aten__/out/core/TensorMethods.cpp", line 52, in long at::Tensor::item<long>() const
[2024-02-08 08:20:45,259] torch.fx.experimental.symbolic_shapes: [INFO] File "../ATen/core/TensorBody.h", line 4274, in at::Tensor::item() const
In this example, at::native::narrow_tensor_symint calls into item, which triggers the guard on a data-dependent SymNode. You can modify the C++ code to avoid specializing, or verify whether you should be in this C++ code at all (e.g., start was not expected to be a Tensor, and modifying this fixed the problem).
Tools for Fixing Errors
There are a few important functions that you should use to troubleshoot this problem.
torch._check(cond, msg_fn)
torch._check is a function used to assert conditions at runtime, particularly when dealing with symbolic integers (SymInts) in PyTorch.
Example Usage:
torch._check(x.size(0) == y, lambda: f"size mismatch: {x.size(0)} != {y}")
The code above does the following:

- Creates a deferred runtime assertion instead of a compile-time guard
- Teaches the symbolic reasoning system facts about your unbacked SymInts
- Can eliminate unbacked symbols by replacing them with equivalent expressions
- Refines value ranges of symbols
- Remembers boolean expressions that are always true
Semantically, the function behaves like a conditional check:
if not cond:
    raise RuntimeError(msg_fn())
But there are a number of key differences:

- The condition is always assumed true at compile time, even if it involves unbacked SymInts. The actual check is deferred to runtime, avoiding compile-time errors. Instead of setting up a guard, we install a deferred runtime assertion to verify the condition at runtime. At compile time, we assume the condition won’t trigger an error, so we don’t need to determine whether it evaluates to True or False.
- If you perform an equality test u0 == RHS, we try to replace all instances of u0 with RHS. We will ALWAYS do this if RHS has no unbacked symbols, as removing unbacked symbols is beneficial: eliminating them prevents the creation of a GuardOnDataDependentSymNode. Even if we are not able to eliminate u0, we can refine its value range. The value range specifies the set of possible values for a variable. By default, size-like unbacked SymInts have a value range of [0, Inf]; if you assert it is equal to an expression with a refined value range, say [2, 20], then u0’s value range will be updated to [2, 20]. We also have limited support for propagating value ranges in reverse.
- If you perform a boolean test f(u0), we will remember that this expression always evaluates to True, and if you evaluate an expression that contains it, we will substitute it with True. We also support some limited reasoning on logically equivalent statements. For example, if you torch._check(u0 < 4), we will also know that u0 >= 4 evaluates to False, so performing a test like this in a normal non-check conditional will go through fine, as in the sketch below.
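Here is a minimal sketch of that last point, again assuming capture_scalar_outputs is enabled:

import torch

torch._dynamo.config.capture_scalar_outputs = True

@torch.compile(fullgraph=True)
def f(x, y):
    u0 = x.sum().item()   # unbacked SymInt
    torch._check(u0 < 4)  # remembered as always True
    if u0 >= 4:           # now statically known to be False
        return y * 2
    return y + 1

f(torch.ones(2, dtype=torch.int64), torch.randn(3))  # u0 == 2 at runtime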
torch._check_is_size(size) and guard_size_oblivious(cond)
Example:
u0 = y.item()
torch._check_is_size(u0)
Semantic Equivalent:
if u0 < 0:
    raise RuntimeError("u0 is not a size")
Key Differences:
Like torch._check, this test will always succeed at compile time, and it will establish that u0 >= 0. This refines the value range of u0 to [0, Inf] instead of [-Inf, Inf].
Marking u0 as size-like is crucial. Size-like unbacked SymInts behave like their regular counterparts, except when involved in a boolean expression evaluated with guard_size_oblivious. In such cases, they are assumed not to equal zero or one, temporarily setting their value range to [2, Inf]. For example, guard_size_oblivious(u0 == 1) will evaluate to False when u0 is size-like, instead of causing an error.
Marking unbacked symbols as size-like is essential in contexts where tensor sizes are expected. PyTorch internals often check if sizes are zero or one to handle special cases related to empty or single-element tensors. If you pass an unbacked symbol to a factory function like torch.empty, it will automatically be marked as size-like. However, some quantities, like arguments to Tensor.view, cannot be inferred as size-like because -1 is a valid argument. In such cases, you need to explicitly use torch._check_is_size on an unbacked SymInt before passing it to view.
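Here is a minimal sketch of that pattern; the torch._check on numel is our own addition to make the example self-consistent, while the essential line is torch._check_is_size:

import torch

torch._dynamo.config.capture_scalar_outputs = True

@torch.compile(fullgraph=True)
def f(x, y):
    u0 = x.sum().item()            # could be -1 as far as view knows
    torch._check_is_size(u0)       # mark size-like: u0 >= 0, never -1
    torch._check(y.numel() == u0)  # assert the shapes agree at runtime
    return y.view(u0)

f(torch.tensor([2, 3]), torch.randn(5))  # u0 == 5 at runtime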
In PyTorch framework code, if you need to test a size for zero or one, wrap the test in guard_size_oblivious to assume that size-like unbacked SymInts will not pass this test. Generally, most framework code has logic for the >= 2 case, which works for the 0/1 case. If using guard_size_oblivious in PyTorch framework code resolves your issue, it’s likely acceptable. However, avoid using guard_size_oblivious in user code, especially if different behavior is required for the 0/1 case at runtime, such as in a hand-tracking application.

In C++, this can be done with TORCH_GUARD_SIZE_OBLIVIOUS(u0.sym_eq(0)), for example.
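In Python framework code, a sketch might look like this; broadcast_dim is a hypothetical helper, while guard_size_oblivious itself lives in torch.fx.experimental.symbolic_shapes:

from torch.fx.experimental.symbolic_shapes import guard_size_oblivious

def broadcast_dim(a_dim, b_dim):
    # Under guard_size_oblivious, a size-like unbacked SymInt is assumed
    # to be >= 2, so this test returns False instead of raising
    # GuardOnDataDependentSymNode.
    if guard_size_oblivious(a_dim == 1):
        return b_dim
    return a_dim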
torch._check_is_size(size, max=upper_bound) (New)
This function is semantically equivalent to torch._check(size <= upper_bound). However, under guard_size_oblivious, it assumes that size < upper_bound. This functionality only works when the upper bound is an integer constant. If upper_bound is a symbolic expression, normal semantics apply. There is potential to extend this functionality to symbolic expressions with further development.
For more details, see the related issue https://github.com/pytorch/pytorch/issues/120288.
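Here is a minimal sketch of how this might be used on a PyTorch version that supports the max argument; the constant bound 128 and the surrounding function are our own illustration:

import torch

torch._dynamo.config.capture_scalar_outputs = True

@torch.compile(fullgraph=True)
def f(x, table):
    u0 = x.sum().item()
    # Semantically torch._check(u0 <= 128); under guard_size_oblivious,
    # u0 < 128 is assumed. Only works with a constant bound.
    torch._check_is_size(u0, max=128)
    return table.narrow(0, 0, u0)

f(torch.tensor([1, 2]), torch.randn(16, 8))  # u0 == 3 at runtime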
torch._constrain_as_value and torch._constrain_as_size
These APIs are more specialized and are effectively equivalent to torch._check and torch._check_is_size, with the added capability of adjusting the value range of a variable by specifying minimum and maximum values. However, in recommendation models, these functions are unlikely to resolve GuardOnDataDependentSymNode errors effectively.

While constrain_as_value might seem like a convenient way to ensure a variable stays within the bounds of another tensor, it is often impractical. This is because value ranges only support constant bounds, and it’s common for the tensor you want to index into to have a symbolic dimension (for example, s0). Using its size as the maximum value for a value range will force specialization, which is usually undesirable. Instead, if necessary, manually handle range checks by using torch._check() on appropriate expressions based on the errors you encounter.
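For instance, here is a sketch of the manual approach for a data-dependent index; the specific bounds are our own illustration, and in practice you would check whatever the errors you encounter require:

import torch

torch._dynamo.config.capture_scalar_outputs = True

@torch.compile(fullgraph=True)
def f(idx, table):
    u0 = idx.item()
    # Rather than constrain_as_value with table.size(0), a symbolic bound
    # that would force specialization, assert exactly what indexing needs:
    torch._check(u0 >= 0)
    torch._check(u0 < table.size(0))
    return table[u0]

f(torch.tensor(2), torch.randn(10, 4))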
Common Fix Patterns
There are several common methods to resolve issues like this. Below, we outline the most frequently used solutions.
When It’s Unfixable
In some cases, the issue is genuinely unfixable due to the nature of the code. Consider the following example:
i = x.item()
if i > 4:
    return x * 2
else:
    return x + 3
If the user code is branching on a data-dependent value, it is impossible to trace as is. In such cases, you may need to consider alternative approaches, such as using torch.cond.
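For the example above, a sketch using torch.cond (available in recent PyTorch releases) keeps the predicate as a tensor and traces both branches:

import torch

@torch.compile(fullgraph=True)
def f(x):
    # The predicate stays a boolean tensor, so no data-dependent guard is
    # needed at compile time; both branches are traced.
    return torch.cond(x.sum() > 4, lambda x: x * 2, lambda x: x + 3, (x,))

f(torch.ones(3))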
Another common pattern involves indexing with a data-dependent value:
return self.mlps[x.item()]
Here, self.mlps is a Python list or ModuleList, and the code branches on a data-dependent value. The simplest solution is to induce a graph break before the indexing operation.
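A sketch of that approach, using a hypothetical module:

import torch
import torch.nn as nn

class Router(nn.Module):
    def __init__(self):
        super().__init__()
        self.mlps = nn.ModuleList([nn.Linear(4, 4) for _ in range(3)])

    def forward(self, x, idx):
        # Force a graph break so the data-dependent indexing below runs
        # eagerly, where idx.item() is an ordinary Python int.
        torch._dynamo.graph_break()
        return self.mlps[idx.item()](x)

model = torch.compile(Router())
model(torch.randn(2, 4), torch.tensor(1))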
u0 is a Size, but We Don’t Know It
Some guards fail on tests that essentially ask, “Is this a size?” but we don’t know it is a size. These fall into two categories:
- Regular Tests: These are tests like u0 >= 0 or u0 != -1 that are unconditionally true for sizes. Adding a torch._check_is_size(...) on the relevant size will assert that these tests are true. This is typically uncommon because if the test is for error checking, we can infer that the condition must be true, as an error would occur otherwise. An important exception is APIs that accept both sizes and -1; in such cases, the user must indicate that the input data-dependent quantity cannot be -1, as something unusual would happen otherwise. For an example, see https://github.com/pytorch/pytorch/pull/107788. Sometimes, you can refactor an error-checking API to split a logical disjunction of conditionals into separate conditionals. If you can do so to achieve a single torch._check(x == y) statement, it will enable the automatic generation of a deferred runtime assertion. For an example, see https://github.com/pytorch/pytorch/pull/110979.
- Edge Case Tests: These are tests like u0 == 0 or u0 == 1, which are not always true for sizes, but where our choice doesn’t really matter. These tests handle edge cases, such as dealing with an empty tensor or testing for broadcasting when we want to assume broadcasting is not occurring. To resolve these situations, two steps are needed. First, the guard itself must be evaluated via guard_size_oblivious, which assumes that size-like integers cannot equal zero or one, with the promise that if they do, something reasonable will happen. Second, the symbols themselves must be marked as size-like, either inferred because they were passed to tensor factory functions or explicitly specified with torch._check_is_size(...). For examples of making guards size-oblivious, see https://github.com/pytorch/pytorch/pull/118579.
Sometimes, these tests can occur in C++. While there are corresponding C++ APIs for these tests, it can be more challenging to localize the problem, as you do not get a useful backtrace by default.
u0 is Actually Equal to u1, but We Don’t Know It
Multiple unbacked SymInts can be known to be equal at compile time:
i0 = x.sum().item()
i1 = x.sum().item()
return torch.randn(i0) + torch.randn(i1)
If there is a torch._check(i0 == i1) somewhere (in the example above, this check would occur inside the shape-checking rule for addition), we will automatically unify the two unbacked SymInts and recognize them as equal. However, if such an assertion is missing, you may need to explicitly add an assertion to achieve this unification. For an example, see https://github.com/pytorch/pytorch/issues/111950.
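Here is a sketch of adding the unification explicitly; the second computation is contrived so that the two values are equal at runtime even though the compiler cannot tell:

import torch

torch._dynamo.config.capture_scalar_outputs = True

@torch.compile(fullgraph=True)
def f(x):
    i0 = x.sum().item()
    i1 = (x + 0).sum().item()  # equal at runtime, but a distinct symbol
    torch._check(i0 == i1)     # explicitly unify the two unbacked SymInts
    return torch.randn(i0) + torch.randn(i1)

f(torch.ones(4, dtype=torch.int64))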
Note
If we allocate an unbacked SymInt and immediately set it equal to another, these instances are benign and not easily eliminated entirely from the framework.
u0 is a Tensor
Another reason you might be overallocating unbacked SymInts is passing around a Tensor and relying on its implicit conversion to an integer. Many functions that accept an integer will also accept a Tensor and automatically call item() on the Tensor argument. When this occurs, a new SymInt will be allocated at the line where the PyTorch function is invoked. It’s beneficial to examine TORCH_LOGS=dynamic to determine whether the number of unbacked SymInts is as expected or excessive.
This issue is less likely to cause problems now because the return value of t.item() is memoized, ensuring that you consistently receive the same unbacked SymInt if you call it multiple times.
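As a sketch, prefer the explicit conversion; the bounds checks here are our own illustration of what the kernel may additionally require:

import torch

torch._dynamo.config.capture_scalar_outputs = True

@torch.compile(fullgraph=True)
def f(x, start):
    # Anti-pattern: x.narrow(0, start, 2) would call item() on start
    # internally, allocating the unbacked SymInt at that line instead.
    s = start.item()                  # allocate the symbol where you can see it
    torch._check(s >= 0)
    torch._check(s + 2 <= x.size(0))
    return x.narrow(0, s, 2)

f(torch.randn(8), torch.tensor(1))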
Overspecialization Issue
In non-strict export mode, consider the following code:
u0 = x.sum().item()
return y[:u0]
This code will fail when trying to evaluate u0 because, when a SymInt is used directly inside a Python slice (without using Dynamo), Python forces the integer to be specialized and fails if it is unbacked.
To resolve this, you can rewrite the program to avoid specialization. For the example above, you can fix it by not using slices:
u0 = x.sum().item()
return y.narrow(0, 0, u0)
For more details, see the related issue https://github.com/pytorch/pytorch/issues/111950.
Use Lengths Instead of Offsets
When working with variable sequence lengths, it’s common to have tensors representing either the lengths or the offsets of the sequences. For example, given values = [[1, 2, 3], [4, 5], [6, 7, 8, 9]], you might have lengths = [3, 2, 4] and offsets = [0, 3, 5, 9]. While these representations are interconvertible, it’s better to work with lengths when dealing with them as integers (by calling lengths.tolist()), rather than offsets.
The reason is that when you perform a torch.split() on your values tensor, you need to create tensors for each sub-sequence, such as tensors of sizes 3, 2, and 4. If you have unbacked SymInts for sizes, they become u0, u1, and u2. You can easily indicate that they are size-like, and you’re done. However, if you have unbacked SymInts for offsets, the sizes become u1 - u0, u2 - u1, and u3 - u2, which complicates matters. These quantities cannot be conveniently marked as size-like, leading to potential issues. Since it’s relatively straightforward to write code using either lengths or offsets, you should prefer using lengths.
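A sketch of the lengths-based pattern, again assuming capture_scalar_outputs is enabled:

import torch

torch._dynamo.config.capture_scalar_outputs = True

@torch.compile(fullgraph=True)
def f(values, lengths):
    sizes = lengths.tolist()     # one unbacked SymInt per sequence
    for s in sizes:
        torch._check_is_size(s)  # each length is directly size-like
    return torch.split(values, sizes)

values = torch.arange(9)  # flattened data for sequences of lengths 3, 2, 4
f(values, torch.tensor([3, 2, 4]))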