Gradient Computation
PyTorch provides functions for computing gradients of tensors with respect to graph leaves.
Gradient Functions
void torch::autograd::backward(const variable_list &tensors, const variable_list &grad_tensors = {}, std::optional<bool> retain_graph = std::nullopt, bool create_graph = false, const variable_list &inputs = {})
Computes the sum of gradients of the given tensors with respect to graph leaves.
The graph is differentiated using the chain rule. If any of tensors are non-scalar (i.e. their data has more than one element) and require gradient, then the Jacobian-vector product is computed, and the function additionally requires grad_tensors. It should be a sequence of matching length that contains the “vector” in the Jacobian-vector product, usually the gradient of the differentiated function w.r.t. the corresponding tensors (torch::Tensor() is an acceptable value for all tensors that don’t need gradient tensors).
This function accumulates gradients in the leaves, so you might need to zero them before calling it.
Parameters:
tensors – Tensors of which the derivative will be computed.
grad_tensors – The “vector” in the Jacobian-vector product, usually gradients w.r.t. each element of the corresponding tensors. torch::Tensor() values can be specified for scalar Tensors or ones that don’t require grad. If a torch::Tensor() value would be acceptable for all grad_tensors, then this argument is optional.
retain_graph – If false, the graph used to compute the grad will be freed. Note that in nearly all cases setting this option to true is not needed and often can be worked around in a much more efficient way. Defaults to the value of create_graph.
create_graph – If true, the graph of the derivative will be constructed, allowing higher order derivative products to be computed. Defaults to false.
inputs – Inputs w.r.t. which the gradient will be accumulated into at::Tensor::grad. All other Tensors will be ignored. If not provided, the gradient is accumulated into all the leaf Tensors that were used to compute tensors.
variable_list torch::autograd::grad(const variable_list &outputs, const variable_list &inputs, const variable_list &grad_outputs = {}, std::optional<bool> retain_graph = std::nullopt, bool create_graph = false, bool allow_unused = false)
Computes and returns the sum of gradients of outputs with respect to the inputs.
grad_outputs should be a sequence of length matching outputs, containing the “vector” in the Jacobian-vector product, usually the pre-computed gradients w.r.t. each of the outputs. If an output doesn’t require grad, then the gradient can be torch::Tensor().
Parameters:
outputs – Outputs of the differentiated function.
inputs – Inputs w.r.t. which the gradient will be returned (and not accumulated into at::Tensor::grad).
grad_outputs – The “vector” in the Jacobian-vector product. Usually gradients w.r.t. each output. torch::Tensor() values can be specified for scalar Tensors or ones that don’t require grad. If a torch::Tensor() value would be acceptable for all grad_outputs, then this argument is optional. Default: {}.
retain_graph – If false, the graph used to compute the grad will be freed. Note that in nearly all cases setting this option to true is not needed and often can be worked around in a much more efficient way. Defaults to the value of create_graph.
create_graph – If true, the graph of the derivative will be constructed, allowing higher order derivative products to be computed. Default: false.
allow_unused – If false, specifying inputs that were not used when computing outputs (and therefore their grad is always zero) is an error. Defaults to false.
Example:
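#include <torch/torch.h>
#include <iostream>

int main() {
  auto x = torch::randn({2, 2}, torch::requires_grad());
  auto y = x * x;
  auto z = y.sum();
  // Accumulate dz/dx into x.grad(). The graph is retained here only
  // because it is differentiated a second time below.
  z.backward(torch::Tensor(), /*retain_graph=*/true);
  std::cout << x.grad() << std::endl;
  // Or use grad() to return the gradients without accumulating into x.grad()
  auto grads = torch::autograd::grad({z}, {x});
  std::cout << grads[0] << std::endl;
}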
Tensor Gradient Methods
Tensors have built-in methods for gradient computation:
// Enable gradient tracking
auto x = torch::randn({2, 2}).requires_grad_(true);
// Check if gradient is required
bool needs_grad = x.requires_grad();
// Access the accumulated gradient (an undefined tensor until backward() has run)
auto grad = x.grad();
// Get a tensor with the same data that is detached from the computation graph
auto x_detached = x.detach();