Aliases in torch.optim

Created On: Jul 18, 2025 | Last Updated On: Jul 18, 2025

The following are aliases, defined in nested namespaces, to their counterparts in torch.optim. For any of these APIs, you can use either the top-level version, such as torch.optim.Adam, or the nested version, such as torch.optim.adam.Adam.
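
As a minimal sketch of what the aliasing means in practice (the parameter and learning rate below are illustrative), the snippet checks that the two spellings name the same class and constructs an optimizer through each:

    import torch
    import torch.optim

    # A single parameter to optimize; any iterable of Tensors works.
    param = torch.nn.Parameter(torch.zeros(3))

    # The nested name is an alias for the top-level class.
    assert torch.optim.Adam is torch.optim.adam.Adam

    # Either spelling constructs the same optimizer.
    opt_top = torch.optim.Adam([param], lr=1e-3)
    opt_nested = torch.optim.adam.Adam([param], lr=1e-3)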

Adadelta

Implements the Adadelta algorithm.

adadelta

Functional API that performs the Adadelta algorithm computation.

Adagrad

Implements the Adagrad algorithm.

adagrad

Functional API that performs the Adagrad algorithm computation.

Adam

Implements the Adam algorithm.

adam

Functional API that performs the Adam algorithm computation.
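
The class is what a training loop normally uses; the functional alias (torch.optim.adam.adam) operates on explicitly passed parameter, gradient, and state tensors and is the routine the class's step() dispatches to internally. A minimal class-based sketch, assuming a toy linear model and synthetic data:

    import torch
    import torch.optim
    from torch import nn

    model = nn.Linear(4, 1)
    optimizer = torch.optim.adam.Adam(model.parameters(), lr=1e-3)  # same class as torch.optim.Adam

    x, y = torch.randn(8, 4), torch.randn(8, 1)
    for _ in range(5):
        optimizer.zero_grad()
        loss = nn.functional.mse_loss(model(x), y)
        loss.backward()
        optimizer.step()  # performs the Adam update on every parameter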

Adamax

Implements the Adamax algorithm (a variant of Adam based on the infinity norm).

adamax

Functional API that performs the Adamax algorithm computation.

AdamW

Implements the AdamW algorithm, in which weight decay does not accumulate in the momentum or variance estimates.

adamw

Functional API that performs the AdamW algorithm computation.
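
Because the decay is decoupled from the moment estimates, the weight_decay argument acts as a direct shrinkage on the weights rather than being folded into the gradient. A minimal sketch (the hyperparameters are illustrative):

    import torch
    import torch.optim

    model = torch.nn.Linear(16, 4)

    # weight_decay is applied to the parameters directly, not through the
    # gradient that feeds the momentum and variance estimates.
    optimizer = torch.optim.adamw.AdamW(model.parameters(), lr=1e-3, weight_decay=1e-2)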

ASGD

Implements Averaged Stochastic Gradient Descent.

asgd

Functional API that performs the ASGD algorithm computation.

LBFGS

Implements the L-BFGS algorithm.
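
Unlike the other optimizers on this page, LBFGS may reevaluate the objective several times per step, so step() takes a closure that recomputes the loss. A minimal sketch on a toy quadratic (the learning rate and iteration count are illustrative):

    import torch
    import torch.optim

    x = torch.nn.Parameter(torch.tensor([2.0, -3.0]))
    optimizer = torch.optim.lbfgs.LBFGS([x], lr=1.0)

    def closure():
        optimizer.zero_grad()
        loss = (x ** 2).sum()  # simple quadratic objective
        loss.backward()
        return loss

    for _ in range(5):
        optimizer.step(closure)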

NAdam

Implements the NAdam algorithm.

nadam

Functional API that performs the NAdam algorithm computation.

RAdam

Implements the RAdam algorithm.

radam

Functional API that performs the RAdam algorithm computation.

RMSprop

Implements the RMSprop algorithm.

rmsprop

Functional API that performs the RMSprop algorithm computation.

Rprop

Implements the resilient backpropagation algorithm.

rprop

Functional API that performs the Rprop algorithm computation.

SGD

Implements stochastic gradient descent (optionally with momentum).

sgd

Functional API that performs the SGD algorithm computation.
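
To make the optional momentum concrete, a minimal sketch constructing both variants (the values are illustrative):

    import torch
    import torch.optim

    params = [torch.nn.Parameter(torch.zeros(10))]

    plain_sgd = torch.optim.sgd.SGD(params, lr=0.1)                   # vanilla SGD
    momentum_sgd = torch.optim.sgd.SGD(params, lr=0.1, momentum=0.9)  # heavy-ball momentum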

SparseAdam

SparseAdam implements a masked version of the Adam algorithm suitable for sparse gradients.
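
Only the moments corresponding to entries that actually appear in the (sparse) gradient are updated, which is why SparseAdam is typically paired with layers that emit sparse gradients. A minimal sketch, assuming an nn.Embedding constructed with sparse=True:

    import torch
    import torch.optim

    # An embedding layer that produces sparse gradients.
    embedding = torch.nn.Embedding(1000, 32, sparse=True)
    optimizer = torch.optim.sparse_adam.SparseAdam(embedding.parameters(), lr=1e-3)

    indices = torch.randint(0, 1000, (16,))
    loss = embedding(indices).sum()
    loss.backward()    # embedding.weight.grad is a sparse tensor
    optimizer.step()   # updates only the rows touched by the gradient
    optimizer.zero_grad()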