Aliases in torch.nn#
Created On: Jul 25, 2025 | Last Updated On: Jul 25, 2025
The following are aliases to their counterparts in torch.nn
in nested namespaces.
torch.nn.modules#
The following are aliases to their counterparts in torch.nn
in the torch.nn.modules
namespace.
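For example (a minimal sketch assuming a standard PyTorch installation; Linear and Conv2d are arbitrary picks from the tables below), a name exposed under torch.nn.modules is the same class object as its torch.nn counterpart:

    import torch.nn as nn

    # The nested-namespace name and the top-level name refer to the same class object.
    assert nn.modules.Linear is nn.Linear
    assert nn.modules.Conv2d is nn.Conv2d

    # Instances built through either name are therefore interchangeable.
    layer = nn.modules.Linear(4, 2)
    print(isinstance(layer, nn.Linear))  # True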
Containers (Aliases)#
Sequential | A sequential container.
ModuleList | Holds submodules in a list.
ModuleDict | Holds submodules in a dictionary.
ParameterList | Holds parameters in a list.
ParameterDict | Holds parameters in a dictionary.
Convolution Layers (Aliases)#
Conv1d | Applies a 1D convolution over an input signal composed of several input planes.
Conv2d | Applies a 2D convolution over an input signal composed of several input planes.
Conv3d | Applies a 3D convolution over an input signal composed of several input planes.
ConvTranspose1d | Applies a 1D transposed convolution operator over an input image composed of several input planes.
ConvTranspose2d | Applies a 2D transposed convolution operator over an input image composed of several input planes.
ConvTranspose3d | Applies a 3D transposed convolution operator over an input image composed of several input planes.
LazyConv1d | A torch.nn.Conv1d module with lazy initialization of the in_channels argument.
LazyConv2d | A torch.nn.Conv2d module with lazy initialization of the in_channels argument.
LazyConv3d | A torch.nn.Conv3d module with lazy initialization of the in_channels argument.
LazyConvTranspose1d | A torch.nn.ConvTranspose1d module with lazy initialization of the in_channels argument.
LazyConvTranspose2d | A torch.nn.ConvTranspose2d module with lazy initialization of the in_channels argument.
LazyConvTranspose3d | A torch.nn.ConvTranspose3d module with lazy initialization of the in_channels argument.
Unfold | Extracts sliding local blocks from a batched input tensor.
Fold | Combines an array of sliding local blocks into a large containing tensor.
Pooling layers (Aliases)#
MaxPool1d | Applies a 1D max pooling over an input signal composed of several input planes.
MaxPool2d | Applies a 2D max pooling over an input signal composed of several input planes.
MaxPool3d | Applies a 3D max pooling over an input signal composed of several input planes.
MaxUnpool1d | Computes a partial inverse of MaxPool1d.
MaxUnpool2d | Computes a partial inverse of MaxPool2d.
MaxUnpool3d | Computes a partial inverse of MaxPool3d.
AvgPool1d | Applies a 1D average pooling over an input signal composed of several input planes.
AvgPool2d | Applies a 2D average pooling over an input signal composed of several input planes.
AvgPool3d | Applies a 3D average pooling over an input signal composed of several input planes.
FractionalMaxPool2d | Applies a 2D fractional max pooling over an input signal composed of several input planes.
FractionalMaxPool3d | Applies a 3D fractional max pooling over an input signal composed of several input planes.
LPPool1d | Applies a 1D power-average pooling over an input signal composed of several input planes.
LPPool2d | Applies a 2D power-average pooling over an input signal composed of several input planes.
LPPool3d | Applies a 3D power-average pooling over an input signal composed of several input planes.
AdaptiveMaxPool1d | Applies a 1D adaptive max pooling over an input signal composed of several input planes.
AdaptiveMaxPool2d | Applies a 2D adaptive max pooling over an input signal composed of several input planes.
AdaptiveMaxPool3d | Applies a 3D adaptive max pooling over an input signal composed of several input planes.
AdaptiveAvgPool1d | Applies a 1D adaptive average pooling over an input signal composed of several input planes.
AdaptiveAvgPool2d | Applies a 2D adaptive average pooling over an input signal composed of several input planes.
AdaptiveAvgPool3d | Applies a 3D adaptive average pooling over an input signal composed of several input planes.
Padding Layers (Aliases)#
ReflectionPad1d | Pads the input tensor using the reflection of the input boundary.
ReflectionPad2d | Pads the input tensor using the reflection of the input boundary.
ReflectionPad3d | Pads the input tensor using the reflection of the input boundary.
ReplicationPad1d | Pads the input tensor using replication of the input boundary.
ReplicationPad2d | Pads the input tensor using replication of the input boundary.
ReplicationPad3d | Pads the input tensor using replication of the input boundary.
ZeroPad1d | Pads the input tensor boundaries with zero.
ZeroPad2d | Pads the input tensor boundaries with zero.
ZeroPad3d | Pads the input tensor boundaries with zero.
ConstantPad1d | Pads the input tensor boundaries with a constant value.
ConstantPad2d | Pads the input tensor boundaries with a constant value.
ConstantPad3d | Pads the input tensor boundaries with a constant value.
CircularPad1d | Pads the input tensor using circular padding of the input boundary.
CircularPad2d | Pads the input tensor using circular padding of the input boundary.
CircularPad3d | Pads the input tensor using circular padding of the input boundary.
Non-linear Activations (weighted sum, nonlinearity) (Aliases)#
ELU | Applies the Exponential Linear Unit (ELU) function, element-wise.
Hardshrink | Applies the Hard Shrinkage (Hardshrink) function element-wise.
Hardsigmoid | Applies the Hardsigmoid function element-wise.
Hardtanh | Applies the HardTanh function element-wise.
Hardswish | Applies the Hardswish function, element-wise.
LeakyReLU | Applies the LeakyReLU function element-wise.
LogSigmoid | Applies the Logsigmoid function element-wise.
MultiheadAttention | Allows the model to jointly attend to information from different representation subspaces.
PReLU | Applies the element-wise PReLU function.
ReLU | Applies the rectified linear unit function element-wise.
ReLU6 | Applies the ReLU6 function element-wise.
RReLU | Applies the randomized leaky rectified linear unit function, element-wise.
SELU | Applies the SELU function element-wise.
CELU | Applies the CELU function element-wise.
GELU | Applies the Gaussian Error Linear Units function.
Sigmoid | Applies the Sigmoid function element-wise.
SiLU | Applies the Sigmoid Linear Unit (SiLU) function, element-wise.
Mish | Applies the Mish function, element-wise.
Softplus | Applies the Softplus function element-wise.
Softshrink | Applies the soft shrinkage function element-wise.
Softsign | Applies the element-wise Softsign function.
Tanh | Applies the Hyperbolic Tangent (Tanh) function element-wise.
Tanhshrink | Applies the element-wise Tanhshrink function.
Threshold | Thresholds each element of the input Tensor.
GLU | Applies the gated linear unit function.
Non-linear Activations (other) (Aliases)#
Softmin | Applies the Softmin function to an n-dimensional input Tensor.
Softmax | Applies the Softmax function to an n-dimensional input Tensor.
Softmax2d | Applies SoftMax over features to each spatial location.
LogSoftmax | Applies the log(Softmax(x)) function to an n-dimensional input Tensor.
AdaptiveLogSoftmaxWithLoss | Efficient softmax approximation.
Normalization Layers (Aliases)#
BatchNorm1d | Applies Batch Normalization over a 2D or 3D input.
BatchNorm2d | Applies Batch Normalization over a 4D input.
BatchNorm3d | Applies Batch Normalization over a 5D input.
LazyBatchNorm1d | A torch.nn.BatchNorm1d module with lazy initialization of the num_features argument.
LazyBatchNorm2d | A torch.nn.BatchNorm2d module with lazy initialization of the num_features argument.
LazyBatchNorm3d | A torch.nn.BatchNorm3d module with lazy initialization of the num_features argument.
GroupNorm | Applies Group Normalization over a mini-batch of inputs.
SyncBatchNorm | Applies Batch Normalization over a N-Dimensional input.
InstanceNorm1d | Applies Instance Normalization.
InstanceNorm2d | Applies Instance Normalization.
InstanceNorm3d | Applies Instance Normalization.
LazyInstanceNorm1d | A torch.nn.InstanceNorm1d module with lazy initialization of the num_features argument.
LazyInstanceNorm2d | A torch.nn.InstanceNorm2d module with lazy initialization of the num_features argument.
LazyInstanceNorm3d | A torch.nn.InstanceNorm3d module with lazy initialization of the num_features argument.
LayerNorm | Applies Layer Normalization over a mini-batch of inputs.
LocalResponseNorm | Applies local response normalization over an input signal.
RMSNorm | Applies Root Mean Square Layer Normalization over a mini-batch of inputs.
Recurrent Layers (Aliases)#
RNNBase | Base class for RNN modules (RNN, LSTM, GRU).
RNN | Apply a multi-layer Elman RNN with tanh or ReLU non-linearity to an input sequence.
LSTM | Apply a multi-layer long short-term memory (LSTM) RNN to an input sequence.
GRU | Apply a multi-layer gated recurrent unit (GRU) RNN to an input sequence.
RNNCell | An Elman RNN cell with tanh or ReLU non-linearity.
LSTMCell | A long short-term memory (LSTM) cell.
GRUCell | A gated recurrent unit (GRU) cell.
Transformer Layers (Aliases)#
Transformer | A basic transformer layer.
TransformerEncoder | TransformerEncoder is a stack of N encoder layers.
TransformerDecoder | TransformerDecoder is a stack of N decoder layers.
TransformerEncoderLayer | TransformerEncoderLayer is made up of self-attn and feedforward network.
TransformerDecoderLayer | TransformerDecoderLayer is made up of self-attn, multi-head-attn and feedforward network.
Linear Layers (Aliases)#
Identity | A placeholder identity operator that is argument-insensitive.
Linear | Applies an affine linear transformation to the incoming data: y = xA^T + b.
Bilinear | Applies a bilinear transformation to the incoming data: y = x1^T A x2 + b.
LazyLinear | A torch.nn.Linear module where in_features is inferred.
Dropout Layers (Aliases)#
Dropout | During training, randomly zeroes some of the elements of the input tensor with probability p.
Dropout1d | Randomly zero out entire channels.
Dropout2d | Randomly zero out entire channels.
Dropout3d | Randomly zero out entire channels.
AlphaDropout | Applies Alpha Dropout over the input.
FeatureAlphaDropout | Randomly masks out entire channels.
Sparse Layers (Aliases)#
Embedding | A simple lookup table that stores embeddings of a fixed dictionary and size.
EmbeddingBag | Compute sums or means of 'bags' of embeddings, without instantiating the intermediate embeddings.
Distance Functions (Aliases)#
CosineSimilarity | Returns cosine similarity between x1 and x2, computed along dim.
PairwiseDistance | Computes the pairwise distance between input vectors, or between columns of input matrices.
Loss Functions (Aliases)#
L1Loss | Creates a criterion that measures the mean absolute error (MAE) between each element in the input x and target y.
MSELoss | Creates a criterion that measures the mean squared error (squared L2 norm) between each element in the input x and target y.
CrossEntropyLoss | This criterion computes the cross entropy loss between input logits and target.
CTCLoss | The Connectionist Temporal Classification loss.
NLLLoss | The negative log likelihood loss.
PoissonNLLLoss | Negative log likelihood loss with Poisson distribution of target.
GaussianNLLLoss | Gaussian negative log likelihood loss.
KLDivLoss | The Kullback-Leibler divergence loss.
BCELoss | Creates a criterion that measures the Binary Cross Entropy between the target and the input probabilities.
BCEWithLogitsLoss | This loss combines a Sigmoid layer and the BCELoss in one single class.
MarginRankingLoss | Creates a criterion that measures the loss given inputs x1, x2, two 1D mini-batch or 0D Tensors, and a label 1D mini-batch or 0D Tensor y (containing 1 or -1).
HingeEmbeddingLoss | Measures the loss given an input tensor x and a labels tensor y (containing 1 or -1).
MultiLabelMarginLoss | Creates a criterion that optimizes a multi-class multi-classification hinge loss (margin-based loss) between input x (a 2D mini-batch Tensor) and output y (which is a 2D Tensor of target class indices).
HuberLoss | Creates a criterion that uses a squared term if the absolute element-wise error falls below delta and a delta-scaled L1 term otherwise.
SmoothL1Loss | Creates a criterion that uses a squared term if the absolute element-wise error falls below beta and an L1 term otherwise.
SoftMarginLoss | Creates a criterion that optimizes a two-class classification logistic loss between input tensor x and target tensor y (containing 1 or -1).
MultiLabelSoftMarginLoss | Creates a criterion that optimizes a multi-label one-versus-all loss based on max-entropy, between input x and target y of size (N, C).
CosineEmbeddingLoss | Creates a criterion that measures the loss given input tensors x1, x2 and a Tensor label y with values 1 or -1.
MultiMarginLoss | Creates a criterion that optimizes a multi-class classification hinge loss (margin-based loss) between input x (a 2D mini-batch Tensor) and output y (which is a 1D tensor of target class indices, 0 <= y <= x.size(1) - 1).
TripletMarginLoss | Creates a criterion that measures the triplet loss given input tensors x1, x2, x3 and a margin with a value greater than 0.
TripletMarginWithDistanceLoss | Creates a criterion that measures the triplet loss given input tensors a, p, and n (representing anchor, positive, and negative examples, respectively), and a nonnegative, real-valued function ("distance function") used to compute the relationship between the anchor and positive example ("positive distance") and the anchor and negative example ("negative distance").
Vision Layers (Aliases)#
PixelShuffle | Rearrange elements in a tensor according to an upscaling factor.
PixelUnshuffle | Reverse the PixelShuffle operation.
Upsample | Upsamples a given multi-channel 1D (temporal), 2D (spatial) or 3D (volumetric) data.
UpsamplingNearest2d | Applies a 2D nearest neighbor upsampling to an input signal composed of several input channels.
UpsamplingBilinear2d | Applies a 2D bilinear upsampling to an input signal composed of several input channels.
Shuffle Layers (Aliases)#
ChannelShuffle | Divides and rearranges the channels in a tensor.
torch.nn.utils#
The following are aliases to their counterparts in torch.nn.utils
in nested namespaces.
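For example (a small sketch; clip_grad_norm_ is just one entry from the listing below), a function in the nested torch.nn.utils.clip_grad module is the same object as the one exported directly from torch.nn.utils:

    import torch.nn.utils as utils

    # Nested-namespace alias and top-level function are the same object.
    assert utils.clip_grad.clip_grad_norm_ is utils.clip_grad_norm_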
Utility functions to clip parameter gradients.
clip_grad_norm_ | Clip the gradient norm of an iterable of parameters.
clip_grad_norm | Clip the gradient norm of an iterable of parameters.
clip_grad_value_ | Clip the gradients of an iterable of parameters at specified value.
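A short usage sketch (toy model and an arbitrary max_norm of 1.0) showing how the clipping utilities are typically called after backward():

    import torch
    import torch.nn as nn
    from torch.nn.utils import clip_grad_norm_

    model = nn.Linear(10, 1)
    loss = model(torch.randn(8, 10)).sum()
    loss.backward()

    # Rescale all gradients in place so their combined norm does not exceed 1.0.
    total_norm = clip_grad_norm_(model.parameters(), max_norm=1.0)
    print(total_norm)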
Utility functions to flatten and unflatten Module parameters to and from a single vector.
parameters_to_vector | Flatten an iterable of parameters into a single vector.
vector_to_parameters | Copy slices of a vector into an iterable of parameters.
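A minimal round-trip sketch (arbitrary 3-in, 2-out linear layer) of flattening parameters and copying them back:

    import torch.nn as nn
    from torch.nn.utils import parameters_to_vector, vector_to_parameters

    model = nn.Linear(3, 2)

    # Flatten every parameter into one 1D tensor, then copy the values back unchanged.
    flat = parameters_to_vector(model.parameters())
    vector_to_parameters(flat, model.parameters())
    print(flat.shape)  # torch.Size([8]): 6 weights + 2 biases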
Utility functions to fuse Modules with BatchNorm modules.
fuse_conv_bn_eval | Fuse a convolutional module and a BatchNorm module into a single, new convolutional module.
fuse_conv_bn_weights | Fuse convolutional module parameters and BatchNorm module parameters into new convolutional module parameters.
fuse_linear_bn_eval | Fuse a linear module and a BatchNorm module into a single, new linear module.
fuse_linear_bn_weights | Fuse linear module parameters and BatchNorm module parameters into new linear module parameters.
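An inference-time sketch (toy Conv2d/BatchNorm2d pair; both modules must be in eval mode) of folding a BatchNorm into the preceding convolution:

    import torch
    import torch.nn as nn
    from torch.nn.utils.fusion import fuse_conv_bn_eval

    conv = nn.Conv2d(3, 8, kernel_size=3, padding=1).eval()
    bn = nn.BatchNorm2d(8).eval()

    # Fold the BatchNorm statistics and affine parameters into a new Conv2d.
    fused = fuse_conv_bn_eval(conv, bn)

    x = torch.randn(1, 3, 16, 16)
    print(torch.allclose(bn(conv(x)), fused(x), atol=1e-6))  # True (up to numerics)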
Utility functions to convert Module parameter memory formats.
convert_conv2d_weight_memory_format | Convert memory_format of nn.Conv2d.weight to memory_format.
convert_conv3d_weight_memory_format | Convert memory_format of nn.Conv3d.weight to memory_format.
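A brief sketch (arbitrary Conv2d shape) converting a convolution weight to channels_last:

    import torch
    import torch.nn as nn
    from torch.nn.utils import convert_conv2d_weight_memory_format

    conv = nn.Conv2d(3, 8, kernel_size=3)

    # Re-lay out the weight tensor as channels_last (NHWC), which some backends prefer.
    conv = convert_conv2d_weight_memory_format(conv, torch.channels_last)
    print(conv.weight.is_contiguous(memory_format=torch.channels_last))  # True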
Utility functions to apply and remove weight normalization from Module parameters.
weight_norm | Apply weight normalization to a parameter in the given module.
remove_weight_norm | Remove the weight normalization reparameterization from a module.
spectral_norm | Apply spectral normalization to a parameter in the given module.
remove_spectral_norm | Remove the spectral normalization reparameterization from a module.
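An apply/remove sketch (spectral_norm on a toy Linear layer; weight_norm follows the same pattern):

    import torch.nn as nn
    from torch.nn.utils import spectral_norm, remove_spectral_norm

    layer = spectral_norm(nn.Linear(20, 40), name='weight')
    print(hasattr(layer, 'weight_orig'))  # True: weight is now recomputed from weight_orig

    remove_spectral_norm(layer)           # restore a plain 'weight' parameter
    print(hasattr(layer, 'weight_orig'))  # False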
Utility functions for initializing Module parameters.
skip_init | Given a module class object and args / kwargs, instantiate the module without initializing parameters / buffers.
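A small sketch (nn.Linear as the module class; sizes arbitrary) of constructing a module while skipping its default parameter initialization, then initializing explicitly:

    import torch.nn as nn
    from torch.nn.utils import skip_init

    # Allocate the module without running its default init; parameters come back uninitialized.
    layer = skip_init(nn.Linear, 10, 5)
    nn.init.zeros_(layer.weight)
    nn.init.zeros_(layer.bias)
    print(layer.weight.shape)  # torch.Size([5, 10])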