--- myst: html_meta: description: Transformer layers in PyTorch C++ — Transformer, TransformerEncoder, TransformerDecoder, and MultiheadAttention. keywords: PyTorch, C++, Transformer, TransformerEncoder, TransformerDecoder, MultiheadAttention, attention --- # Transformer Layers Transformer layers use self-attention mechanisms to process sequences in parallel, enabling efficient training on long sequences. They are the foundation of modern NLP models (BERT, GPT) and increasingly used in vision and other domains. - **Transformer**: Complete encoder-decoder architecture - **TransformerEncoder/Decoder**: Standalone encoder or decoder stacks - **TransformerEncoderLayer/DecoderLayer**: Individual transformer blocks - **MultiheadAttention**: Core attention mechanism used throughout **Key parameters:** - `d_model`: Dimension of the model (embedding dimension) - `nhead`: Number of attention heads - `num_encoder_layers/num_decoder_layers`: Number of stacked layers - `dim_feedforward`: Dimension of feedforward network - `dropout`: Dropout rate for regularization ## Transformer Complete encoder-decoder transformer architecture. ```{doxygenclass} torch::nn::Transformer :members: :undoc-members: ``` ```{doxygenclass} torch::nn::TransformerImpl :members: :undoc-members: ``` **Example:** ```cpp auto transformer = torch::nn::Transformer( torch::nn::TransformerOptions() .d_model(512) .nhead(8) .num_encoder_layers(6) .num_decoder_layers(6) .dim_feedforward(2048) .dropout(0.1)); ``` ## TransformerEncoder Stack of encoder layers for processing source sequences. ```{doxygenclass} torch::nn::TransformerEncoder :members: :undoc-members: ``` ```{doxygenclass} torch::nn::TransformerEncoderImpl :members: :undoc-members: ``` ## TransformerDecoder Stack of decoder layers for generating target sequences. ```{doxygenclass} torch::nn::TransformerDecoder :members: :undoc-members: ``` ```{doxygenclass} torch::nn::TransformerDecoderImpl :members: :undoc-members: ``` ## TransformerEncoderLayer Single encoder layer with self-attention and feedforward network. ```{doxygenclass} torch::nn::TransformerEncoderLayerImpl :members: :undoc-members: ``` ## TransformerDecoderLayer Single decoder layer with self-attention, cross-attention, and feedforward network. ```{doxygenclass} TransformerDecoderLayerImpl :members: :undoc-members: ``` ## MultiheadAttention Scaled dot-product attention with multiple parallel heads. ```{doxygenclass} torch::nn::MultiheadAttention :members: :undoc-members: ``` ```{doxygenclass} torch::nn::MultiheadAttentionImpl :members: :undoc-members: ```