---
myst:
  html_meta:
    description: PyTorch C++ optimizer API — SGD, Adam, and other optimizers for training neural networks.
    keywords: PyTorch, C++, optimizer, optim, SGD, Adam, training
---

# Optimizers (torch::optim)

The `torch::optim` namespace provides optimization algorithms for training neural networks. These optimizers update model parameters based on computed gradients to minimize the loss function.

**When to use torch::optim:**

- When training neural networks with gradient descent
- When you need different optimization strategies (SGD, Adam, etc.)
- When implementing learning rate schedules

**Basic usage:**

```cpp
#include <torch/torch.h>

// Create model and optimizer (Net is a user-defined torch::nn::Module)
auto model = std::make_shared<Net>();
auto optimizer = torch::optim::Adam(
    model->parameters(), torch::optim::AdamOptions(1e-3));

// Training loop
for (auto& batch : *data_loader) {
  optimizer.zero_grad();  // Clear gradients
  auto loss = loss_fn(model->forward(batch.data), batch.target);
  loss.backward();        // Compute gradients
  optimizer.step();       // Update parameters
}
```

## Header Files

- `torch/csrc/api/include/torch/optim.h` - Main optim header
- `torch/csrc/api/include/torch/optim/optimizer.h` - Optimizer base class
- `torch/csrc/api/include/torch/optim/sgd.h` - SGD optimizer
- `torch/csrc/api/include/torch/optim/adam.h` - Adam optimizer

## Optimizer Base Class

All optimizers inherit from the `Optimizer` base class, which provides common functionality for parameter updates, gradient zeroing, and state management.

```{doxygenclass} torch::optim::Optimizer
:members:
:undoc-members:
```

### OptimizerOptions

```{doxygenclass} torch::optim::OptimizerOptions
:members:
:undoc-members:
```

### OptimizerParamGroup

```{doxygenclass} torch::optim::OptimizerParamGroup
:members:
:undoc-members:
```

### OptimizerParamState

```{doxygenclass} torch::optim::OptimizerParamState
:members:
:undoc-members:
```

## Choosing an Optimizer

Selecting the right optimizer depends on your model architecture, dataset, and training requirements:

```{list-table}
:widths: 20 40 40
:header-rows: 1

* - Optimizer
  - Best For
  - Trade-offs
* - **SGD + Momentum**
  - CNNs, well-understood problems, when you can tune hyperparameters
  - Requires careful learning rate tuning; often achieves best final accuracy
* - **Adam/AdamW**
  - General-purpose, transformers, quick prototyping
  - Works well out-of-the-box; AdamW preferred with weight decay
* - **RMSprop**
  - RNNs, non-stationary objectives
  - Good for recurrent architectures; handles varying gradient scales
* - **Adagrad**
  - Sparse data (NLP, embeddings)
  - Learning rate decreases over time; good for infrequent features
* - **LBFGS**
  - Small models, fine-tuning, convex problems
  - Memory-intensive; requires closure function
```

Construction sketches for several of these optimizers, including the LBFGS closure pattern, are given in the Examples section at the end of this page.

## Optimizer Categories

```{toctree}
:maxdepth: 1

gradient_descent
adaptive
second_order
schedulers
```
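
## Examples

**Constructing optimizers with option structs:**

A minimal sketch of how the optimizers compared above are created. A `torch::nn::Linear` stands in for a real model here, and the hyperparameter values are illustrative rather than recommendations:

```cpp
#include <torch/torch.h>

int main() {
  // Stand-in model; any module's parameters() works the same way.
  torch::nn::Linear model(784, 10);
  auto params = model->parameters();

  // SGD with momentum and weight decay (illustrative values).
  torch::optim::SGD sgd(
      params, torch::optim::SGDOptions(0.1).momentum(0.9).weight_decay(1e-4));

  // AdamW: Adam with decoupled weight decay.
  torch::optim::AdamW adamw(
      params, torch::optim::AdamWOptions(1e-3).weight_decay(1e-2));

  // RMSprop, often used for recurrent architectures.
  torch::optim::RMSprop rmsprop(
      params, torch::optim::RMSpropOptions(1e-3).alpha(0.99));
}
```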
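
**LBFGS with a closure:**

Unlike the other optimizers above, LBFGS requires its `step()` call to be given a closure that re-evaluates the loss, because the algorithm may need several function evaluations per step. A minimal sketch with a stand-in model and random data:

```cpp
#include <torch/torch.h>

int main() {
  // Stand-in model and random data for illustration.
  torch::nn::Linear model(4, 1);
  auto input = torch::randn({16, 4});
  auto target = torch::randn({16, 1});

  torch::optim::LBFGS lbfgs(
      model->parameters(), torch::optim::LBFGSOptions(1.0).max_iter(20));

  // The closure zeroes gradients, computes the loss, backpropagates,
  // and returns the loss so LBFGS can re-evaluate it as needed.
  // step() returns the final loss value.
  auto final_loss = lbfgs.step([&]() -> torch::Tensor {
    lbfgs.zero_grad();
    auto prediction = model->forward(input);
    auto loss = torch::mse_loss(prediction, target);
    loss.backward();
    return loss;
  });
}
```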