---
myst:
  html_meta:
    description: Learning rate schedulers in PyTorch C++ — StepLR, ExponentialLR, and other LR scheduling policies.
    keywords: PyTorch, C++, learning rate, scheduler, StepLR, ExponentialLR, LRScheduler
---

# Learning Rate Schedulers

Learning rate schedulers adjust the learning rate during training, which often improves convergence and final accuracy. Common strategies include:

- **Step decay**: Reduce the LR by a factor every N epochs
- **Exponential decay**: Multiply the LR by gamma each epoch
- **Cosine annealing**: Smoothly decrease the LR along a cosine curve
- **Warmup**: Gradually increase the LR at the start of training

## LRScheduler Base Class

```{doxygenclass} torch::optim::LRScheduler
:members:
:undoc-members:
```

## StepLR

Decays the learning rate by `gamma` every `step_size` epochs. This is the simplest and most commonly used scheduler.

```{doxygenclass} torch::optim::StepLR
:members:
:undoc-members:
```

**Example:**

```cpp
auto optimizer = torch::optim::SGD(
    model->parameters(),
    torch::optim::SGDOptions(0.1));

// Reduce the LR by 10x every 30 epochs
auto scheduler = torch::optim::StepLR(
    optimizer, /*step_size=*/30, /*gamma=*/0.1);

for (int epoch = 0; epoch < 90; ++epoch) {
  train_one_epoch(model, optimizer, data_loader);
  scheduler.step();  // LR: 0.1 (epochs 0-29), 0.01 (30-59), 0.001 (60-89)
}
```

## ReduceLROnPlateau

Reduces the learning rate when a monitored metric has stopped improving. Useful when you want the scheduler to respond to validation loss rather than follow a fixed schedule.

```{doxygenclass} torch::optim::ReduceLROnPlateauScheduler
:members:
:undoc-members:
```

## ExponentialLR

Decays the learning rate by `gamma` every epoch. This gives a smoother decay than StepLR, but it can take longer to reach a small learning rate.
**Example:**

```cpp
auto optimizer = torch::optim::Adam(
    model->parameters(),
    torch::optim::AdamOptions(1e-3));

// Reduce the LR by 5% each epoch
auto scheduler = torch::optim::ExponentialLR(
    optimizer, /*gamma=*/0.95);

for (int epoch = 0; epoch < num_epochs; ++epoch) {
  train_one_epoch(model, optimizer, data_loader);
  scheduler.step();
}
```

## Complete Training Example

Here's a complete example showing optimizer usage together with learning rate scheduling (`train_loader` is assumed to be a data loader created elsewhere, e.g. with `torch::data::make_data_loader`):

```cpp
#include <torch/torch.h>

#include <iostream>
#include <memory>

struct Net : torch::nn::Module {
  Net() {
    fc1 = register_module("fc1", torch::nn::Linear(784, 256));
    fc2 = register_module("fc2", torch::nn::Linear(256, 10));
  }

  torch::Tensor forward(torch::Tensor x) {
    x = torch::relu(fc1->forward(x.view({-1, 784})));
    return fc2->forward(x);
  }

  torch::nn::Linear fc1{nullptr}, fc2{nullptr};
};

int main() {
  // Create model
  auto model = std::make_shared<Net>();

  // Create optimizer with weight decay
  auto optimizer = torch::optim::AdamW(
      model->parameters(),
      torch::optim::AdamWOptions(1e-3)
          .weight_decay(0.01));

  // Learning rate scheduler: halve the LR every 10 epochs
  auto scheduler = torch::optim::StepLR(
      optimizer, /*step_size=*/10, /*gamma=*/0.5);

  // Loss function
  auto loss_fn = torch::nn::CrossEntropyLoss();

  // Training loop
  for (int epoch = 0; epoch < 30; ++epoch) {
    model->train();
    double epoch_loss = 0.0;

    for (auto& batch : *train_loader) {
      optimizer.zero_grad();
      auto output = model->forward(batch.data);
      auto loss = loss_fn(output, batch.target);
      loss.backward();
      optimizer.step();
      epoch_loss += loss.item<double>();
    }

    scheduler.step();
    // Read the current LR back from the optimizer's first parameter group
    std::cout << "Epoch " << epoch
              << " Loss: " << epoch_loss
              << " LR: " << optimizer.param_groups()[0].options().get_lr()
              << std::endl;
  }

  return 0;
}
```