torch.optim
===================================

.. automodule:: torch.optim

How to use an optimizer
-----------------------

To use :mod:`torch.optim` you have to construct an optimizer object that will
hold the current state and will update the parameters based on the computed
gradients.

Constructing it
^^^^^^^^^^^^^^^

To construct an :class:`Optimizer` you have to give it an iterable containing
the parameters (all should be :class:`~torch.autograd.Variable` s) to optimize.
Then, you can specify optimizer-specific options such as the learning rate,
weight decay, etc.

.. note::

    If you need to move a model to GPU via ``.cuda()``, please do so before
    constructing optimizers for it. Parameters of a model after ``.cuda()``
    will be different objects from those before the call.

    In general, you should make sure that optimized parameters live in
    consistent locations when optimizers are constructed and used.

Example::

    optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
    optimizer = optim.Adam([var1, var2], lr=0.0001)

Per-parameter options
^^^^^^^^^^^^^^^^^^^^^

:class:`Optimizer` s also support specifying per-parameter options. To do this,
instead of passing an iterable of :class:`~torch.autograd.Variable` s, pass in
an iterable of :class:`dict` s. Each of them will define a separate parameter
group, and should contain a ``params`` key, containing a list of parameters
belonging to it. Other keys should match the keyword arguments accepted by the
optimizers, and will be used as optimization options for this group.

.. note::

    You can still pass options as keyword arguments. They will be used as
    defaults in the groups that didn't override them. This is useful when you
    only want to vary a single option, while keeping all others consistent
    between parameter groups.

For example, this is very useful when one wants to specify per-layer learning
rates::

    optim.SGD([
        {'params': model.base.parameters()},
        {'params': model.classifier.parameters(), 'lr': 1e-3}
    ], lr=1e-2, momentum=0.9)

This means that ``model.base``'s parameters will use the default learning rate
of ``1e-2``, ``model.classifier``'s parameters will use a learning rate of
``1e-3``, and a momentum of ``0.9`` will be used for all parameters.

Taking an optimization step
^^^^^^^^^^^^^^^^^^^^^^^^^^^

All optimizers implement a :func:`~Optimizer.step` method that updates the
parameters. It can be used in two ways:

``optimizer.step()``
~~~~~~~~~~~~~~~~~~~~

This is a simplified version supported by most optimizers. The function can be
called once the gradients are computed using e.g.
:func:`~torch.autograd.Variable.backward`.

Example::

    for input, target in dataset:
        optimizer.zero_grad()
        output = model(input)
        loss = loss_fn(output, target)
        loss.backward()
        optimizer.step()

``optimizer.step(closure)``
~~~~~~~~~~~~~~~~~~~~~~~~~~~

Some optimization algorithms such as Conjugate Gradient and LBFGS need to
reevaluate the function multiple times, so you have to pass in a closure that
allows them to recompute your model. The closure should clear the gradients,
compute the loss, and return it.

Example::

    for input, target in dataset:
        def closure():
            optimizer.zero_grad()
            output = model(input)
            loss = loss_fn(output, target)
            loss.backward()
            return loss
        optimizer.step(closure)
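As a concrete illustration of the closure form, here is a minimal,
self-contained sketch that fits :class:`LBFGS` to a toy least-squares problem.
The data, the number of outer iterations, and the hyperparameter values are
made up purely for illustration::

    import torch
    from torch import optim

    # a tiny least-squares problem: find x minimizing ||A x - b||^2
    A = torch.randn(10, 3)
    b = torch.randn(10)
    x = torch.zeros(3, requires_grad=True)

    optimizer = optim.LBFGS([x], lr=1.0, max_iter=20)

    def closure():
        # clear old gradients, recompute the loss, backpropagate, return the loss
        optimizer.zero_grad()
        loss = (A @ x - b).pow(2).sum()
        loss.backward()
        return loss

    for _ in range(10):
        optimizer.step(closure)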
.. _optimizer-algorithms:

Algorithms
----------

.. autoclass:: Optimizer
    :members:
.. autoclass:: Adadelta
    :members:
.. autoclass:: Adagrad
    :members:
.. autoclass:: Adam
    :members:
.. autoclass:: AdamW
    :members:
.. autoclass:: SparseAdam
    :members:
.. autoclass:: Adamax
    :members:
.. autoclass:: ASGD
    :members:
.. autoclass:: LBFGS
    :members:
.. autoclass:: RMSprop
    :members:
.. autoclass:: Rprop
    :members:
.. autoclass:: SGD
    :members:

How to adjust Learning Rate
---------------------------

:mod:`torch.optim.lr_scheduler` provides several methods to adjust the learning
rate based on the number of epochs.
:class:`torch.optim.lr_scheduler.ReduceLROnPlateau` allows dynamic learning
rate reduction based on some validation measurements.

Learning rate scheduling should be applied after the optimizer's update; e.g.,
you should write your code this way (a more complete sketch follows the class
reference below):

>>> scheduler = ...
>>> for epoch in range(100):
>>>     train(...)
>>>     validate(...)
>>>     scheduler.step()

.. warning::

    Prior to PyTorch 1.1.0, the learning rate scheduler was expected to be
    called before the optimizer's update; 1.1.0 changed this behavior in a
    BC-breaking way. If you use the learning rate scheduler (calling
    ``scheduler.step()``) before the optimizer's update (calling
    ``optimizer.step()``), this will skip the first value of the learning rate
    schedule. If you are unable to reproduce results after upgrading to
    PyTorch 1.1.0, please check if you are calling ``scheduler.step()`` at the
    wrong time.

.. autoclass:: torch.optim.lr_scheduler.LambdaLR
    :members:
.. autoclass:: torch.optim.lr_scheduler.MultiplicativeLR
    :members:
.. autoclass:: torch.optim.lr_scheduler.StepLR
    :members:
.. autoclass:: torch.optim.lr_scheduler.MultiStepLR
    :members:
.. autoclass:: torch.optim.lr_scheduler.ExponentialLR
    :members:
.. autoclass:: torch.optim.lr_scheduler.CosineAnnealingLR
    :members:
.. autoclass:: torch.optim.lr_scheduler.ReduceLROnPlateau
    :members:
.. autoclass:: torch.optim.lr_scheduler.CyclicLR
    :members:
.. autoclass:: torch.optim.lr_scheduler.OneCycleLR
    :members:
.. autoclass:: torch.optim.lr_scheduler.CosineAnnealingWarmRestarts
    :members:
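As a concrete illustration of the scheduling pattern above, here is a minimal
sketch that pairs :class:`torch.optim.SGD` with
:class:`torch.optim.lr_scheduler.StepLR`. The ``model``, ``dataset``,
``loss_fn``, and all hyperparameter values are placeholders, as in the earlier
examples::

    optimizer = optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
    # multiply the learning rate by 0.1 every 30 epochs
    scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=30, gamma=0.1)

    for epoch in range(100):
        for input, target in dataset:
            optimizer.zero_grad()
            output = model(input)
            loss = loss_fn(output, target)
            loss.backward()
            optimizer.step()
        # step the scheduler once per epoch, after the optimizer's updates
        scheduler.step()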