OnlineDTLoss
- class torchrl.objectives.OnlineDTLoss(*args, **kwargs)
TorchRL implementation of the Online Decision Transformer loss.
Presented in “Online Decision Transformer” <https://arxiv.org/abs/2202.05607>.
- Parameters:
actor_network (ProbabilisticActor) – stochastic actor
- Keyword Arguments:
alpha_init (float, optional) – initial entropy multiplier. Default is 1.0.
min_alpha (float, optional) – min value of alpha. Default is None (no minimum value).
max_alpha (float, optional) – max value of alpha. Default is None (no maximum value).
fixed_alpha (bool, optional) – if True, alpha will be fixed to its initial value. Otherwise, alpha will be optimized to match the ‘target_entropy’ value. Default is False.
target_entropy (float or str, optional) – Target entropy for the stochastic policy. Default is “auto”, where target entropy is computed as -prod(n_actions).
samples_mc_entropy (int) – number of samples to estimate the entropy.
reduction (str, optional) – Specifies the reduction to apply to the output:
"none"|"mean"|"sum"."none": no reduction will be applied,"mean": the sum of the output will be divided by the number of elements in the output,"sum": the output will be summed. Default:"mean".
- default_keys
alias of _AcceptedKeys
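To compute the loss, the input TensorDict must carry the keys named in the accepted keys. A hedged sketch continuing the construction example above; the "observation" and "action" key names and the returned entry names are assumptions to verify against loss_fn.tensor_keys and the returned TensorDict:

```python
from tensordict import TensorDict

batch, seq = 8, 20  # toy batch and sequence sizes

# Actions kept strictly inside (-1, 1) so the TanhNormal log-probability is finite.
data = TensorDict(
    {
        "observation": torch.randn(batch, seq, obs_dim),
        "action": torch.tanh(torch.randn(batch, seq, act_dim)),
    },
    batch_size=[batch, seq],
)

loss_vals = loss_fn(data)
# Inspect the returned TensorDict for the individual loss terms
# (log-likelihood, entropy and alpha components) and sum them
# before calling backward().
print(sorted(loss_vals.keys()))
```

If the data uses different key names, they can be remapped with the set_keys() method inherited from LossModule rather than renaming the data itself.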