BCLoss

class torchrl.objectives.BCLoss(*args, **kwargs)

Behavior Cloning Loss Module.

Implements behavior cloning loss for both stochastic and deterministic policies. Minimizes the negative log-likelihood: -E[log π(a_expert | s)] where π is the policy being trained and a_expert are the expert actions from the demonstration dataset.
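The objective above can be illustrated outside of TorchRL. The following is a minimal sketch in plain PyTorch, assuming a Gaussian policy with a fixed unit standard deviation; the `policy_mean` layer and the random "expert" actions are hypothetical stand-ins, and this is not the code BCLoss itself runs.

```python
import torch
from torch.distributions import Independent, Normal

torch.manual_seed(0)

n_obs, n_act, batch = 3, 4, 2

# Hypothetical stand-in policy: a linear layer produces the mean of a
# Gaussian with unit standard deviation (this is not TorchRL's Actor).
policy_mean = torch.nn.Linear(n_obs, n_act)

observation = torch.randn(batch, n_obs)
expert_action = torch.rand(batch, n_act) * 2 - 1  # fake expert actions in [-1, 1]

mean = policy_mean(observation)
# Independent(..., 1) sums log-probabilities over the action dimension,
# so log_prob returns one value per sample.
dist = Independent(Normal(mean, torch.ones_like(mean)), 1)

# Per-sample negative log-likelihood of the expert actions, shape [batch].
nll = -dist.log_prob(expert_action)

# Batch average: a Monte Carlo estimate of -E[log pi(a_expert | s)].
loss = nll.mean()
loss.backward()
```

With a fixed standard deviation, minimizing this quantity is equivalent to minimizing a squared error between the policy mean and the expert action, up to an additive constant.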

Works with any actor network that implements a get_dist() method, including both stochastic and deterministic policies.

Reference:

“Integrating Behavior Cloning and Reinforcement Learning for Improved Performance in Dense and Sparse Reward Environments” https://arxiv.org/abs/1910.04281

Parameters:

actor_network (TensorDictModule) – the actor network to be trained.

Keyword Arguments:

reduction (str, optional) – Specifies the reduction to apply to the output: "none" | "mean" | "sum". "none": no reduction will be applied, "mean": the sum of the output will be divided by the number of elements in the output, "sum": the output will be summed. Default: "mean".
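The three reduction modes can be sketched on a small tensor of per-sample losses. The `reduce_loss` helper below is hypothetical and written only to mirror the semantics described above; it is not a TorchRL function.

```python
import torch

# Hypothetical per-sample losses for a batch of three transitions.
per_sample = torch.tensor([0.5, 1.5, 4.0])


def reduce_loss(value: torch.Tensor, reduction: str) -> torch.Tensor:
    """Apply a "none" | "mean" | "sum" reduction to a loss tensor."""
    if reduction == "none":
        return value          # keep one loss value per sample
    if reduction == "mean":
        return value.mean()   # sum divided by the number of elements
    if reduction == "sum":
        return value.sum()    # total over all elements
    raise ValueError(f"unknown reduction: {reduction!r}")


unreduced = reduce_loss(per_sample, "none")  # shape [3]
mean_loss = reduce_loss(per_sample, "mean")  # scalar, 2.0
sum_loss = reduce_loss(per_sample, "sum")    # scalar, 6.0
```

Keeping "none" is useful when per-sample weighting is applied before aggregating; "mean" and "sum" differ only by a factor equal to the number of elements.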

Examples

>>> import torch
>>> from torch import nn
>>> from torchrl.data.tensor_specs import Bounded
>>> from torchrl.modules.tensordict_module.actors import Actor
>>> from torchrl.objectives.bc import BCLoss
>>> from tensordict import TensorDict
>>> n_act, n_obs = 4, 3
>>> spec = Bounded(-torch.ones(n_act), torch.ones(n_act), (n_act,))
>>> module = nn.Linear(n_obs, n_act)
>>> actor = Actor(module=module, spec=spec)
>>> loss = BCLoss(actor)
>>> batch = [2, ]
>>> data = TensorDict({
...     "observation": torch.randn(*batch, n_obs),
...     "action": spec.rand(batch),
... }, batch)
>>> loss(data)
TensorDict(
    fields={
        loss_bc: Tensor(shape=torch.Size([]), device=cpu, dtype=torch.float32, is_shared=False)},
    batch_size=torch.Size([]),
    device=None,
    is_shared=False)

This class is also compatible with non-tensordict based modules and can be used without resorting to any tensordict-related primitive. In this case, the expected keyword arguments are the actor’s in_keys + ["action"]. The return value is a tensor corresponding to the loss.

Examples

>>> import torch
>>> from torch import nn
>>> from torchrl.data.tensor_specs import Bounded
>>> from torchrl.modules.tensordict_module.actors import Actor
>>> from torchrl.objectives.bc import BCLoss
>>> n_act, n_obs = 4, 3
>>> spec = Bounded(-torch.ones(n_act), torch.ones(n_act), (n_act,))
>>> module = nn.Linear(n_obs, n_act)
>>> actor = Actor(module=module, spec=spec)
>>> loss = BCLoss(actor)
>>> _ = loss.select_out_keys("loss_bc")
>>> batch = [2, ]
>>> loss_bc = loss(
...     observation=torch.randn(*batch, n_obs),
...     action=spec.rand(batch))
>>> loss_bc.backward()

default_keys

alias of _AcceptedKeys

forward(tensordict: TensorDictBase = None) → TensorDictBase

Compute the behavior cloning loss.

Parameters:

tensordict (TensorDictBase) – input data containing observations and expert actions.

Returns:

A TensorDict containing the key “loss_bc”.
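A loss value of this kind is typically consumed by an optimizer. The sketch below shows one way a behavior cloning loop might look, assuming a plain PyTorch stand-in for the loss (a unit-variance Gaussian negative log-likelihood) rather than BCLoss itself; the `expert`, `policy`, and `bc_loss` names are hypothetical and the demonstrations are synthetic.

```python
import torch
from torch import nn
from torch.distributions import Independent, Normal

torch.manual_seed(0)
n_obs, n_act, n_samples = 3, 4, 64

# Synthetic demonstrations produced by a hypothetical linear "expert".
expert = nn.Linear(n_obs, n_act)
observations = torch.randn(n_samples, n_obs)
with torch.no_grad():
    expert_actions = expert(observations)

# Policy to be trained by imitation.
policy = nn.Linear(n_obs, n_act)
optimizer = torch.optim.SGD(policy.parameters(), lr=0.1)


def bc_loss(obs: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
    """Stand-in loss: mean negative log-likelihood under a unit-variance Gaussian."""
    mean = policy(obs)
    dist = Independent(Normal(mean, torch.ones_like(mean)), 1)
    return -dist.log_prob(action).mean()


history = []
for _ in range(100):
    optimizer.zero_grad()
    loss = bc_loss(observations, expert_actions)
    loss.backward()   # gradients flow into the policy parameters
    optimizer.step()
    history.append(loss.item())
```

When using BCLoss with a tensordict input, the same pattern would apply to the “loss_bc” entry of the returned TensorDict in place of the scalar returned by the stand-in function.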
