.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "tutorials/torchrl_demo.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end <sphx_glr_download_tutorials_torchrl_demo.py>`
        to download the full example code.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_tutorials_torchrl_demo.py:


Introduction to TorchRL
=======================
This demo was presented at ICML 2022 during the industry demo day.

.. GENERATED FROM PYTHON SOURCE LINES 7-186

It gives a good overview of TorchRL's functionality. Feel free to reach out
to vmoens@fb.com or to submit an issue if you have questions or comments
about it.

TorchRL is an open-source Reinforcement Learning (RL) library for PyTorch.

https://github.com/pytorch/rl

The PyTorch ecosystem team (Meta) has decided to invest in this library to
provide a leading platform for developing RL solutions in research settings.

It provides PyTorch-native, **Python-first**, low- and high-level
**abstractions** for RL that are intended to be efficient, documented and
properly tested.
The code is aimed at supporting research in RL. Most of it is written in
Python in a highly modular way, so that researchers can easily swap
components, transform them or write new ones with little effort.

This repo attempts to align with the existing PyTorch ecosystem libraries
in that it has a dataset pillar (``torchrl/envs``), transforms, models, data
utilities (e.g. collectors and containers), etc. TorchRL aims to have as
few dependencies as possible (the Python standard library, NumPy and PyTorch).
Common environment libraries (e.g. OpenAI Gym) are optional.

**Content**:
   .. aafig::

      "torchrl"
      │
      ├── "collectors"
      │   └── "collectors.py"
      │   │
      │   └── "distributed"
      │       └── "default_configs.py"
      │       └── "generic.py"
      │       └── "ray.py"
      │       └── "rpc.py"
      │       └── "sync.py"
      ├── "data"
      │   │
      │   ├── "datasets"
      │   │   └── "atari_dqn.py"
      │   │   └── "d4rl.py"
      │   │   └── "d4rl_infos.py"
      │   │   └── "gen_dgrl.py"
      │   │   └── "minari_data.py"
      │   │   └── "openml.py"
      │   │   └── "openx.py"
      │   │   └── "roboset.py"
      │   │   └── "vd4rl.py"
      │   ├── "postprocs"
      │   │   └── "postprocs.py"
      │   ├── "replay_buffers"
      │   │   └── "replay_buffers.py"
      │   │   └── "samplers.py"
      │   │   └── "storages.py"
      │   │   └── "writers.py"
      │   ├── "rlhf"
      │   │   └── "dataset.py"
      │   │   └── "prompt.py"
      │   │   └── "reward.py"
      │   └── "tensor_specs.py"
      ├── "envs"
      │   └── "batched_envs.py"
      │   └── "common.py"
      │   └── "env_creator.py"
      │   └── "gym_like.py"
      │   ├── "libs"
      │   │   └── "brax.py"
      │   │   └── "dm_control.py"
      │   │   └── "envpool.py"
      │   │   └── "gym.py"
      │   │   └── "habitat.py"
      │   │   └── "isaacgym.py"
      │   │   └── "jumanji.py"
      │   │   └── "openml.py"
      │   │   └── "pettingzoo.py"
      │   │   └── "robohive.py"
      │   │   └── "smacv2.py"
      │   │   └── "vmas.py"
      │   ├── "model_based"
      │   │   └── "common.py"
      │   │   └── "dreamer.py"
      │   ├── "transforms"
      │   │   └── "functional.py"
      │   │   └── "gym_transforms.py"
      │   │   └── "r3m.py"
      │   │   └── "rlhf.py"
      │   │   └── "vc1.py"
      │   │   └── "vip.py"
      │   └── "vec_envs.py"
      ├── "modules"
      │   ├── "distributions"
      │   │   └── "continuous.py"
      │   │   └── "discrete.py"
      │   │   └── "truncated_normal.py"
      │   ├── "models"
      │   │   └── "decision_transformer.py"
      │   │   └── "exploration.py"
      │   │   └── "model_based.py"
      │   │   └── "models.py"
      │   │   └── "multiagent.py"
      │   │   └── "rlhf.py"
      │   ├── "planners"
      │   │   └── "cem.py"
      │   │   └── "common.py"
      │   │   └── "mppi.py"
      │   └── "tensordict_module"
      │       └── "actors.py"
      │       └── "common.py"
      │       └── "exploration.py"
      │       └── "probabilistic.py"
      │       └── "rnn.py"
      │       └── "sequence.py"
      │       └── "world_models.py"
      ├── "objectives"
      │   └── "a2c.py"
      │   └── "common.py"
      │   └── "cql.py"
      │   └── "ddpg.py"
      │   └── "decision_transformer.py"
      │   └── "deprecated.py"
      │   └── "dqn.py"
      │   └── "dreamer.py"
      │   └── "functional.py"
      │   └── "iql.py"
      │   ├── "multiagent"
      │   │   └── "qmixer.py"
      │   └── "ppo.py"
      │   └── "redq.py"
      │   └── "reinforce.py"
      │   └── "sac.py"
      │   └── "td3.py"
      │   ├── "value"
      │       └── "advantages.py"
      │       └── "functional.py"
      │       └── "pg.py"
      ├── "record"
      │   ├── "loggers"
      │   │   └── "common.py"
      │   │   └── "csv.py"
      │   │   └── "mlflow.py"
      │   │   └── "tensorboard.py"
      │   │   └── "wandb.py"
      │   └── "recorder.py"
      ├── "trainers"
      │   │
      │   ├── "helpers"
      │   │   └── "collectors.py"
      │   │   └── "envs.py"
      │   │   └── "logger.py"
      │   │   └── "losses.py"
      │   │   └── "models.py"
      │   │   └── "replay_buffer.py"
      │   │   └── "trainers.py"
      │   └── "trainers.py"
      └── "version.py"

Unlike other domains, RL is less about media than about *algorithms*. As
such, it is harder to make truly independent components.

What TorchRL is not:

* a collection of algorithms: we do not intend to provide SOTA implementations of RL algorithms;
  the algorithms we do provide serve only as examples of how to use the library.

* a research framework: modularity in TorchRL comes in two flavors. First, we try
  to build reusable components that can be easily swapped with each other.
  Second, we do our best to make components usable independently of the rest
  of the library.

TorchRL has very few core dependencies, predominantly PyTorch and NumPy. All
other dependencies (gym, torchvision, wandb / tensorboard) are optional.

Data
^^^^

TensorDict
----------

.. GENERATED FROM PYTHON SOURCE LINES 186-191

.. code-block:: Python


    import torch
    from tensordict import TensorDict


.. GENERATED FROM PYTHON SOURCE LINES 214-215

Let's create a TensorDict.

.. GENERATED FROM PYTHON SOURCE LINES 215-226

.. code-block:: Python


    batch_size = 5
    tensordict = TensorDict(
        source={
            "key 1": torch.zeros(batch_size, 3),
            "key 2": torch.zeros(batch_size, 5, 6, dtype=torch.bool),
        },
        batch_size=[batch_size],
    )
    print(tensordict)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    TensorDict(
        fields={
            key 1: Tensor(shape=torch.Size([5, 3]), device=cpu, dtype=torch.float32, is_shared=False),
            key 2: Tensor(shape=torch.Size([5, 5, 6]), device=cpu, dtype=torch.bool, is_shared=False)},
        batch_size=torch.Size([5]),
        device=None,
        is_shared=False)
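
The ``batch_size`` argument is a contract: every entry must have leading
dimensions equal to it, while the trailing dimensions (``3`` and ``5, 6``
above) are free. The plain-Python sketch below checks that rule on shape
tuples; ``is_valid_batch_size`` is a hypothetical helper, not part of the
``tensordict`` API.

.. code-block:: Python

    def is_valid_batch_size(shapes, batch_size):
        """Hypothetical helper: True if every shape starts with ``batch_size``."""
        batch_size = tuple(batch_size)
        n = len(batch_size)
        return all(tuple(shape[:n]) == batch_size for shape in shapes.values())


    # Shapes of the entries created above.
    shapes = {"key 1": (5, 3), "key 2": (5, 5, 6)}
    print(is_valid_batch_size(shapes, [5]))     # True
    print(is_valid_batch_size(shapes, [5, 3]))  # False: "key 2" is (5, 5, ...)
    print(is_valid_batch_size(shapes, []))      # True: an empty batch size always fits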


.. GENERATED FROM PYTHON SOURCE LINES 227-228

You can index a TensorDict as well as query keys.

.. GENERATED FROM PYTHON SOURCE LINES 228-232

.. code-block:: Python


    print(tensordict[2])
    print(tensordict["key 1"] is tensordict.get("key 1"))


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    TensorDict(
        fields={
            key 1: Tensor(shape=torch.Size([3]), device=cpu, dtype=torch.float32, is_shared=False),
            key 2: Tensor(shape=torch.Size([5, 6]), device=cpu, dtype=torch.bool, is_shared=False)},
        batch_size=torch.Size([]),
        device=None,
        is_shared=False)
    True


.. GENERATED FROM PYTHON SOURCE LINES 233-234

The following shows how to stack multiple TensorDicts.

.. GENERATED FROM PYTHON SOURCE LINES 234-254

.. code-block:: Python


    tensordict1 = TensorDict(
        source={
            "key 1": torch.zeros(batch_size, 1),
            "key 2": torch.zeros(batch_size, 5, 6, dtype=torch.bool),
        },
        batch_size=[batch_size],
    )

    tensordict2 = TensorDict(
        source={
            "key 1": torch.ones(batch_size, 1),
            "key 2": torch.ones(batch_size, 5, 6, dtype=torch.bool),
        },
        batch_size=[batch_size],
    )

    tensordict = torch.stack([tensordict1, tensordict2], 0)
    tensordict.batch_size, tensordict["key 1"]


.. rst-class:: sphx-glr-script-out

 .. code-block:: none


    (torch.Size([2, 5]), tensor([[[0.],
             [0.],
             [0.],
             [0.],
             [0.]],

            [[1.],
             [1.],
             [1.],
             [1.],
             [1.]]]))
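
``torch.stack`` on TensorDicts follows the same shape rule as on tensors: it
inserts a new batch dimension at ``dim``, so two TensorDicts with
``batch_size=[5]`` stacked along ``0`` give ``batch_size=[2, 5]``, and every
entry gains the same leading dimension (``[5, 1]`` becomes ``[2, 5, 1]``).
A plain-Python sketch of that shape rule, using a hypothetical helper name:

.. code-block:: Python

    def stacked_shape(shapes, dim=0):
        """Hypothetical helper: shape obtained by stacking equally-shaped items along ``dim``."""
        first = tuple(shapes[0])
        if any(tuple(s) != first for s in shapes):
            raise ValueError("all stacked items must share the same shape")
        return first[:dim] + (len(shapes),) + first[dim:]


    print(stacked_shape([(5,), (5,)], dim=0))      # (2, 5): the batch size
    print(stacked_shape([(5, 1), (5, 1)], dim=0))  # (2, 5, 1): "key 1"
    print(stacked_shape([(5,), (5,)], dim=1))      # (5, 2)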


.. GENERATED FROM PYTHON SOURCE LINES 255-256

Here are some other features of TensorDict.

.. GENERATED FROM PYTHON SOURCE LINES 256-281

.. code-block:: Python


    print(
        "view(-1): ",
        tensordict.view(-1).batch_size,
        tensordict.view(-1).get("key 1").shape,
    )

    print("to device: ", tensordict.to("cpu"))

    # print("pin_memory: ", tensordict.pin_memory())

    print("share memory: ", tensordict.share_memory_())

    print(
        "permute(1, 0): ",
        tensordict.permute(1, 0).batch_size,
        tensordict.permute(1, 0).get("key 1").shape,
    )

    print(
        "expand: ",
        tensordict.expand(3, *tensordict.batch_size).batch_size,
        tensordict.expand(3, *tensordict.batch_size).get("key 1").shape,
    )


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    view(-1):  torch.Size([10]) torch.Size([10, 1])
    to device:  TensorDict(
        fields={
            key 1: Tensor(shape=torch.Size([2, 5, 1]), device=cpu, dtype=torch.float32, is_shared=False),
            key 2: Tensor(shape=torch.Size([2, 5, 5, 6]), device=cpu, dtype=torch.bool, is_shared=False)},
        batch_size=torch.Size([2, 5]),
        device=cpu,
        is_shared=False)
    share memory:  TensorDict(
        fields={
            key 1: Tensor(shape=torch.Size([2, 5, 1]), device=cpu, dtype=torch.float32, is_shared=True),
            key 2: Tensor(shape=torch.Size([2, 5, 5, 6]), device=cpu, dtype=torch.bool, is_shared=True)},
        batch_size=torch.Size([2, 5]),
        device=None,
        is_shared=True)
    permute(1, 0):  torch.Size([5, 2]) torch.Size([5, 2, 1])
    expand:  torch.Size([3, 2, 5]) torch.Size([3, 2, 5, 1])
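
These methods only act on the batch dimensions; the trailing feature
dimensions of each entry are carried along unchanged. The sketch below
reproduces the shapes printed above with plain tuple arithmetic (the batch
size is ``(2, 5)`` and the feature shape of ``"key 1"`` is ``(1,)``); the
helper names are hypothetical and only mimic the shape logic.

.. code-block:: Python

    from math import prod

    batch = (2, 5)   # tensordict.batch_size after stacking
    feature = (1,)   # trailing (non-batch) shape of "key 1"


    def view_flat(batch, feature):
        """view(-1): merge all batch dims into one; feature dims are untouched."""
        new_batch = (prod(batch),)
        return new_batch, new_batch + feature


    def permute_batch(batch, feature, *dims):
        """permute(*dims): reorder the batch dims only."""
        new_batch = tuple(batch[d] for d in dims)
        return new_batch, new_batch + feature


    def expand_batch(batch, feature, *sizes):
        """expand(*sizes): prepend new batch dims; the trailing sizes must match."""
        if tuple(sizes[-len(batch):]) != batch:
            raise ValueError("trailing sizes must equal the current batch size")
        return tuple(sizes), tuple(sizes) + feature


    print(view_flat(batch, feature))                # ((10,), (10, 1))
    print(permute_batch(batch, feature, 1, 0))      # ((5, 2), (5, 2, 1))
    print(expand_batch(batch, feature, 3, *batch))  # ((3, 2, 5), (3, 2, 5, 1))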


.. GENERATED FROM PYTHON SOURCE LINES 282-283

You can create a **nested TensorDict** as well.

.. GENERATED FROM PYTHON SOURCE LINES 283-296

.. code-block:: Python


    tensordict = TensorDict(
        source={
            "key 1": torch.zeros(batch_size, 3),
            "key 2": TensorDict(
                source={"sub-key 1": torch.zeros(batch_size, 2, 1)},
                batch_size=[batch_size, 2],
            ),
        },
        batch_size=[batch_size],
    )
    tensordict


.. rst-class:: sphx-glr-script-out

 .. code-block:: none


    TensorDict(
        fields={
            key 1: Tensor(shape=torch.Size([5, 3]), device=cpu, dtype=torch.float32, is_shared=False),
            key 2: TensorDict(
                fields={
                    sub-key 1: Tensor(shape=torch.Size([5, 2, 1]), device=cpu, dtype=torch.float32, is_shared=False)},
                batch_size=torch.Size([5, 2]),
                device=None,
                is_shared=False)},
        batch_size=torch.Size([5]),
        device=None,
        is_shared=False)
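
Nested entries can also be read with a tuple of keys, for example
``tensordict["key 2", "sub-key 1"]``, and a nested TensorDict's
``batch_size`` must begin with its parent's (``[5, 2]`` begins with ``[5]``
here). The plain-dict sketch below, with hypothetical helper names, lists
every leaf under its tuple key and checks that batch-size prefix rule.

.. code-block:: Python

    def leaves(tree, prefix=()):
        """Yield (tuple_key, shape) for every leaf of a nested dict of shapes."""
        for key, value in tree.items():
            if isinstance(value, dict):
                yield from leaves(value, prefix + (key,))
            else:
                yield prefix + (key,), value


    def prefix_ok(parent_batch, child_batch):
        """A nested batch size must begin with the parent's batch size."""
        return tuple(child_batch[: len(parent_batch)]) == tuple(parent_batch)


    # Shapes of the nested TensorDict created above.
    shape_tree = {"key 1": (5, 3), "key 2": {"sub-key 1": (5, 2, 1)}}
    print(dict(leaves(shape_tree)))
    # {('key 1',): (5, 3), ('key 2', 'sub-key 1'): (5, 2, 1)}
    print(prefix_ok([5], [5, 2]))  # True
    print(prefix_ok([5], [4, 2]))  # False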


.. GENERATED FROM PYTHON SOURCE LINES 297-299

Replay buffers
------------------------------

.. GENERATED FROM PYTHON SOURCE LINES 299-302

.. code-block:: Python


    from torchrl.data import PrioritizedReplayBuffer, ReplayBuffer


.. GENERATED FROM PYTHON SOURCE LINES 303-308

.. code-block:: Python


    rb = ReplayBuffer(collate_fn=lambda x: x)
    rb.add(1)
    rb.sample(1)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none


    [1]


.. GENERATED FROM PYTHON SOURCE LINES 309-313

.. code-block:: Python


    rb.extend([2, 3])
    rb.sample(3)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none


    [2, 1, 3]
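
Conceptually, a replay buffer combines a storage, a writer that decides
where new items go, and a sampler that picks indices (compare the
``storages.py``, ``writers.py`` and ``samplers.py`` modules listed above).
The class below is a minimal plain-Python sketch of that split, assuming a
fixed-capacity circular storage and uniform sampling with replacement. It is
not TorchRL's implementation, and ``MiniReplayBuffer`` is a hypothetical name.

.. code-block:: Python

    import random


    class MiniReplayBuffer:
        """Hypothetical sketch: circular storage + uniform sampler with replacement."""

        def __init__(self, capacity, seed=None):
            self.capacity = capacity
            self._storage = []
            self._cursor = 0  # writer position: next slot to overwrite once full
            self._rng = random.Random(seed)

        def add(self, item):
            if len(self._storage) < self.capacity:
                self._storage.append(item)
            else:
                self._storage[self._cursor] = item
            self._cursor = (self._cursor + 1) % self.capacity

        def extend(self, items):
            for item in items:
                self.add(item)

        def __len__(self):
            return len(self._storage)

        def sample(self, batch_size):
            # indices are drawn uniformly, with replacement
            idx = [self._rng.randrange(len(self._storage)) for _ in range(batch_size)]
            return [self._storage[i] for i in idx]


    toy_rb = MiniReplayBuffer(capacity=3, seed=0)
    toy_rb.add(1)
    toy_rb.extend([2, 3])
    toy_rb.add(4)  # the buffer is full: this overwrites the oldest item (1)
    print(len(toy_rb), sorted(toy_rb._storage))  # 3 [2, 3, 4]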


.. GENERATED FROM PYTHON SOURCE LINES 314-320

.. code-block:: Python


    rb = PrioritizedReplayBuffer(alpha=0.7, beta=1.1, collate_fn=lambda x: x)
    rb.add(1)
    rb.sample(1)
    rb.update_priority(1, 0.5)
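
``alpha`` and ``beta`` are the usual prioritized experience replay exponents:
an item with priority :math:`p_i` is drawn with probability
:math:`P(i) = p_i^\alpha / \sum_k p_k^\alpha`, and its loss is reweighted by
:math:`w_i = (N \, P(i))^{-\beta}`, commonly normalized by the largest weight.
The sketch below evaluates these standard formulas in plain Python; TorchRL's
sampler may differ in details (for instance a small ``eps`` added to the
priorities), so treat it as illustrative only.

.. code-block:: Python

    def per_probabilities(priorities, alpha):
        """P(i) = p_i**alpha / sum_k p_k**alpha."""
        scaled = [p**alpha for p in priorities]
        total = sum(scaled)
        return [s / total for s in scaled]


    def per_weights(probs, beta):
        """w_i = (N * P(i))**(-beta), normalized so that the largest weight is 1."""
        n = len(probs)
        raw = [(n * p) ** (-beta) for p in probs]
        max_w = max(raw)
        return [w / max_w for w in raw]


    probs = per_probabilities([1.0, 0.5, 0.25], alpha=0.7)
    weights = per_weights(probs, beta=1.1)
    print([round(p, 3) for p in probs])    # high-priority items are drawn more often
    print([round(w, 3) for w in weights])  # ...and get smaller importance weights

Setting ``alpha=0`` recovers uniform sampling, and with ``beta=1`` the
weights fully compensate for the non-uniform sampling.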


.. GENERATED FROM PYTHON SOURCE LINES 321-322

Here are examples of using a replay buffer with TensorDicts.

.. GENERATED FROM PYTHON SOURCE LINES 322-328

.. code-block:: Python


    collate_fn = torch.stack
    rb = ReplayBuffer(collate_fn=collate_fn)
    rb.add(TensorDict({"a": torch.randn(3)}, batch_size=[]))
    len(rb)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none


    1


.. GENERATED FROM PYTHON SOURCE LINES 329-335

.. code-block:: Python


    rb.extend(TensorDict({"a": torch.randn(2, 3)}, batch_size=[2]))
    print(len(rb))
    print(rb.sample(10))
    print(rb.sample(2).contiguous())


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    3
    TensorDict(
        fields={
            a: Tensor(shape=torch.Size([10, 3]), device=cpu, dtype=torch.float32, is_shared=False)},
        batch_size=torch.Size([10]),
        device=None,
        is_shared=False)
    TensorDict(
        fields={
            a: Tensor(shape=torch.Size([2, 3]), device=cpu, dtype=torch.float32, is_shared=False)},
        batch_size=torch.Size([2]),
        device=None,
        is_shared=False)


.. GENERATED FROM PYTHON SOURCE LINES 336-345

.. code-block:: Python


    torch.manual_seed(0)
    from torchrl.data import TensorDictPrioritizedReplayBuffer

    rb = TensorDictPrioritizedReplayBuffer(alpha=0.7, beta=1.1, priority_key="td_error")
    rb.extend(TensorDict({"a": torch.randn(2, 3)}, batch_size=[2]))
    tensordict_sample = rb.sample(2).contiguous()
    tensordict_sample


.. rst-class:: sphx-glr-script-out

 .. code-block:: none


    TensorDict(
        fields={
            _weight: Tensor(shape=torch.Size([2]), device=cpu, dtype=torch.float32, is_shared=False),
            a: Tensor(shape=torch.Size([2, 3]), device=cpu, dtype=torch.float32, is_shared=False),
            index: Tensor(shape=torch.Size([2]), device=cpu, dtype=torch.int64, is_shared=False)},
        batch_size=torch.Size([2]),
        device=None,
        is_shared=False)


.. GENERATED FROM PYTHON SOURCE LINES 346-349

.. code-block:: Python


    tensordict_sample["index"]


.. rst-class:: sphx-glr-script-out

 .. code-block:: none


    tensor([1, 0])


.. GENERATED FROM PYTHON SOURCE LINES 350-364

.. code-block:: Python


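    # Store new priorities under the buffer's priority_key ("td_error") and push
    # them to the sampler; the "index" entry tells it which items to update.
    tensordict_sample["td_error"] = torch.rand(2)
    rb.update_tensordict_priority(tensordict_sample)

    # Peek at the sampler's sum tree (a private attribute, for illustration only).
    for i, val in enumerate(rb._sampler._sum_tree):
        print(i, val)
        if i == len(rb):
            break

    # Used in the Envs section below: prefer gymnasium, fall back to gym.
    try:
        import gymnasium as gym
    except ModuleNotFoundError:
        import gym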


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    0 0.28791671991348267
    1 0.06984967738389969
    2 0.0
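
The ``_sum_tree`` attribute read above is a private detail of the sampler,
but the underlying data structure is a classic one: a binary tree whose
leaves hold priorities and whose internal nodes hold the sum of their
children, so that both updating a priority and drawing an index in
proportion to its priority take :math:`O(\log N)` time. The plain-Python
sketch below is a minimal array-based version; ``SumTree`` is a hypothetical
name and this is not TorchRL's implementation.

.. code-block:: Python

    import random


    class SumTree:
        """Hypothetical sketch of an array-based sum tree for proportional sampling."""

        def __init__(self, capacity):
            size = 1
            while size < capacity:
                size *= 2
            self.size = size                 # number of leaves (a power of two)
            self.nodes = [0.0] * (2 * size)  # nodes[1] is the root; leaves start at `size`

        def __getitem__(self, index):
            return self.nodes[self.size + index]

        def update(self, index, priority):
            pos = self.size + index
            self.nodes[pos] = priority
            pos //= 2
            while pos >= 1:  # refresh the sum stored in every ancestor
                self.nodes[pos] = self.nodes[2 * pos] + self.nodes[2 * pos + 1]
                pos //= 2

        def total(self):
            return self.nodes[1]

        def find(self, mass):
            """Return the leaf where the running sum of priorities first exceeds `mass`."""
            pos = 1
            while pos < self.size:
                left = self.nodes[2 * pos]
                if mass < left:
                    pos = 2 * pos
                else:
                    mass -= left
                    pos = 2 * pos + 1
            return pos - self.size

        def sample(self, rng):
            return self.find(rng.random() * self.total())


    sum_tree = SumTree(capacity=3)
    sum_tree.update(0, 0.5)
    sum_tree.update(1, 0.25)
    print(sum_tree.total())                          # 0.75
    print(sum_tree.find(0.4), sum_tree.find(0.6))    # 0 1
    print(sum_tree[2])                               # 0.0: never sampled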


.. GENERATED FROM PYTHON SOURCE LINES 365-367

Envs
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. GENERATED FROM PYTHON SOURCE LINES 367-374

.. code-block:: Python


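    from torchrl.envs.libs.gym import GymEnv, GymWrapper, set_gym_backend

    # Option 1: wrap an environment instance created with gym / gymnasium.
    gym_env = gym.make("Pendulum-v1")
    env = GymWrapper(gym_env)
    # Option 2: let TorchRL build the environment from its name.
    env = GymEnv("Pendulum-v1")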


.. GENERATED FROM PYTHON SOURCE LINES 375-379

.. code-block:: Python


    tensordict = env.reset()
    env.rand_step(tensordict)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none


    TensorDict(
        fields={
            action: Tensor(shape=torch.Size([1]), device=cpu, dtype=torch.float32, is_shared=False),
            done: Tensor(shape=torch.Size([1]), device=cpu, dtype=torch.bool, is_shared=False),
            next: TensorDict(
                fields={
                    done: Tensor(shape=torch.Size([1]), device=cpu, dtype=torch.bool, is_shared=False),
                    observation: Tensor(shape=torch.Size([3]), device=cpu, dtype=torch.float32, is_shared=False),
                    reward: Tensor(shape=torch.Size([1]), device=cpu, dtype=torch.float32, is_shared=False),
                    terminated: Tensor(shape=torch.Size([1]), device=cpu, dtype=torch.bool, is_shared=False),
                    truncated: Tensor(shape=torch.Size([1]), device=cpu, dtype=torch.bool, is_shared=False)},
                batch_size=torch.Size([]),
                device=None,
                is_shared=False),
            observation: Tensor(shape=torch.Size([3]), device=cpu, dtype=torch.float32, is_shared=False),
            terminated: Tensor(shape=torch.Size([1]), device=cpu, dtype=torch.bool, is_shared=False),
            truncated: Tensor(shape=torch.Size([1]), device=cpu, dtype=torch.bool, is_shared=False)},
        batch_size=torch.Size([]),
        device=None,
        is_shared=False)
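
Note the ``"next"`` sub-TensorDict: the results of the step (new
observation, reward, done flags) are written under ``"next"``, while the
root keeps the observation the action was computed from. To move on to the
following step, the ``"next"`` content must become the new root; TorchRL
provides ``torchrl.envs.utils.step_mdp`` for this. The plain-dict sketch
below only mimics the idea (keep the ``"next"`` entries, drop the action and
reward) and is not that function's implementation; ``toy_step_mdp`` is a
hypothetical name.

.. code-block:: Python

    def toy_step_mdp(td, exclude=("action", "reward")):
        """Hypothetical sketch: promote the "next" entries to the root of a new dict."""
        new_td = {k: v for k, v in td["next"].items() if k not in exclude}
        # keep other root entries that "next" did not overwrite (e.g. a recurrent state)
        for k, v in td.items():
            if k != "next" and k not in exclude and k not in new_td:
                new_td[k] = v
        return new_td


    transition = {
        "observation": [0.0, 1.0, 0.0],
        "action": [0.5],
        "done": [False],
        "next": {"observation": [0.1, 0.9, 0.2], "reward": [-1.2], "done": [False]},
    }
    print(toy_step_mdp(transition))
    # {'observation': [0.1, 0.9, 0.2], 'done': [False]}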


.. GENERATED FROM PYTHON SOURCE LINES 380-382

Changing the environment configuration
--------------------------------------

.. GENERATED FROM PYTHON SOURCE LINES 382-386

.. code-block:: Python


    env = GymEnv("Pendulum-v1", frame_skip=3, from_pixels=True, pixels_only=False)
    env.reset()


.. rst-class:: sphx-glr-script-out

 .. code-block:: none


    TensorDict(
        fields={
            done: Tensor(shape=torch.Size([1]), device=cpu, dtype=torch.bool, is_shared=False),
            observation: Tensor(shape=torch.Size([3]), device=cpu, dtype=torch.float32, is_shared=False),
            pixels: Tensor(shape=torch.Size([500, 500, 3]), device=cpu, dtype=torch.uint8, is_shared=False),
            terminated: Tensor(shape=torch.Size([1]), device=cpu, dtype=torch.bool, is_shared=False),
            truncated: Tensor(shape=torch.Size([1]), device=cpu, dtype=torch.bool, is_shared=False)},
        batch_size=torch.Size([]),
        device=None,
        is_shared=False)
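
With ``frame_skip=3``, each call to ``step`` repeats the chosen action for
three frames of the underlying environment. The usual convention, sketched
below in plain Python, is to accumulate the reward over the repeated frames
and to stop early if the episode ends; we assume TorchRL's Gym wrapper
follows this convention. ``toy_frame_skip_step`` and ``CounterEnv`` are
hypothetical.

.. code-block:: Python

    def toy_frame_skip_step(step_fn, action, frame_skip):
        """Hypothetical sketch: repeat `action`, sum the rewards, stop early when done."""
        total_reward = 0.0
        obs, done = None, False
        for _ in range(frame_skip):
            obs, reward, done = step_fn(action)
            total_reward += reward
            if done:
                break
        return obs, total_reward, done


    class CounterEnv:
        """Toy environment: reward 1 per frame, the episode ends after 4 frames."""

        def __init__(self):
            self.t = 0

        def step(self, action):
            self.t += 1
            return self.t, 1.0, self.t >= 4


    toy_env = CounterEnv()
    first = toy_frame_skip_step(toy_env.step, action=0, frame_skip=3)
    second = toy_frame_skip_step(toy_env.step, action=0, frame_skip=3)
    print(first)   # (3, 3.0, False)
    print(second)  # (4, 1.0, True): stopped after one frame because the episode ended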


.. GENERATED FROM PYTHON SOURCE LINES 387-391

.. code-block:: Python


    env.close()
    del env


.. GENERATED FROM PYTHON SOURCE LINES 392-405

.. code-block:: Python


    from torchrl.envs import (
        Compose,
        NoopResetEnv,
        ObservationNorm,
        ToTensorImage,
        TransformedEnv,
    )

    base_env = GymEnv("Pendulum-v1", frame_skip=3, from_pixels=True, pixels_only=False)
    env = TransformedEnv(base_env, Compose(NoopResetEnv(3), ToTensorImage()))
    env.append_transform(ObservationNorm(in_keys=["pixels"], loc=2, scale=1))


.. rst-class:: sphx-glr-script-out

 .. code-block:: none


    TransformedEnv(
        env=GymEnv(env=Pendulum-v1, batch_size=torch.Size([]), device=None),
        transform=Compose(
                NoopResetEnv(noops=3, random=True),
                ToTensorImage(keys=['pixels']),
                ObservationNorm(loc=2.0000, scale=1.0000, keys=['pixels'])))


.. GENERATED FROM PYTHON SOURCE LINES 406-408

Transforms
------------------------------

.. GENERATED FROM PYTHON SOURCE LINES 408-422

.. code-block:: Python


    from torchrl.envs import (
        Compose,
        NoopResetEnv,
        ObservationNorm,
        StepCounter,
        ToTensorImage,
        TransformedEnv,
    )

    base_env = GymEnv("Pendulum-v1", frame_skip=3, from_pixels=True, pixels_only=False)
    env = TransformedEnv(base_env, Compose(NoopResetEnv(3), ToTensorImage()))
    env.append_transform(ObservationNorm(in_keys=["pixels"], loc=2, scale=1))


.. rst-class:: sphx-glr-script-out

 .. code-block:: none


    TransformedEnv(
        env=GymEnv(env=Pendulum-v1, batch_size=torch.Size([]), device=None),
        transform=Compose(
                NoopResetEnv(noops=3, random=True),
                ToTensorImage(keys=['pixels']),
                ObservationNorm(loc=2.0000, scale=1.0000, keys=['pixels'])))


.. GENERATED FROM PYTHON SOURCE LINES 423-426

.. code-block:: Python


    env.reset()


.. rst-class:: sphx-glr-script-out

 .. code-block:: none


    TensorDict(
        fields={
            done: Tensor(shape=torch.Size([1]), device=cpu, dtype=torch.bool, is_shared=False),
            observation: Tensor(shape=torch.Size([3]), device=cpu, dtype=torch.float32, is_shared=False),
            pixels: Tensor(shape=torch.Size([3, 500, 500]), device=cpu, dtype=torch.float32, is_shared=False),
            terminated: Tensor(shape=torch.Size([1]), device=cpu, dtype=torch.bool, is_shared=False),
            truncated: Tensor(shape=torch.Size([1]), device=cpu, dtype=torch.bool, is_shared=False)},
        batch_size=torch.Size([]),
        device=None,
        is_shared=False)
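
Compared with the untransformed reset earlier, ``ToTensorImage`` turned the
``uint8`` pixels of shape ``[500, 500, 3]`` (height, width, channels) into
``float32`` pixels of shape ``[3, 500, 500]``, and ``ObservationNorm`` then
applied an affine map. The plain-Python sketch below mimics both steps on a
tiny 2x2 image. It assumes that ``ToTensorImage`` rescales values from
``[0, 255]`` to ``[0, 1]`` and that ``ObservationNorm`` in its default
(non-standardizing) mode computes ``obs * scale + loc``; the helper names
are hypothetical.

.. code-block:: Python

    def hwc_to_chw(image):
        """Hypothetical sketch of ToTensorImage: HWC uint8 values -> CHW floats in [0, 1]."""
        height, width, channels = len(image), len(image[0]), len(image[0][0])
        return [
            [[image[h][w][c] / 255.0 for w in range(width)] for h in range(height)]
            for c in range(channels)
        ]


    def affine_norm(chw, loc, scale):
        """Hypothetical sketch of ObservationNorm (non-standardizing mode): x * scale + loc."""
        return [[[x * scale + loc for x in row] for row in channel] for channel in chw]


    # A 2x2 RGB image (height=2, width=2, channels=3) with uint8-range values.
    image = [
        [[0, 255, 0], [255, 0, 0]],
        [[0, 0, 255], [255, 255, 255]],
    ]
    chw = hwc_to_chw(image)
    print(len(chw), len(chw[0]), len(chw[0][0]))  # 3 2 2
    print(affine_norm(chw, loc=2, scale=1)[0])    # [[2.0, 3.0], [2.0, 3.0]]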


.. GENERATED FROM PYTHON SOURCE LINES 427-431

.. code-block:: Python


    print("env: ", env)
    print("last transform parent: ", env.transform[2].parent)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    env:  TransformedEnv(
        env=GymEnv(env=Pendulum-v1, batch_size=torch.Size([]), device=None),
        transform=Compose(
                NoopResetEnv(noops=3, random=True),
                ToTensorImage(keys=['pixels']),
                ObservationNorm(loc=2.0000, scale=1.0000, keys=['pixels'])))
    last transform parent:  TransformedEnv(
        env=GymEnv(env=Pendulum-v1, batch_size=torch.Size([]), device=None),
        transform=Compose(
                NoopResetEnv(noops=3, random=True),
                ToTensorImage(keys=['pixels'])))
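
The ``parent`` of a transform is the environment as seen from that
transform's input: the base env wrapped with every transform that comes
*before* it. This is why the parent of the last transform (index ``2``)
lists only ``NoopResetEnv`` and ``ToTensorImage``. A minimal sketch of that
rule, using plain functions as stand-ins for the env and the transforms
(all names are hypothetical):

.. code-block:: Python

    def make_parent(base_reset, transforms, index):
        """Hypothetical sketch: the parent of transforms[index] runs the base env's
        reset followed by every earlier transform, in order."""

        def reset():
            obs = base_reset()
            for transform in transforms[:index]:
                obs = transform(obs)
            return obs

        return reset


    def base_reset():
        return 10  # stand-in for the base env's observation


    def add_one(x):
        return x + 1


    def times_two(x):
        return 2 * x


    def minus_three(x):
        return x - 3


    toy_transforms = [add_one, times_two, minus_three]
    print(make_parent(base_reset, toy_transforms, 2)())  # 22 = (10 + 1) * 2
    print(make_parent(base_reset, toy_transforms, 0)())  # 10: the first transform sees the bare env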


.. GENERATED FROM PYTHON SOURCE LINES 432-434

Vectorized Environments
------------------------------

.. GENERATED FROM PYTHON SOURCE LINES 434-455

.. code-block:: Python


    from torchrl.envs import ParallelEnv


    def make_env():
        # You can control whether to use gym or gymnasium for your env
        with set_gym_backend("gym"):
            return GymEnv("Pendulum-v1", frame_skip=3, from_pixels=True, pixels_only=False)


    base_env = ParallelEnv(
        4,
        make_env,
        # "fork" is not available on Windows: there, remove this argument and run
        # the code under an `if __name__ == "__main__":` guard instead.
        mp_start_method="fork",
    )
    env = TransformedEnv(
        base_env, Compose(StepCounter(), ToTensorImage())
    )  # the transforms are applied to the whole batch of envs
    env.append_transform(ObservationNorm(in_keys=["pixels"], loc=2, scale=1))
    env.reset()


.. rst-class:: sphx-glr-script-out

 .. code-block:: none


    TensorDict(
        fields={
            done: Tensor(shape=torch.Size([4, 1]), device=cpu, dtype=torch.bool, is_shared=False),
            observation: Tensor(shape=torch.Size([4, 3]), device=cpu, dtype=torch.float32, is_shared=False),
            pixels: Tensor(shape=torch.Size([4, 3, 500, 500]), device=cpu, dtype=torch.float32, is_shared=False),
            step_count: Tensor(shape=torch.Size([4, 1]), device=cpu, dtype=torch.int64, is_shared=False),
            terminated: Tensor(shape=torch.Size([4, 1]), device=cpu, dtype=torch.bool, is_shared=False),
            truncated: Tensor(shape=torch.Size([4, 1]), device=cpu, dtype=torch.bool, is_shared=False)},
        batch_size=torch.Size([4]),
        device=None,
        is_shared=False)


.. GENERATED FROM PYTHON SOURCE LINES 456-462

.. code-block:: Python


    print(env.action_spec)

    env.close()
    del env


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    BoundedContinuous(
        shape=torch.Size([4, 1]),
        space=ContinuousBox(
            low=Tensor(shape=torch.Size([4, 1]), device=cpu, dtype=torch.float32, contiguous=True),
            high=Tensor(shape=torch.Size([4, 1]), device=cpu, dtype=torch.float32, contiguous=True)),
        device=cpu,
        dtype=torch.float32,
        domain=continuous)
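
``action_spec`` describes the valid actions: here a ``BoundedContinuous``
spec of shape ``[4, 1]``, one bounded scalar per sub-environment
(Pendulum's torque lies in ``[-2, 2]``). TorchRL specs expose helpers such
as ``rand()``, ``is_in()`` and ``project()``; the plain-Python sketch below
only illustrates what these mean for a box-shaped spec, flattened to four
scalars, and is not their implementation (``ToyBoundedSpec`` is
hypothetical).

.. code-block:: Python

    import random


    class ToyBoundedSpec:
        """Hypothetical sketch of a box-shaped spec with scalar bounds."""

        def __init__(self, low, high, n):
            self.low, self.high, self.n = low, high, n

        def rand(self, rng=random):
            # draw a random valid action
            return [rng.uniform(self.low, self.high) for _ in range(self.n)]

        def is_in(self, values):
            # check that an action has the right size and lies inside the box
            return len(values) == self.n and all(self.low <= v <= self.high for v in values)

        def project(self, values):
            # clip out-of-bounds values onto the box
            return [min(max(v, self.low), self.high) for v in values]


    spec = ToyBoundedSpec(low=-2.0, high=2.0, n=4)  # one torque per sub-environment
    print(spec.is_in(spec.rand()))              # True
    print(spec.project([-3.0, 0.5, 2.5, 1.0]))  # [-2.0, 0.5, 2.0, 1.0]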


.. GENERATED FROM PYTHON SOURCE LINES 463-470

Modules
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Models
------------------------------

Example of an MLP model:

.. GENERATED FROM PYTHON SOURCE LINES 470-473

.. code-block:: Python


    from torch import nn


.. GENERATED FROM PYTHON SOURCE LINES 474-482

.. code-block:: Python


    from torchrl.modules import ConvNet, MLP
    from torchrl.modules.models.utils import SquashDims

    net = MLP(num_cells=[32, 64], out_features=4, activation_class=nn.ELU)
    print(net)
    print(net(torch.randn(10, 3)).shape)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    MLP(
      (0): LazyLinear(in_features=0, out_features=32, bias=True)
      (1): ELU(alpha=1.0)
      (2): Linear(in_features=32, out_features=64, bias=True)
      (3): ELU(alpha=1.0)
      (4): Linear(in_features=64, out_features=4, bias=True)
    )
    torch.Size([10, 4])
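
The first layer is a ``LazyLinear``: its ``in_features`` is unknown (printed
as ``0``) until the first forward pass, where it is inferred from the input
(``3`` here). The plain-Python sketch below derives the resulting layer
sizes and parameter count from ``num_cells`` and ``out_features``;
``mlp_layer_sizes`` is a hypothetical helper.

.. code-block:: Python

    def mlp_layer_sizes(in_features, num_cells, out_features):
        """Hypothetical helper: (in, out) sizes of each Linear layer in an MLP."""
        widths = [in_features, *num_cells, out_features]
        return list(zip(widths[:-1], widths[1:]))


    layers = mlp_layer_sizes(3, [32, 64], 4)
    n_params = sum(i * o + o for i, o in layers)  # weights + biases
    print(layers)    # [(3, 32), (32, 64), (64, 4)]
    print(n_params)  # 2500

After the ``net(...)`` call above has materialized the lazy layer,
``sum(p.numel() for p in net.parameters())`` should give the same count.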


.. GENERATED FROM PYTHON SOURCE LINES 483-484

Example of a CNN model:

.. GENERATED FROM PYTHON SOURCE LINES 484-494

.. code-block:: Python


    cnn = ConvNet(
        num_cells=[32, 64],
        kernel_sizes=[8, 4],
        strides=[2, 1],
        aggregator_class=SquashDims,
    )
    print(cnn)
    print(cnn(torch.randn(10, 3, 32, 32)).shape)  # SquashDims flattens the last 3 dims


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    ConvNet(
      (0): LazyConv2d(0, 32, kernel_size=(8, 8), stride=(2, 2))
      (1): ELU(alpha=1.0)
      (2): Conv2d(32, 64, kernel_size=(4, 4), stride=(1, 1))
      (3): ELU(alpha=1.0)
      (4): SquashDims()
    )
    torch.Size([10, 6400])
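
The ``6400`` features follow from the usual output-size formula for an
unpadded convolution, :math:`\lfloor (n - k) / s \rfloor + 1` for input size
:math:`n`, kernel size :math:`k` and stride :math:`s`: the spatial size goes
``32 -> 13 -> 10``, and ``SquashDims`` then flattens the ``64 x 10 x 10``
feature map. A plain-Python check (``conv_out`` is a hypothetical helper):

.. code-block:: Python

    def conv_out(size, kernel, stride, padding=0):
        """Spatial output size of a 2D convolution along one axis (dilation 1)."""
        return (size + 2 * padding - kernel) // stride + 1


    side = 32
    for kernel, stride in zip([8, 4], [2, 1]):
        side = conv_out(side, kernel, stride)
    print(side, 64 * side * side)  # 10 6400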


.. GENERATED FROM PYTHON SOURCE LINES 495-497

TensorDictModules
------------------------------

.. GENERATED FROM PYTHON SOURCE LINES 497-506

.. code-block:: Python


    from tensordict.nn import TensorDictModule

    tensordict = TensorDict({"key 1": torch.randn(10, 3)}, batch_size=[10])
    module = nn.Linear(3, 4)
    td_module = TensorDictModule(module, in_keys=["key 1"], out_keys=["key 2"])
    td_module(tensordict)
    print(tensordict)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    TensorDict(
        fields={
            key 1: Tensor(shape=torch.Size([10, 3]), device=cpu, dtype=torch.float32, is_shared=False),
            key 2: Tensor(shape=torch.Size([10, 4]), device=cpu, dtype=torch.float32, is_shared=False)},
        batch_size=torch.Size([10]),
        device=None,
        is_shared=False)
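
A ``TensorDictModule`` reads its ``in_keys`` from the input TensorDict,
calls the wrapped module with them as positional arguments, and writes the
results under its ``out_keys``, in place. The plain-dict sketch below shows
that pattern with a hypothetical ``ToyDictModule``; the real class handles
many more cases (nested keys, devices, and more).

.. code-block:: Python

    class ToyDictModule:
        """Hypothetical sketch of the read -> call -> write pattern."""

        def __init__(self, fn, in_keys, out_keys):
            self.fn, self.in_keys, self.out_keys = fn, in_keys, out_keys

        def __call__(self, td):
            outputs = self.fn(*(td[k] for k in self.in_keys))
            if len(self.out_keys) == 1:
                outputs = (outputs,)
            for key, value in zip(self.out_keys, outputs):
                td[key] = value  # written in place
            return td


    doubler = ToyDictModule(lambda x: 2 * x, in_keys=["key 1"], out_keys=["key 2"])
    data = {"key 1": 21}
    doubler(data)
    print(data)  # {'key 1': 21, 'key 2': 42}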


.. GENERATED FROM PYTHON SOURCE LINES 507-509

Sequences of Modules
------------------------------

.. GENERATED FROM PYTHON SOURCE LINES 509-524

.. code-block:: Python


    from tensordict.nn import TensorDictSequential

    backbone_module = nn.Linear(5, 3)
    backbone = TensorDictModule(
        backbone_module, in_keys=["observation"], out_keys=["hidden"]
    )
    actor_module = nn.Linear(3, 4)
    actor = TensorDictModule(actor_module, in_keys=["hidden"], out_keys=["action"])
    value_module = MLP(out_features=1, num_cells=[4, 5])
    value = TensorDictModule(value_module, in_keys=["hidden", "action"], out_keys=["value"])

    sequence = TensorDictSequential(backbone, actor, value)
    print(sequence)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    TensorDictSequential(
        module=ModuleList(
          (0): TensorDictModule(
              module=Linear(in_features=5, out_features=3, bias=True),
              device=cpu,
              in_keys=['observation'],
              out_keys=['hidden'])
          (1): TensorDictModule(
              module=Linear(in_features=3, out_features=4, bias=True),
              device=cpu,
              in_keys=['hidden'],
              out_keys=['action'])
          (2): TensorDictModule(
              module=MLP(
                (0): LazyLinear(in_features=0, out_features=4, bias=True)
                (1): Tanh()
                (2): Linear(in_features=4, out_features=5, bias=True)
                (3): Tanh()
                (4): Linear(in_features=5, out_features=1, bias=True)
              ),
              device=cpu,
              in_keys=['hidden', 'action'],
              out_keys=['value'])
        ),
        device=cpu,
        in_keys=['observation'],
        out_keys=['hidden', 'action', 'value'])


.. GENERATED FROM PYTHON SOURCE LINES 525-528

.. code-block:: Python


    print(sequence.in_keys, sequence.out_keys)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    ['observation'] ['hidden', 'action', 'value']
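
The sequence's ``in_keys`` are the keys that some module reads before any
earlier module has produced them (here only ``"observation"``), and its
``out_keys`` collect everything the modules write. The sketch below derives
both lists from the individual modules' keys; ``sequence_keys`` is a
hypothetical helper that mimics, but does not reproduce,
``TensorDictSequential``'s bookkeeping.

.. code-block:: Python

    def sequence_keys(modules):
        """Hypothetical helper: (in_keys, out_keys) of modules run one after another."""
        in_keys, out_keys = [], []
        for reads, writes in modules:
            for key in reads:
                if key not in out_keys and key not in in_keys:
                    in_keys.append(key)  # must be supplied by the caller
            for key in writes:
                if key not in out_keys:
                    out_keys.append(key)
        return in_keys, out_keys


    key_specs = [
        (["observation"], ["hidden"]),      # backbone
        (["hidden"], ["action"]),           # actor
        (["hidden", "action"], ["value"]),  # value
    ]
    print(sequence_keys(key_specs))
    # (['observation'], ['hidden', 'action', 'value'])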


.. GENERATED FROM PYTHON SOURCE LINES 529-538

.. code-block:: Python


    tensordict = TensorDict(
        {"observation": torch.randn(3, 5)},
        [3],
    )
    backbone(tensordict)
    actor(tensordict)
    value(tensordict)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none


    TensorDict(
        fields={
            action: Tensor(shape=torch.Size([3, 4]), device=cpu, dtype=torch.float32, is_shared=False),
            hidden: Tensor(shape=torch.Size([3, 3]), device=cpu, dtype=torch.float32, is_shared=False),
            observation: Tensor(shape=torch.Size([3, 5]), device=cpu, dtype=torch.float32, is_shared=False),
            value: Tensor(shape=torch.Size([3, 1]), device=cpu, dtype=torch.float32, is_shared=False)},
        batch_size=torch.Size([3]),
        device=None,
        is_shared=False)


.. GENERATED FROM PYTHON SOURCE LINES 539-547

.. code-block:: Python


    tensordict = TensorDict(
        {"observation": torch.randn(3, 5)},
        [3],
    )
    sequence(tensordict)
    print(tensordict)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    TensorDict(
        fields={
            action: Tensor(shape=torch.Size([3, 4]), device=cpu, dtype=torch.float32, is_shared=False),
            hidden: Tensor(shape=torch.Size([3, 3]), device=cpu, dtype=torch.float32, is_shared=False),
            observation: Tensor(shape=torch.Size([3, 5]), device=cpu, dtype=torch.float32, is_shared=False),
            value: Tensor(shape=torch.Size([3, 1]), device=cpu, dtype=torch.float32, is_shared=False)},
        batch_size=torch.Size([3]),
        device=None,
        is_shared=False)


.. GENERATED FROM PYTHON SOURCE LINES 548-550

Functional Programming (Ensembling / Meta-RL)
----------------------------------------------

.. GENERATED FROM PYTHON SOURCE LINES 550-556

.. code-block:: Python


    from tensordict import TensorDict

    params = TensorDict.from_module(sequence)
    print("extracted params", params)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    extracted params TensorDict(
        fields={
            module: TensorDict(
                fields={
                    0: TensorDict(
                        fields={
                            module: TensorDict(
                                fields={
                                    bias: Parameter(shape=torch.Size([3]), device=cpu, dtype=torch.float32, is_shared=False),
                                    weight: Parameter(shape=torch.Size([3, 5]), device=cpu, dtype=torch.float32, is_shared=False)},
                                batch_size=torch.Size([]),
                                device=None,
                                is_shared=False)},
                        batch_size=torch.Size([]),
                        device=None,
                        is_shared=False),
                    1: TensorDict(
                        fields={
                            module: TensorDict(
                                fields={
                                    bias: Parameter(shape=torch.Size([4]), device=cpu, dtype=torch.float32, is_shared=False),
                                    weight: Parameter(shape=torch.Size([4, 3]), device=cpu, dtype=torch.float32, is_shared=False)},
                                batch_size=torch.Size([]),
                                device=None,
                                is_shared=False)},
                        batch_size=torch.Size([]),
                        device=None,
                        is_shared=False),
                    2: TensorDict(
                        fields={
                            module: TensorDict(
                                fields={
                                    0: TensorDict(
                                        fields={
                                            bias: Parameter(shape=torch.Size([4]), device=cpu, dtype=torch.float32, is_shared=False),
                                            weight: Parameter(shape=torch.Size([4, 7]), device=cpu, dtype=torch.float32, is_shared=False)},
                                        batch_size=torch.Size([]),
                                        device=None,
                                        is_shared=False),
                                    2: TensorDict(
                                        fields={
                                            bias: Parameter(shape=torch.Size([5]), device=cpu, dtype=torch.float32, is_shared=False),
                                            weight: Parameter(shape=torch.Size([5, 4]), device=cpu, dtype=torch.float32, is_shared=False)},
                                        batch_size=torch.Size([]),
                                        device=None,
                                        is_shared=False),
                                    4: TensorDict(
                                        fields={
                                            bias: Parameter(shape=torch.Size([1]), device=cpu, dtype=torch.float32, is_shared=False),
                                            weight: Parameter(shape=torch.Size([1, 5]), device=cpu, dtype=torch.float32, is_shared=False)},
                                        batch_size=torch.Size([]),
                                        device=None,
                                        is_shared=False)},
                                batch_size=torch.Size([]),
                                device=None,
                                is_shared=False)},
                        batch_size=torch.Size([]),
                        device=None,
                        is_shared=False)},
                batch_size=torch.Size([]),
                device=None,
                is_shared=False)},
        batch_size=torch.Size([]),
        device=None,
        is_shared=False)


.. GENERATED FROM PYTHON SOURCE LINES 557-558

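Functional call using TensorDict: ``params.to_module(sequence)`` loads the parameters into the module for the duration of the ``with`` block: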

.. GENERATED FROM PYTHON SOURCE LINES 558-562

.. code-block:: Python


    with params.to_module(sequence):
        sequence(tensordict)
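
The same mechanism can be checked on a plain ``nn.Linear``. This is a small self-contained sketch that assumes a recent ``tensordict`` release. A modified copy of the parameters is active only inside the ``with`` block, and the original parameters come back when the block exits. Here the copy is a set of hypothetical all-zero weights built with ``apply``.

.. code-block:: Python

    import torch
    from torch import nn
    from tensordict import TensorDict

    torch.manual_seed(0)
    net = nn.Linear(3, 2)
    x = torch.randn(4, 3)

    # extract the parameters as a TensorDict, as above
    params = TensorDict.from_module(net)
    # hypothetical alternative weights: same structure, every entry set to zero
    zero_params = params.apply(torch.zeros_like)

    # inside the context manager the module runs with the zero weights...
    with zero_params.to_module(net):
        out_zero = net(x)
    # ...and after it exits, the original parameters are back in place
    out_orig = net(x)
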


.. GENERATED FROM PYTHON SOURCE LINES 563-564

Using a vectorized map (``torch.vmap``) for model ensembling:

.. GENERATED FROM PYTHON SOURCE LINES 564-577

.. code-block:: Python

    from torch import vmap

    params_expand = params.expand(4)


    def exec_sequence(params, data):
        with params.to_module(sequence):
            return sequence(data)


    tensordict_exp = vmap(exec_sequence, (0, None))(params_expand, tensordict)
    print(tensordict_exp)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    TensorDict(
        fields={
            action: Tensor(shape=torch.Size([4, 3, 4]), device=cpu, dtype=torch.float32, is_shared=False),
            hidden: Tensor(shape=torch.Size([4, 3, 3]), device=cpu, dtype=torch.float32, is_shared=False),
            observation: Tensor(shape=torch.Size([4, 3, 5]), device=cpu, dtype=torch.float32, is_shared=False),
            value: Tensor(shape=torch.Size([4, 3, 1]), device=cpu, dtype=torch.float32, is_shared=False)},
        batch_size=torch.Size([4, 3]),
        device=None,
        is_shared=False)
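
``params.expand(4)`` repeats the same parameters, so the four ensemble members above produce identical outputs. An ensemble of distinct models needs distinct parameters. One option is to stack the parameters of independently initialized modules along a new leading dimension. The sketch below works under that assumption. ``members`` and ``template`` are illustrative names, and ``.contiguous()`` is there in case ``torch.stack`` returns a lazy stack in older ``tensordict`` versions.

.. code-block:: Python

    import torch
    from torch import nn, vmap
    from tensordict import TensorDict

    torch.manual_seed(0)
    n_members = 4
    # hypothetical ensemble: independently initialized copies of one architecture
    members = [nn.Linear(5, 2) for _ in range(n_members)]
    # stack their parameters along a new leading "ensemble" dimension
    stacked_params = torch.stack(
        [TensorDict.from_module(m) for m in members], 0
    ).contiguous()

    # template module whose parameters are swapped in at call time
    template = nn.Linear(5, 2)


    def call_member(params, data):
        with params.to_module(template):
            return template(data)


    x = torch.randn(3, 5)
    # map over the ensemble dimension of the params, broadcast the same input
    ensemble_out = vmap(call_member, (0, None))(stacked_params, x)

Each slice ``ensemble_out[k]`` should then match ``members[k](x)``.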


.. GENERATED FROM PYTHON SOURCE LINES 578-580

Specialized Classes
------------------------------

.. GENERATED FROM PYTHON SOURCE LINES 580-593

.. code-block:: Python


    torch.manual_seed(0)
    from torchrl.data import Bounded
    from torchrl.modules import SafeModule

    spec = Bounded(-torch.ones(3), torch.ones(3))
    base_module = nn.Linear(5, 3)
    module = SafeModule(
        module=base_module, spec=spec, in_keys=["obs"], out_keys=["action"], safe=True
    )
    tensordict = TensorDict({"obs": torch.randn(5)}, batch_size=[])
    module(tensordict)["action"]


.. rst-class:: sphx-glr-script-out

 .. code-block:: none


    tensor([-0.0137,  0.1524, -0.0641], grad_fn=<ViewBackward0>)


.. GENERATED FROM PYTHON SOURCE LINES 594-598

.. code-block:: Python


    tensordict = TensorDict({"obs": torch.randn(5) * 100}, batch_size=[])
    module(tensordict)["action"]  # safe=True projects the result within the set


.. rst-class:: sphx-glr-script-out

 .. code-block:: none


    tensor([-1.,  1., -1.], grad_fn=<AsStridedBackward0>)
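
With ``safe=True``, an out-of-bounds output is projected back into the domain of the spec. For a box-shaped ``Bounded`` spec, the output above suggests that this amounts to clamping each entry to ``[low, high]``. The following minimal sketch assumes the ``project`` and ``is_in`` methods of TorchRL specs.

.. code-block:: Python

    import torch
    from torchrl.data import Bounded

    spec = Bounded(-torch.ones(3), torch.ones(3))
    raw = torch.tensor([5.0, 0.2, -3.0])  # first and last entries are out of bounds

    print(spec.is_in(raw))  # the raw value violates the bounds
    projected = spec.project(raw)
    print(projected)  # out-of-bounds entries are brought back to the box
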


.. GENERATED FROM PYTHON SOURCE LINES 599-612

.. code-block:: Python


    from torchrl.modules import Actor

    base_module = nn.Linear(5, 3)
    actor = Actor(base_module, in_keys=["obs"])
    tensordict = TensorDict({"obs": torch.randn(5)}, batch_size=[])
    actor(tensordict)  # "action" is the default output key

    from tensordict.nn import (
        ProbabilisticTensorDictModule,
        ProbabilisticTensorDictSequential,
    )
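
``out_keys`` was not passed here, so ``Actor`` writes its output under the default ``"action"`` key. The quick check below assumes the same ``Actor`` signature as above.

.. code-block:: Python

    import torch
    from torch import nn
    from tensordict import TensorDict
    from torchrl.modules import Actor

    # no out_keys given: the Actor falls back to its default output key
    actor = Actor(nn.Linear(5, 3), in_keys=["obs"])
    td = actor(TensorDict({"obs": torch.randn(5)}, batch_size=[]))

    print(actor.out_keys)  # the default output key(s)
    print(td["action"].shape)  # one 3-dimensional action
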


.. GENERATED FROM PYTHON SOURCE LINES 613-634

.. code-block:: Python


    # Probabilistic modules
    from torchrl.modules import NormalParamExtractor, TanhNormal

    td = TensorDict({"input": torch.randn(3, 5)}, [3])
    net = nn.Sequential(
        nn.Linear(5, 4), NormalParamExtractor()
    )  # splits the output into loc and scale
    module = TensorDictModule(net, in_keys=["input"], out_keys=["loc", "scale"])
    td_module = ProbabilisticTensorDictSequential(
        module,
        ProbabilisticTensorDictModule(
            in_keys=["loc", "scale"],
            out_keys=["action"],
            distribution_class=TanhNormal,
            return_log_prob=False,
        ),
    )
    td_module(td)
    print(td)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    TensorDict(
        fields={
            action: Tensor(shape=torch.Size([3, 2]), device=cpu, dtype=torch.float32, is_shared=False),
            input: Tensor(shape=torch.Size([3, 5]), device=cpu, dtype=torch.float32, is_shared=False),
            loc: Tensor(shape=torch.Size([3, 2]), device=cpu, dtype=torch.float32, is_shared=False),
            scale: Tensor(shape=torch.Size([3, 2]), device=cpu, dtype=torch.float32, is_shared=False)},
        batch_size=torch.Size([3]),
        device=None,
        is_shared=False)
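
``NormalParamExtractor`` turns the 4 outputs of ``nn.Linear(5, 4)`` into the 2-dimensional ``loc`` and ``scale`` entries shown above. It splits the last dimension into two halves and maps the second half to positive values. The quick check below assumes that its forward pass returns a ``(loc, scale)`` tuple.

.. code-block:: Python

    import torch
    from torchrl.modules import NormalParamExtractor

    extractor = NormalParamExtractor()
    raw = torch.randn(3, 4)  # stands in for the output of nn.Linear(5, 4)
    loc, scale = extractor(raw)

    print(loc.shape, scale.shape)  # each half has a last dimension of 2
    print((scale > 0).all())  # the scale is mapped to strictly positive values
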


.. GENERATED FROM PYTHON SOURCE LINES 635-650

.. code-block:: Python


    # returning the log-probability
    td = TensorDict({"input": torch.randn(3, 5)}, [3])
    td_module = ProbabilisticTensorDictSequential(
        module,
        ProbabilisticTensorDictModule(
            in_keys=["loc", "scale"],
            out_keys=["action"],
            distribution_class=TanhNormal,
            return_log_prob=True,
        ),
    )
    td_module(td)
    print(td)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    TensorDict(
        fields={
            action: Tensor(shape=torch.Size([3, 2]), device=cpu, dtype=torch.float32, is_shared=False),
            input: Tensor(shape=torch.Size([3, 5]), device=cpu, dtype=torch.float32, is_shared=False),
            loc: Tensor(shape=torch.Size([3, 2]), device=cpu, dtype=torch.float32, is_shared=False),
            sample_log_prob: Tensor(shape=torch.Size([3]), device=cpu, dtype=torch.float32, is_shared=False),
            scale: Tensor(shape=torch.Size([3, 2]), device=cpu, dtype=torch.float32, is_shared=False)},
        batch_size=torch.Size([3]),
        device=None,
        is_shared=False)
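
``sample_log_prob`` has shape ``[3]`` rather than ``[3, 2]``. The log-probability of the 2-dimensional action is summed over the action dimension, and because ``tanh`` squashes the Gaussian sample, a change-of-variables term appears. The sketch below illustrates the idea with plain ``torch.distributions``. It is not TorchRL's ``TanhNormal`` implementation: the ``grad_fn`` names printed in the next cell hint that TorchRL uses its own numerically safer ``tanh``.

.. code-block:: Python

    import torch
    from torch.distributions import Independent, Normal, TransformedDistribution
    from torch.distributions.transforms import TanhTransform

    torch.manual_seed(0)
    loc = torch.randn(3, 2, dtype=torch.float64)
    scale = torch.rand(3, 2, dtype=torch.float64) * 0.5 + 0.1

    base = Normal(loc, scale)
    # tanh-squashed Gaussian; Independent(..., 1) treats the action dim as one event
    squashed = Independent(TransformedDistribution(base, [TanhTransform()]), 1)

    u = base.sample()  # pre-squash sample
    action = torch.tanh(u)  # lies in (-1, 1)

    # change of variables: log p(a) = log N(u) - log(1 - tanh(u)^2), summed over dims
    manual = (base.log_prob(u) - torch.log1p(-action.pow(2))).sum(-1)
    library = squashed.log_prob(action)
    print(manual.shape)  # one log-probability per batch element
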


.. GENERATED FROM PYTHON SOURCE LINES 651-666

.. code-block:: Python


    # Sampling vs mode / mean
    from torchrl.envs.utils import ExplorationType, set_exploration_type

    td = TensorDict({"input": torch.randn(3, 5)}, [3])

    torch.manual_seed(0)
    with set_exploration_type(ExplorationType.RANDOM):
        td_module(td)
        print("random:", td["action"])

    with set_exploration_type(ExplorationType.DETERMINISTIC):
        td_module(td)
        print("mode:", td["action"])


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    random: tensor([[ 0.8728, -0.1334],
            [-0.9833,  0.3494],
            [-0.6887, -0.6402]], grad_fn=<_SafeTanhNoEpsBackward>)
    mode: tensor([[-0.1132,  0.1762],
            [-0.3430, -0.2668],
            [ 0.2918,  0.6239]], grad_fn=<_SafeTanhNoEpsBackward>)
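
Under ``ExplorationType.DETERMINISTIC`` the module returns a deterministic summary of the distribution, so repeated calls give the same action. ``ExplorationType.RANDOM`` draws a fresh sample on every call. The sketch below rebuilds a policy like the one above and checks this behavior. ``act`` is an illustrative helper, not part of TorchRL.

.. code-block:: Python

    import torch
    from torch import nn
    from tensordict import TensorDict
    from tensordict.nn import (
        ProbabilisticTensorDictModule,
        ProbabilisticTensorDictSequential,
        TensorDictModule,
    )
    from torchrl.envs.utils import ExplorationType, set_exploration_type
    from torchrl.modules import NormalParamExtractor, TanhNormal

    torch.manual_seed(0)
    policy = ProbabilisticTensorDictSequential(
        TensorDictModule(
            nn.Sequential(nn.Linear(5, 4), NormalParamExtractor()),
            in_keys=["input"],
            out_keys=["loc", "scale"],
        ),
        ProbabilisticTensorDictModule(
            in_keys=["loc", "scale"],
            out_keys=["action"],
            distribution_class=TanhNormal,
        ),
    )
    td = TensorDict({"input": torch.randn(3, 5)}, [3])


    def act(exploration_type):
        # hypothetical helper: run the policy on a copy of td under the given mode
        with set_exploration_type(exploration_type):
            return policy(td.clone())["action"]


    det_1 = act(ExplorationType.DETERMINISTIC)
    det_2 = act(ExplorationType.DETERMINISTIC)
    rand_1 = act(ExplorationType.RANDOM)
    rand_2 = act(ExplorationType.RANDOM)
    print(torch.equal(det_1, det_2), torch.equal(rand_1, rand_2))
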


.. GENERATED FROM PYTHON SOURCE LINES 667-669

Using Environments and Modules
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. GENERATED FROM PYTHON SOURCE LINES 669-697

.. code-block:: Python


    from torchrl.envs.utils import step_mdp

    env = GymEnv("Pendulum-v1")

    action_spec = env.action_spec
    actor_module = nn.Linear(3, 1)
    actor = SafeModule(
        actor_module, spec=action_spec, in_keys=["observation"], out_keys=["action"]
    )

    torch.manual_seed(0)
    env.set_seed(0)

    max_steps = 100
    tensordict = env.reset()
    tensordicts = TensorDict({}, [max_steps])
    for i in range(max_steps):
        actor(tensordict)
        tensordicts[i] = env.step(tensordict)
        if tensordict["next", "done"].any():  # after env.step, done lives under "next"
            break
        tensordict = step_mdp(tensordict)  # roughly equivalent to obs = next_obs

    tensordicts_prealloc = tensordicts.clone()
    print("total steps:", i)
    print(tensordicts)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    total steps: 99
    TensorDict(
        fields={
            action: Tensor(shape=torch.Size([100, 1]), device=cpu, dtype=torch.float32, is_shared=False),
            done: Tensor(shape=torch.Size([100, 1]), device=cpu, dtype=torch.bool, is_shared=False),
            next: TensorDict(
                fields={
                    done: Tensor(shape=torch.Size([100, 1]), device=cpu, dtype=torch.bool, is_shared=False),
                    observation: Tensor(shape=torch.Size([100, 3]), device=cpu, dtype=torch.float32, is_shared=False),
                    reward: Tensor(shape=torch.Size([100, 1]), device=cpu, dtype=torch.float32, is_shared=False),
                    terminated: Tensor(shape=torch.Size([100, 1]), device=cpu, dtype=torch.bool, is_shared=False),
                    truncated: Tensor(shape=torch.Size([100, 1]), device=cpu, dtype=torch.bool, is_shared=False)},
                batch_size=torch.Size([100]),
                device=None,
                is_shared=False),
            observation: Tensor(shape=torch.Size([100, 3]), device=cpu, dtype=torch.float32, is_shared=False),
            terminated: Tensor(shape=torch.Size([100, 1]), device=cpu, dtype=torch.bool, is_shared=False),
            truncated: Tensor(shape=torch.Size([100, 1]), device=cpu, dtype=torch.bool, is_shared=False)},
        batch_size=torch.Size([100]),
        device=None,
        is_shared=False)
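
``env.step`` writes the outcome of the action under the ``"next"`` sub-tensordict, and ``step_mdp`` turns that result into the input for the following step. The toy sketch below uses a hand-made tensordict and assumes the default arguments of ``step_mdp``, which drop the action and the reward.

.. code-block:: Python

    import torch
    from tensordict import TensorDict
    from torchrl.envs.utils import step_mdp

    # a hand-made transition: root = current step, "next" = what env.step returned
    td = TensorDict(
        {
            "observation": torch.zeros(3),
            "action": torch.ones(1),
            "done": torch.zeros(1, dtype=torch.bool),
            "next": {
                "observation": torch.ones(3),
                "reward": torch.ones(1),
                "done": torch.zeros(1, dtype=torch.bool),
            },
        },
        batch_size=[],
    )
    next_td = step_mdp(td)
    # the "next" observation is now at the root; action, reward and "next" are gone
    print(next_td)
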


.. GENERATED FROM PYTHON SOURCE LINES 698-716

.. code-block:: Python


    # Equivalent: collect the steps in a list and stack them at the end
    torch.manual_seed(0)
    env.set_seed(0)

    max_steps = 100
    tensordict = env.reset()
    tensordicts = []
    for i in range(max_steps):
        actor(tensordict)
        tensordicts.append(env.step(tensordict))
        if tensordict["next", "done"].any():  # after env.step, done lives under "next"
            break
        tensordict = step_mdp(tensordict)  # roughly equivalent to obs = next_obs
    tensordicts_stack = torch.stack(tensordicts, 0)
    print("total steps:", i)
    print(tensordicts_stack)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    total steps: 99
    TensorDict(
        fields={
            action: Tensor(shape=torch.Size([100, 1]), device=cpu, dtype=torch.float32, is_shared=False),
            done: Tensor(shape=torch.Size([100, 1]), device=cpu, dtype=torch.bool, is_shared=False),
            next: TensorDict(
                fields={
                    done: Tensor(shape=torch.Size([100, 1]), device=cpu, dtype=torch.bool, is_shared=False),
                    observation: Tensor(shape=torch.Size([100, 3]), device=cpu, dtype=torch.float32, is_shared=False),
                    reward: Tensor(shape=torch.Size([100, 1]), device=cpu, dtype=torch.float32, is_shared=False),
                    terminated: Tensor(shape=torch.Size([100, 1]), device=cpu, dtype=torch.bool, is_shared=False),
                    truncated: Tensor(shape=torch.Size([100, 1]), device=cpu, dtype=torch.bool, is_shared=False)},
                batch_size=torch.Size([100]),
                device=None,
                is_shared=False),
            observation: Tensor(shape=torch.Size([100, 3]), device=cpu, dtype=torch.float32, is_shared=False),
            terminated: Tensor(shape=torch.Size([100, 1]), device=cpu, dtype=torch.bool, is_shared=False),
            truncated: Tensor(shape=torch.Size([100, 1]), device=cpu, dtype=torch.bool, is_shared=False)},
        batch_size=torch.Size([100]),
        device=None,
        is_shared=False)


.. GENERATED FROM PYTHON SOURCE LINES 717-720

.. code-block:: Python


    (tensordicts_stack == tensordicts_prealloc).all()


.. rst-class:: sphx-glr-script-out

 .. code-block:: none


    True
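
The two collection styles store the same data. Writing step ``t`` into a preallocated ``TensorDict`` with batch size ``[T]`` gives the same result as stacking a list of per-step tensordicts along dimension 0. The sketch below shows this equivalence on toy data and needs no environment. ``steps`` is an illustrative list.

.. code-block:: Python

    import torch
    from tensordict import TensorDict

    T = 5
    # toy per-step data standing in for the output of env.step
    steps = [
        TensorDict({"obs": torch.full((3,), float(t)), "reward": torch.tensor([t * 0.1])}, [])
        for t in range(T)
    ]

    # style 1: preallocate, then write each step at its index
    prealloc = TensorDict({}, [T])
    for t, td in enumerate(steps):
        prealloc[t] = td

    # style 2: stack the list along a new leading dimension
    stacked = torch.stack(steps, 0)
    print((stacked == prealloc).all())
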


.. GENERATED FROM PYTHON SOURCE LINES 721-732

.. code-block:: Python


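    torch.manual_seed(0)
    env.set_seed(0)
    # env.rollout() runs a reset / step / step_mdp loop similar to the ones above
    tensordict_rollout = env.rollout(policy=actor, max_steps=max_steps)
    tensordict_rollout

    # with the same seeds, the rollout should match the manually collected trajectory
    (tensordict_rollout == tensordicts_prealloc).all()

    from tensordict.nn import TensorDictModule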


.. GENERATED FROM PYTHON SOURCE LINES 733-735

Collectors
^^^^^^^^^^

.. GENERATED FROM PYTHON SOURCE LINES 735-741

.. code-block:: Python


    from torchrl.collectors import MultiaSyncDataCollector, MultiSyncDataCollector

    from torchrl.envs import EnvCreator, SerialEnv
    from torchrl.envs.libs.gym import GymEnv


.. GENERATED FROM PYTHON SOURCE LINES 742-744

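``EnvCreator`` makes sure that a lambda function can be sent from one process to another.
We use a ``SerialEnv``, which steps its sub-environments one after the other in the same process, for simplicity; for larger jobs, a ``ParallelEnv`` (one process per sub-environment) would be better suited.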

.. GENERATED FROM PYTHON SOURCE LINES 744-754

.. code-block:: Python


    parallel_env = SerialEnv(
        3,
        EnvCreator(lambda: GymEnv("Pendulum-v1")),
    )
    create_env_fn = [parallel_env, parallel_env]

    actor_module = nn.Linear(3, 1)
    actor = TensorDictModule(actor_module, in_keys=["observation"], out_keys=["action"])


.. GENERATED FROM PYTHON SOURCE LINES 755-756

Synchronous data collector: each batch gathers data from all sub-collectors.

.. GENERATED FROM PYTHON SOURCE LINES 756-768

.. code-block:: Python


    devices = ["cpu", "cpu"]

    collector = MultiSyncDataCollector(
        create_env_fn=create_env_fn,  # one env (or env constructor) per sub-collector
        policy=actor,
        total_frames=240,
        max_frames_per_traj=-1,  # envs are terminating, we don't need to stop them early
        frames_per_batch=60,  # we want 60 frames at a time (we have 3 envs per sub-collector)
        device=devices,
    )


.. GENERATED FROM PYTHON SOURCE LINES 769-778

.. code-block:: Python


    for i, d in enumerate(collector):
        if i == 0:
            print(d)  # batch shape: [2 sub-collectors, 3 envs, 10 steps]
        collector.update_policy_weights_()  # make sure that our policies have the latest weights if working on multiple devices
    print(i)
    collector.shutdown()
    del collector


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    TensorDict(
        fields={
            action: Tensor(shape=torch.Size([2, 3, 10, 1]), device=cpu, dtype=torch.float32, is_shared=False),
            collector: TensorDict(
                fields={
                    traj_ids: Tensor(shape=torch.Size([2, 3, 10]), device=cpu, dtype=torch.int64, is_shared=False)},
                batch_size=torch.Size([2, 3, 10]),
                device=cpu,
                is_shared=False),
            done: Tensor(shape=torch.Size([2, 3, 10, 1]), device=cpu, dtype=torch.bool, is_shared=False),
            next: TensorDict(
                fields={
                    done: Tensor(shape=torch.Size([2, 3, 10, 1]), device=cpu, dtype=torch.bool, is_shared=False),
                    observation: Tensor(shape=torch.Size([2, 3, 10, 3]), device=cpu, dtype=torch.float32, is_shared=False),
                    reward: Tensor(shape=torch.Size([2, 3, 10, 1]), device=cpu, dtype=torch.float32, is_shared=False),
                    terminated: Tensor(shape=torch.Size([2, 3, 10, 1]), device=cpu, dtype=torch.bool, is_shared=False),
                    truncated: Tensor(shape=torch.Size([2, 3, 10, 1]), device=cpu, dtype=torch.bool, is_shared=False)},
                batch_size=torch.Size([2, 3, 10]),
                device=cpu,
                is_shared=False),
            observation: Tensor(shape=torch.Size([2, 3, 10, 3]), device=cpu, dtype=torch.float32, is_shared=False),
            terminated: Tensor(shape=torch.Size([2, 3, 10, 1]), device=cpu, dtype=torch.bool, is_shared=False),
            truncated: Tensor(shape=torch.Size([2, 3, 10, 1]), device=cpu, dtype=torch.bool, is_shared=False)},
        batch_size=torch.Size([2, 3, 10]),
        device=cpu,
        is_shared=False)
    3


.. GENERATED FROM PYTHON SOURCE LINES 779-800

.. code-block:: Python


    # async data collector: keeps working while you update your model
    collector = MultiaSyncDataCollector(
        create_env_fn=create_env_fn,  # one env (or env constructor) per sub-collector
        policy=actor,
        total_frames=240,
        max_frames_per_traj=-1,  # envs are terminating, we don't need to stop them early
        frames_per_batch=60,  # we want 60 frames at a time (we have 3 envs per sub-collector)
        device=devices,
    )

    for i, d in enumerate(collector):
        if i == 0:
            print(d)  # each batch comes from a single sub-collector: [3 envs, 20 steps]
        collector.update_policy_weights_()  # make sure that our policies have the latest weights if working on multiple devices
    print(i)
    collector.shutdown()
    del collector
    del create_env_fn
    del parallel_env


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    TensorDict(
        fields={
            action: Tensor(shape=torch.Size([3, 20, 1]), device=cpu, dtype=torch.float32, is_shared=False),
            collector: TensorDict(
                fields={
                    traj_ids: Tensor(shape=torch.Size([3, 20]), device=cpu, dtype=torch.int64, is_shared=False)},
                batch_size=torch.Size([3, 20]),
                device=cpu,
                is_shared=False),
            done: Tensor(shape=torch.Size([3, 20, 1]), device=cpu, dtype=torch.bool, is_shared=False),
            next: TensorDict(
                fields={
                    done: Tensor(shape=torch.Size([3, 20, 1]), device=cpu, dtype=torch.bool, is_shared=False),
                    observation: Tensor(shape=torch.Size([3, 20, 3]), device=cpu, dtype=torch.float32, is_shared=False),
                    reward: Tensor(shape=torch.Size([3, 20, 1]), device=cpu, dtype=torch.float32, is_shared=False),
                    terminated: Tensor(shape=torch.Size([3, 20, 1]), device=cpu, dtype=torch.bool, is_shared=False),
                    truncated: Tensor(shape=torch.Size([3, 20, 1]), device=cpu, dtype=torch.bool, is_shared=False)},
                batch_size=torch.Size([3, 20]),
                device=cpu,
                is_shared=False),
            observation: Tensor(shape=torch.Size([3, 20, 3]), device=cpu, dtype=torch.float32, is_shared=False),
            terminated: Tensor(shape=torch.Size([3, 20, 1]), device=cpu, dtype=torch.bool, is_shared=False),
            truncated: Tensor(shape=torch.Size([3, 20, 1]), device=cpu, dtype=torch.bool, is_shared=False)},
        batch_size=torch.Size([3, 20]),
        device=cpu,
        is_shared=False)
    3
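
The printed shapes follow from simple arithmetic on ``frames_per_batch``. With the synchronous collector, the 60 frames are shared by 2 sub-collectors × 3 environments, which gives 10 steps each. With the asynchronous collector, each batch of 60 frames comes from a single sub-collector with 3 environments, which gives 20 steps each. In both cases, 240 total frames make 4 batches, so the last printed index is 3. The helpers below are hypothetical, written only to spell out this reading of the outputs.

.. code-block:: Python

    def sync_batch_shape(frames_per_batch, n_collectors, envs_per_collector):
        # MultiSyncDataCollector: one batch gathers data from every sub-collector
        steps = frames_per_batch // (n_collectors * envs_per_collector)
        return (n_collectors, envs_per_collector, steps)


    def async_batch_shape(frames_per_batch, envs_per_collector):
        # MultiaSyncDataCollector: each batch comes from a single sub-collector
        return (envs_per_collector, frames_per_batch // envs_per_collector)


    total_frames, frames_per_batch = 240, 60
    n_batches = total_frames // frames_per_batch

    print(sync_batch_shape(frames_per_batch, n_collectors=2, envs_per_collector=3))  # (2, 3, 10)
    print(async_batch_shape(frames_per_batch, envs_per_collector=3))  # (3, 20)
    print(n_batches - 1)  # 3: the last batch index printed by both loops
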


.. GENERATED FROM PYTHON SOURCE LINES 801-803

Objectives
^^^^^^^^^^

.. GENERATED FROM PYTHON SOURCE LINES 803-825

.. code-block:: Python


    # TorchRL provides loss functions that are compatible with meta-RL
    # Disclaimer: this API may change in the future
    from torchrl.objectives import DDPGLoss

    actor_module = nn.Linear(3, 1)
    actor = TensorDictModule(actor_module, in_keys=["observation"], out_keys=["action"])


    class ConcatModule(nn.Linear):
        def forward(self, obs, action):
            return super().forward(torch.cat([obs, action], -1))


    value_module = ConcatModule(4, 1)
    value = TensorDictModule(
        value_module, in_keys=["observation", "action"], out_keys=["state_action_value"]
    )

    loss_fn = DDPGLoss(actor, value)
    loss_fn.make_value_estimator(loss_fn.default_value_estimator, gamma=0.99)


.. GENERATED FROM PYTHON SOURCE LINES 826-842

.. code-block:: Python


    tensordict = TensorDict(
        {
            "observation": torch.randn(10, 3),
            "next": {
                "observation": torch.randn(10, 3),
                "reward": torch.randn(10, 1),
                "done": torch.zeros(10, 1, dtype=torch.bool),
            },
            "action": torch.randn(10, 1),
        },
        batch_size=[10],
        device="cpu",
    )
    loss_td = loss_fn(tensordict)


.. GENERATED FROM PYTHON SOURCE LINES 843-846

.. code-block:: Python


    print(loss_td)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    TensorDict(
        fields={
            loss_actor: Tensor(shape=torch.Size([]), device=cpu, dtype=torch.float32, is_shared=False),
            loss_value: Tensor(shape=torch.Size([]), device=cpu, dtype=torch.float32, is_shared=False),
            pred_value: Tensor(shape=torch.Size([10]), device=cpu, dtype=torch.float32, is_shared=False),
            pred_value_max: Tensor(shape=torch.Size([]), device=cpu, dtype=torch.float32, is_shared=False),
            target_value: Tensor(shape=torch.Size([10]), device=cpu, dtype=torch.float32, is_shared=False),
            target_value_max: Tensor(shape=torch.Size([]), device=cpu, dtype=torch.float32, is_shared=False),
            td_error: Tensor(shape=torch.Size([10]), device=cpu, dtype=torch.float32, is_shared=False)},
        batch_size=torch.Size([]),
        device=None,
        is_shared=False)
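
The ``loss_value`` and ``loss_actor`` entries correspond to the textbook DDPG recipe. The critic regresses Q(s, a) onto the bootstrapped target r + gamma * (1 - done) * Q(s', pi(s')), and the actor maximizes Q(s, pi(s)). The sketch below implements that recipe in plain PyTorch and is not TorchRL's implementation. ``DDPGLoss`` may also use target networks and configurable value estimators, and it writes ``td_error`` back into the input tensordict, as the next cell shows. The networks, data and the ``ddpg_losses`` helper are illustrative. The transitions are marked as terminal, so the target reduces to the reward.

.. code-block:: Python

    import torch
    from torch import nn

    torch.manual_seed(0)
    # hypothetical networks mirroring the shapes used above
    actor = nn.Linear(3, 1)  # pi(s): observation -> action
    critic = nn.Linear(4, 1)  # Q(s, a): [observation, action] -> value
    gamma = 0.99


    def ddpg_losses(obs, action, reward, next_obs, done):
        # bootstrapped target r + gamma * (1 - done) * Q(s', pi(s')), without gradient
        with torch.no_grad():
            next_q = critic(torch.cat([next_obs, actor(next_obs)], -1))
            target = reward + gamma * (~done).float() * next_q
        pred = critic(torch.cat([obs, action], -1))
        loss_value = (pred - target).pow(2).mean()
        td_error = (pred - target).abs().squeeze(-1).detach()
        # the actor maximizes Q(s, pi(s)), i.e. minimizes its negative
        loss_actor = -critic(torch.cat([obs, actor(obs)], -1)).mean()
        return loss_actor, loss_value, target, td_error


    obs, next_obs = torch.randn(10, 3), torch.randn(10, 3)
    action, reward = torch.randn(10, 1), torch.randn(10, 1)
    done = torch.ones(10, 1, dtype=torch.bool)  # terminal transitions: no bootstrapping
    loss_actor, loss_value, target, td_error = ddpg_losses(obs, action, reward, next_obs, done)
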


.. GENERATED FROM PYTHON SOURCE LINES 847-850

.. code-block:: Python


    print(tensordict)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    TensorDict(
        fields={
            action: Tensor(shape=torch.Size([10, 1]), device=cpu, dtype=torch.float32, is_shared=False),
            next: TensorDict(
                fields={
                    done: Tensor(shape=torch.Size([10, 1]), device=cpu, dtype=torch.bool, is_shared=False),
                    observation: Tensor(shape=torch.Size([10, 3]), device=cpu, dtype=torch.float32, is_shared=False),
                    reward: Tensor(shape=torch.Size([10, 1]), device=cpu, dtype=torch.float32, is_shared=False)},
                batch_size=torch.Size([10]),
                device=cpu,
                is_shared=False),
            observation: Tensor(shape=torch.Size([10, 3]), device=cpu, dtype=torch.float32, is_shared=False),
            td_error: Tensor(shape=torch.Size([10]), device=cpu, dtype=torch.float32, is_shared=False)},
        batch_size=torch.Size([10]),
        device=cpu,
        is_shared=False)


.. GENERATED FROM PYTHON SOURCE LINES 851-870

State of the Library
^^^^^^^^^^^^^^^^^^^^

TorchRL is currently an **alpha release**: there may be bugs, and backward
compatibility is not guaranteed. We expect to move to a beta release by the
end of the year. Our roadmap to get there includes:

- Distributed solutions
- Offline RL
- Greater support for meta-RL
- Multi-task and hierarchical RL

Contributing
^^^^^^^^^^^^

We are actively looking for contributors and early users. If you are working in
RL (or are just curious), try it and give us feedback: TorchRL's success depends
on how well it covers researchers' needs, and for that we need their input.
Since the library is still young, now is a great time to help shape it the way you want!

.. GENERATED FROM PYTHON SOURCE LINES 872-876

Installing the Library
^^^^^^^^^^^^^^^^^^^^^^

The library is on PyPI: ``pip install torchrl``


.. rst-class:: sphx-glr-timing

   **Total running time of the script:** (3 minutes 42.979 seconds)

**Estimated memory usage:**  328 MB


.. _sphx_glr_download_tutorials_torchrl_demo.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: torchrl_demo.ipynb <torchrl_demo.ipynb>`

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: torchrl_demo.py <torchrl_demo.py>`

    .. container:: sphx-glr-download sphx-glr-download-zip

      :download:`Download zipped: torchrl_demo.zip <torchrl_demo.zip>`


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_