.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "tutorials/getting-started-3.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end <sphx_glr_download_tutorials_getting-started-3.py>`
        to download the full example code.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_tutorials_getting-started-3.py:


Get started with data collection and storage
============================================

**Author**: `Vincent Moens <https://github.com/vmoens>`_

.. _gs_storage:

.. note:: To run this tutorial in a notebook, add an installation cell
  at the beginning containing:

    .. code-block::

        !pip install tensordict
        !pip install torchrl

.. GENERATED FROM PYTHON SOURCE LINES 20-55

There is no learning without data. In supervised learning, users are
accustomed to using :class:`~torch.utils.data.DataLoader` and the like
to integrate data in their training loop.
Dataloaders are iterable objects that provide you with the data that you will
be using to train your model.

TorchRL approaches the problem of data loading in a similar manner, which is
surprisingly uncommon in the ecosystem of RL libraries. TorchRL's
dataloaders are referred to as ``DataCollectors``. Most of the time,
data collection does not stop at the collection of raw data,
as the data needs to be stored temporarily in a buffer
(or equivalent structure for on-policy algorithms) before being consumed
by the :ref:`loss module <gs_optim>`. This tutorial explores
these two components: data collectors and replay buffers.

Data collectors
---------------

.. _gs_storage_collector:


The data collector discussed here is
:class:`~torchrl.collectors.SyncDataCollector`. At a fundamental level, a
collector is a straightforward
class responsible for executing your policy within the environment,
resetting the environment when necessary, and providing batches of a
predefined size. Unlike the :meth:`~torchrl.envs.EnvBase.rollout` method
demonstrated in :ref:`the env tutorial <gs_env_ted>`, collectors do not
reset between consecutive batches of data. Consequently, two successive
batches of data may contain elements from the same trajectory.
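
To make the "no reset between batches" behavior concrete, here is a toy,
pure-Python sketch. It is not TorchRL code: ``ToyEnv`` and ``toy_collector``
are hypothetical names, the environment ends its episodes after a fixed number
of steps, and each transition is reduced to a ``(traj_id, observation, done)``
tuple. The sketch only resets the environment when an episode ends, so a
trajectory can straddle two batches. It borrows the ``frames_per_batch`` and
``total_frames`` names used by the real collector below, with a negative
``total_frames`` meaning "never stop".

.. code-block:: Python

    from typing import Callable, Iterator, List, Tuple

    Transition = Tuple[int, int, bool]  # (traj_id, observation, done)


    class ToyEnv:
        """Hypothetical environment whose episodes last a fixed, cycling number of steps."""

        def __init__(self, episode_lengths: List[int]):
            self.episode_lengths = episode_lengths
            self.episode = -1
            self.t = 0

        def reset(self) -> int:
            self.episode += 1
            self.t = 0
            return self.t  # the observation is simply the step counter

        def step(self, action: int) -> Tuple[int, bool]:
            self.t += 1
            length = self.episode_lengths[self.episode % len(self.episode_lengths)]
            return self.t, self.t >= length  # (next observation, done)


    def toy_collector(
        env: ToyEnv,
        policy: Callable[[int], int],
        frames_per_batch: int,
        total_frames: int = -1,
    ) -> Iterator[List[Transition]]:
        """Yield fixed-size batches; reset only when an episode ends."""
        obs = env.reset()
        traj_id = 0
        collected = 0
        while total_frames < 0 or collected < total_frames:
            batch: List[Transition] = []
            for _ in range(frames_per_batch):
                next_obs, done = env.step(policy(obs))
                batch.append((traj_id, obs, done))
                if done:
                    # Episode boundary: reset and start a new trajectory id.
                    obs = env.reset()
                    traj_id += 1
                else:
                    # No reset at batch boundaries: the next batch resumes here.
                    obs = next_obs
            collected += frames_per_batch
            yield batch


    env = ToyEnv(episode_lengths=[3, 5])
    for batch in toy_collector(env, policy=lambda obs: 0, frames_per_batch=4, total_frames=12):
        print([traj_id for traj_id, _, _ in batch])
    # [0, 0, 0, 1]
    # [1, 1, 1, 1]
    # [2, 2, 2, 3]

With episodes of length 3 and 5 and batches of 4 frames, trajectory ``1``
starts at the end of the first batch and finishes in the second one.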

The basic arguments you need to pass to your collector are the size of the
batches you want to collect (``frames_per_batch``), the length (possibly
infinite) of the iterator (``total_frames``), the policy and the environment.
For simplicity, we will use a dummy, random policy in this example.

.. GENERATED FROM PYTHON SOURCE LINES 56-71

.. code-block:: Python


    import torch

    torch.manual_seed(0)

    from torchrl.collectors import SyncDataCollector
    from torchrl.envs import GymEnv
    from torchrl.envs.utils import RandomPolicy

    env = GymEnv("CartPole-v1")
    env.set_seed(0)

    policy = RandomPolicy(env.action_spec)
    collector = SyncDataCollector(env, policy, frames_per_batch=200, total_frames=-1)




.. GENERATED FROM PYTHON SOURCE LINES 72-80

We now expect that our collector will deliver batches of size ``200`` no
matter what happens during collection. In other words, we may have multiple
trajectories in this batch! The ``total_frames`` argument sets how many
frames the collector gathers in total before it stops. A value of ``-1`` will produce a never
ending collector.
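
As a worked example, with ``frames_per_batch=200`` a budget of
``total_frames=1000`` gives 1000 / 200 = 5 batches. When the budget is not a
multiple of the batch size, our understanding is that TorchRL collects the few
extra frames needed to complete the last batch (and may warn about it), so the
count rounds up. The ``num_batches`` helper below is a hypothetical sketch of
that arithmetic, not a TorchRL function:

.. code-block:: Python

    import math
    from typing import Union


    def num_batches(total_frames: int, frames_per_batch: int) -> Union[int, float]:
        """Number of batches a collector yields under the assumptions stated above."""
        if total_frames < 0:
            # A negative budget such as -1 means the collector never stops.
            return math.inf
        # A partial final batch is completed to frames_per_batch, hence the ceiling.
        return math.ceil(total_frames / frames_per_batch)


    print(num_batches(1000, 200))  # 5
    print(num_batches(1001, 200))  # 6
    print(num_batches(-1, 200))  # inf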

Let's iterate over the collector to get a sense
of what this data looks like:

.. GENERATED FROM PYTHON SOURCE LINES 80-85

.. code-block:: Python


    for data in collector:
        print(data)
        break


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    TensorDict(
        fields={
            action: Tensor(shape=torch.Size([200, 2]), device=cpu, dtype=torch.int64, is_shared=False),
            collector: TensorDict(
                fields={
                    traj_ids: Tensor(shape=torch.Size([200]), device=cpu, dtype=torch.int64, is_shared=False)},
                batch_size=torch.Size([200]),
                device=None,
                is_shared=False),
            done: Tensor(shape=torch.Size([200, 1]), device=cpu, dtype=torch.bool, is_shared=False),
            next: TensorDict(
                fields={
                    done: Tensor(shape=torch.Size([200, 1]), device=cpu, dtype=torch.bool, is_shared=False),
                    observation: Tensor(shape=torch.Size([200, 4]), device=cpu, dtype=torch.float32, is_shared=False),
                    reward: Tensor(shape=torch.Size([200, 1]), device=cpu, dtype=torch.float32, is_shared=False),
                    terminated: Tensor(shape=torch.Size([200, 1]), device=cpu, dtype=torch.bool, is_shared=False),
                    truncated: Tensor(shape=torch.Size([200, 1]), device=cpu, dtype=torch.bool, is_shared=False)},
                batch_size=torch.Size([200]),
                device=None,
                is_shared=False),
            observation: Tensor(shape=torch.Size([200, 4]), device=cpu, dtype=torch.float32, is_shared=False),
            terminated: Tensor(shape=torch.Size([200, 1]), device=cpu, dtype=torch.bool, is_shared=False),
            truncated: Tensor(shape=torch.Size([200, 1]), device=cpu, dtype=torch.bool, is_shared=False)},
        batch_size=torch.Size([200]),
        device=None,
        is_shared=False)


.. GENERATED FROM PYTHON SOURCE LINES 86-91

As you can see, our data is augmented with some collector-specific metadata
grouped in a ``"collector"`` sub-tensordict that we did not see during
:ref:`environment rollouts <gs_env_ted_rollout>`. This is useful to keep track of
the trajectory ids. In the following list, each item marks the trajectory
number the corresponding transition belongs to:

.. GENERATED FROM PYTHON SOURCE LINES 91-94

.. code-block:: Python


    print(data["collector", "traj_ids"])


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    tensor([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2,
            2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3,
            3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4,
            4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4,
            4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 5,
            5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6,
            6, 6, 6, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7,
            7, 7, 7, 7, 7, 7, 7, 7, 7, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 9, 9,
            9, 9, 9, 9, 9, 9, 9, 9])
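
Because a single environment produces transitions sequentially, each
trajectory appears as a contiguous run of identical ids. A quick way to see how
many transitions of each trajectory landed in a batch is to group these runs.
The ``trajectory_lengths`` helper below is a hypothetical pure-Python sketch;
with TorchRL, you would first convert the ids with
``data["collector", "traj_ids"].tolist()``. Keep in mind that the last run may
be cut by the batch boundary and continue in the next batch, and that in later
batches the first run may continue a trajectory started earlier.

.. code-block:: Python

    from itertools import groupby
    from typing import List, Tuple


    def trajectory_lengths(traj_ids: List[int]) -> List[Tuple[int, int]]:
        """Return (trajectory id, number of transitions) for each contiguous run of ids."""
        return [(traj_id, len(list(run))) for traj_id, run in groupby(traj_ids)]


    # Hypothetical ids for a 10-transition batch.
    print(trajectory_lengths([0, 0, 0, 1, 1, 1, 1, 1, 2, 2]))
    # [(0, 3), (1, 5), (2, 2)]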


.. GENERATED FROM PYTHON SOURCE LINES 95-137

Data collectors are very useful when it comes to coding state-of-the-art
algorithms, as performance is usually measured by the capability of a
specific technique to solve a problem in a given number of interactions with
the environment (the ``total_frames`` argument in the collector).
For this reason, most training loops in our examples look like this:

  >>> for data in collector:
  ...     pass  # your algorithm here


Replay Buffers
--------------

.. _gs_storage_rb:

Now that we have explored how to collect data, we would like to know how to
store it. In RL, data is typically collected, stored temporarily, and
cleared after a while according to some heuristic, such as first-in
first-out (FIFO). Typical pseudo-code would look like this:

  >>> for data in collector:
  ...     buffer.extend(data)
  ...     for i in range(n_optim):
  ...         sample = buffer.sample()
  ...         loss_val = loss_fn(sample)
  ...         loss_val.backward()
  ...         optim.step()
  ...         optim.zero_grad()  # etc.

The parent class that stores the data in TorchRL
is :class:`~torchrl.data.ReplayBuffer`. TorchRL's replay
buffers are composable: you can change the storage type, the sampling
technique, the writing heuristic or the transforms applied to them. We will
leave the advanced features for a dedicated in-depth tutorial. The generic
replay buffer only needs to know which storage it should use. In general, we
recommend a :class:`~torchrl.data.TensorStorage` subclass, which will work
fine in most cases. We'll be using
:class:`~torchrl.data.replay_buffers.LazyMemmapStorage`
in this tutorial, which has two nice properties. First, being "lazy", it
does not need to be told in advance what your data looks like.
Second, it uses :class:`~tensordict.MemoryMappedTensor` as a backend to save
your data on disk in an efficient way. The only thing you need to know is
how big you want your buffer to be.

.. GENERATED FROM PYTHON SOURCE LINES 137-142

.. code-block:: Python


    from torchrl.data.replay_buffers import LazyMemmapStorage, ReplayBuffer

    buffer = ReplayBuffer(storage=LazyMemmapStorage(max_size=1000))


.. GENERATED FROM PYTHON SOURCE LINES 143-147

Populating the buffer can be done via the
:meth:`~torchrl.data.ReplayBuffer.add` (single element) or
:meth:`~torchrl.data.ReplayBuffer.extend` (multiple elements) methods. Using
the data we just collected, we initialize and populate the buffer in one go:

.. GENERATED FROM PYTHON SOURCE LINES 147-150

.. code-block:: Python


    indices = buffer.extend(data)


.. GENERATED FROM PYTHON SOURCE LINES 151-153

We can check that the buffer now holds the same number of elements as
we got from the collector:

.. GENERATED FROM PYTHON SOURCE LINES 153-156

.. code-block:: Python


    assert len(buffer) == collector.frames_per_batch
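
Here the 200 collected frames fit comfortably within ``max_size=1000``. To
illustrate what a first-in first-out writing heuristic does once a buffer is
full, here is a toy pure-Python sketch. ``ToyFIFOBuffer`` is hypothetical and
only mirrors the ``add``/``extend`` method names and the ``max_size``
argument. To our understanding, TorchRL's default writer behaves similarly
(round-robin), and its lazy storages additionally infer shapes and dtypes from
the first batch, which this sketch does not model.

.. code-block:: Python

    from typing import Any, Iterable, List, Optional


    class ToyFIFOBuffer:
        """Hypothetical bounded FIFO buffer (not TorchRL's implementation)."""

        def __init__(self, max_size: int):
            self.max_size = max_size
            self._slots: Optional[List[Any]] = None  # lazy: nothing allocated yet
            self._cursor = 0  # index of the next slot to write
            self._len = 0

        def add(self, item: Any) -> int:
            """Write a single element and return the index it was written to."""
            if self._slots is None:
                # Lazy initialization: storage is only allocated on the first write.
                self._slots = [None] * self.max_size
            index = self._cursor
            self._slots[index] = item  # once full, this overwrites the oldest element
            self._cursor = (self._cursor + 1) % self.max_size
            self._len = min(self._len + 1, self.max_size)
            return index

        def extend(self, items: Iterable[Any]) -> List[int]:
            """Write several elements and return their indices."""
            return [self.add(item) for item in items]

        def __len__(self) -> int:
            return self._len


    buf = ToyFIFOBuffer(max_size=3)
    print(buf.extend(["a", "b", "c", "d"]))  # [0, 1, 2, 0]: "d" overwrote "a"
    print(len(buf))  # 3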


.. GENERATED FROM PYTHON SOURCE LINES 157-162

The only thing left to know is how to gather data from the buffer.
Naturally, this relies on the :meth:`~torchrl.data.ReplayBuffer.sample`
method. Because we did not specify that sampling had to be done without
repetitions, it is not guaranteed that the samples gathered from our buffer
will be unique:

.. GENERATED FROM PYTHON SOURCE LINES 162-166

.. code-block:: Python


    sample = buffer.sample(batch_size=30)
    print(sample)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    TensorDict(
        fields={
            action: Tensor(shape=torch.Size([30, 2]), device=cpu, dtype=torch.int64, is_shared=False),
            collector: TensorDict(
                fields={
                    traj_ids: Tensor(shape=torch.Size([30]), device=cpu, dtype=torch.int64, is_shared=False)},
                batch_size=torch.Size([30]),
                device=cpu,
                is_shared=False),
            done: Tensor(shape=torch.Size([30, 1]), device=cpu, dtype=torch.bool, is_shared=False),
            next: TensorDict(
                fields={
                    done: Tensor(shape=torch.Size([30, 1]), device=cpu, dtype=torch.bool, is_shared=False),
                    observation: Tensor(shape=torch.Size([30, 4]), device=cpu, dtype=torch.float32, is_shared=False),
                    reward: Tensor(shape=torch.Size([30, 1]), device=cpu, dtype=torch.float32, is_shared=False),
                    terminated: Tensor(shape=torch.Size([30, 1]), device=cpu, dtype=torch.bool, is_shared=False),
                    truncated: Tensor(shape=torch.Size([30, 1]), device=cpu, dtype=torch.bool, is_shared=False)},
                batch_size=torch.Size([30]),
                device=cpu,
                is_shared=False),
            observation: Tensor(shape=torch.Size([30, 4]), device=cpu, dtype=torch.float32, is_shared=False),
            terminated: Tensor(shape=torch.Size([30, 1]), device=cpu, dtype=torch.bool, is_shared=False),
            truncated: Tensor(shape=torch.Size([30, 1]), device=cpu, dtype=torch.bool, is_shared=False)},
        batch_size=torch.Size([30]),
        device=cpu,
        is_shared=False)


.. GENERATED FROM PYTHON SOURCE LINES 167-189

Apart from its batch size, our sample has exactly the same structure as the data we gathered from the
collector!
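
The ``30`` transitions above were drawn with replacement, so some may be
duplicates. The pure-Python sketch below contrasts this with sampling without
replacement, where the buffer indices are shuffled once and handed out in
disjoint slices so that each index is visited exactly once per epoch. Both
functions are hypothetical; in TorchRL, this behavior is selected through the
buffer's sampler, for instance
:class:`~torchrl.data.replay_buffers.SamplerWithoutReplacement`.

.. code-block:: Python

    import random
    from typing import Iterator, List


    def sample_with_replacement(n: int, batch_size: int, rng: random.Random) -> List[int]:
        """Draw each index independently, so the same index can appear twice."""
        return [rng.randrange(n) for _ in range(batch_size)]


    def sample_without_replacement(
        n: int, batch_size: int, rng: random.Random
    ) -> Iterator[List[int]]:
        """Shuffle once, then yield disjoint slices: one full pass is one epoch."""
        perm = list(range(n))
        rng.shuffle(perm)
        for start in range(0, n, batch_size):
            yield perm[start : start + batch_size]  # the last slice may be smaller


    rng = random.Random(0)
    with_rep = sample_with_replacement(200, 30, rng)
    epoch = list(sample_without_replacement(200, 30, rng))
    print(len(with_rep), len(set(with_rep)))  # the unique count can be below 30
    print([len(batch) for batch in epoch])  # [30, 30, 30, 30, 30, 30, 20]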

Next steps
----------

- You can have a look at other multiprocessed
  collectors such as :class:`~torchrl.collectors.collectors.MultiSyncDataCollector` or
  :class:`~torchrl.collectors.collectors.MultiaSyncDataCollector`.
- TorchRL also offers distributed collectors if you have multiple nodes to
  use for inference. Check them out in the
  :ref:`API reference <ref_collectors>`.
- Check the dedicated :ref:`Replay Buffer tutorial <rb_tuto>` to learn
  more about the options you have when building a buffer, or the
  :ref:`API reference <ref_data>`, which covers all the features in
  detail. Replay buffers have countless features such as multithreaded
  sampling, prioritized experience replay, and many more.
- For simplicity, we left out the ability of replay buffers to be iterated
  over. Try it out for yourself: build a buffer, specify its batch size in
  the constructor, then iterate over it. This is
  equivalent to calling ``rb.sample()`` within a loop!
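
As a starting point for that exercise, the toy sketch below shows the idea in
pure Python. ``ToyIterableBuffer`` is hypothetical: it stores its batch size at
construction, and iterating over it simply calls ``sample()`` in a loop.
Because sampling is done with replacement, this iterator never ends, so the
usage example takes only the first three batches with ``itertools.islice``.

.. code-block:: Python

    import itertools
    import random
    from typing import Any, Iterable, Iterator, List


    class ToyIterableBuffer:
        """Hypothetical buffer that stores its batch size at construction."""

        def __init__(self, data: Iterable[Any], batch_size: int, seed: int = 0):
            self.data = list(data)
            self.batch_size = batch_size
            self.rng = random.Random(seed)

        def sample(self) -> List[Any]:
            # Uniform sampling with replacement, using the stored batch size.
            return self.rng.choices(self.data, k=self.batch_size)

        def __iter__(self) -> Iterator[List[Any]]:
            # Iterating is just sampling in a loop; with replacement it never ends.
            while True:
                yield self.sample()


    rb = ToyIterableBuffer(range(10), batch_size=4)
    for batch in itertools.islice(rb, 3):  # take only the first three batches
        print(batch)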


.. rst-class:: sphx-glr-timing

   **Total running time of the script:** (0 minutes 18.780 seconds)

**Estimated memory usage:**  28 MB


.. _sphx_glr_download_tutorials_getting-started-3.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: getting-started-3.ipynb <getting-started-3.ipynb>`

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: getting-started-3.py <getting-started-3.py>`


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_