.. DO NOT EDIT. .. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY. .. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE: .. "tutorials/getting-started-3.py" .. LINE NUMBERS ARE GIVEN BELOW. .. only:: html .. note:: :class: sphx-glr-download-link-note :ref:`Go to the end ` to download the full example code. .. rst-class:: sphx-glr-example-title .. _sphx_glr_tutorials_getting-started-3.py: Get started with data collection and storage ============================================ **Author**: `Vincent Moens `_ .. _gs_storage: .. note:: To run this tutorial in a notebook, add an installation cell at the beginning containing: .. code-block:: !pip install tensordict !pip install torchrl .. GENERATED FROM PYTHON SOURCE LINES 20-55 There is no learning without data. In supervised learning, users are accustomed to using :class:`~torch.utils.data.DataLoader` and the like to integrate data in their training loop. Dataloaders are iterable objects that provide you with the data that you will be using to train your model. TorchRL approaches the problem of dataloading in a similar manner, although it is surprisingly unique in the ecosystem of RL libraries. TorchRL's dataloaders are referred to as ``DataCollectors``. Most of the time, data collection does not stop at the collection of raw data, as the data needs to be stored temporarily in a buffer (or equivalent structure for on-policy sota-implementations) before being consumed by the :ref:`loss module `. This tutorial will explore these two classes. Data collectors --------------- .. _gs_storage_collector: The primary data collector discussed here is the :class:`~torchrl.collectors.SyncDataCollector`, which is the focus of this documentation. At a fundamental level, a collector is a straightforward class responsible for executing your policy within the environment, resetting the environment when necessary, and providing batches of a predefined size. Unlike the :meth:`~torchrl.envs.EnvBase.rollout` method demonstrated in :ref:`the env tutorial `, collectors do not reset between consecutive batches of data. Consequently, two successive batches of data may contain elements from the same trajectory. The basic arguments you need to pass to your collector are the size of the batches you want to collect (``frames_per_batch``), the length (possibly infinite) of the iterator, the policy and the environment. For simplicity, we will use a dummy, random policy in this example. .. GENERATED FROM PYTHON SOURCE LINES 56-71 .. code-block:: Python import torch torch.manual_seed(0) from torchrl.collectors import SyncDataCollector from torchrl.envs import GymEnv from torchrl.envs.utils import RandomPolicy env = GymEnv("CartPole-v1") env.set_seed(0) policy = RandomPolicy(env.action_spec) collector = SyncDataCollector(env, policy, frames_per_batch=200, total_frames=-1) .. rst-class:: sphx-glr-script-out .. code-block:: none /pytorch/rl/torchrl/envs/common.py:2989: DeprecationWarning: Your wrapper was not given a device. Currently, this value will default to 'cpu'. From v0.5 it will default to `None`. With a device of None, no device casting is performed and the resulting tensordicts are deviceless. Please set your device accordingly. warnings.warn( .. GENERATED FROM PYTHON SOURCE LINES 72-80 We now expect that our collector will deliver batches of size ``200`` no matter what happens during collection. In other words, we may have multiple trajectories in this batch! The ``total_frames`` indicates how long the collector should be. A value of ``-1`` will produce a never ending collector. Let's iterate over the collector to get a sense of what this data looks like: .. GENERATED FROM PYTHON SOURCE LINES 80-85 .. code-block:: Python for data in collector: print(data) break .. rst-class:: sphx-glr-script-out .. code-block:: none TensorDict( fields={ action: Tensor(shape=torch.Size([200, 2]), device=cpu, dtype=torch.int64, is_shared=False), collector: TensorDict( fields={ traj_ids: Tensor(shape=torch.Size([200]), device=cpu, dtype=torch.int64, is_shared=False)}, batch_size=torch.Size([200]), device=None, is_shared=False), done: Tensor(shape=torch.Size([200, 1]), device=cpu, dtype=torch.bool, is_shared=False), next: TensorDict( fields={ done: Tensor(shape=torch.Size([200, 1]), device=cpu, dtype=torch.bool, is_shared=False), observation: Tensor(shape=torch.Size([200, 4]), device=cpu, dtype=torch.float32, is_shared=False), reward: Tensor(shape=torch.Size([200, 1]), device=cpu, dtype=torch.float32, is_shared=False), terminated: Tensor(shape=torch.Size([200, 1]), device=cpu, dtype=torch.bool, is_shared=False), truncated: Tensor(shape=torch.Size([200, 1]), device=cpu, dtype=torch.bool, is_shared=False)}, batch_size=torch.Size([200]), device=None, is_shared=False), observation: Tensor(shape=torch.Size([200, 4]), device=cpu, dtype=torch.float32, is_shared=False), terminated: Tensor(shape=torch.Size([200, 1]), device=cpu, dtype=torch.bool, is_shared=False), truncated: Tensor(shape=torch.Size([200, 1]), device=cpu, dtype=torch.bool, is_shared=False)}, batch_size=torch.Size([200]), device=None, is_shared=False) .. GENERATED FROM PYTHON SOURCE LINES 86-91 As you can see, our data is augmented with some collector-specific metadata grouped in a ``"collector"`` sub-tensordict that we did not see during :ref:`environment rollouts `. This is useful to keep track of the trajectory ids. In the following list, each item marks the trajectory number the corresponding transition belongs to: .. GENERATED FROM PYTHON SOURCE LINES 91-94 .. code-block:: Python print(data["collector", "traj_ids"]) .. rst-class:: sphx-glr-script-out .. code-block:: none tensor([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9]) .. GENERATED FROM PYTHON SOURCE LINES 95-137 Data collectors are very useful when it comes to coding state-of-the-art sota-implementations, as performance is usually measured by the capability of a specific technique to solve a problem in a given number of interactions with the environment (the ``total_frames`` argument in the collector). For this reason, most training loops in our examples look like this: >>> for data in collector: ... # your algorithm here Replay Buffers -------------- .. _gs_storage_rb: Now that we have explored how to collect data, we would like to know how to store it. In RL, the typical setting is that the data is collected, stored temporarily and cleared after a little while given some heuristic: first-in first-out or other. A typical pseudo-code would look like this: >>> for data in collector: ... storage.store(data) ... for i in range(n_optim): ... sample = storage.sample() ... loss_val = loss_fn(sample) ... loss_val.backward() ... optim.step() # etc The parent class that stores the data in TorchRL is referred to as :class:`~torchrl.data.ReplayBuffer`. TorchRL's replay buffers are composable: you can edit the storage type, their sampling technique, the writing heuristic or the transforms applied to them. We will leave the fancy stuff for a dedicated in-depth tutorial. The generic replay buffer only needs to know what storage it has to use. In general, we recommend a :class:`~torchrl.data.TensorStorage` subclass, which will work fine in most cases. We'll be using :class:`~torchrl.data.replay_buffers.LazyMemmapStorage` in this tutorial, which enjoys two nice properties: first, being "lazy", you don't need to explicitly tell it what your data looks like in advance. Second, it uses :class:`~tensordict.MemoryMappedTensor` as a backend to save your data on disk in an efficient way. The only thing you need to know is how big you want your buffer to be. .. GENERATED FROM PYTHON SOURCE LINES 137-142 .. code-block:: Python from torchrl.data.replay_buffers import LazyMemmapStorage, ReplayBuffer buffer = ReplayBuffer(storage=LazyMemmapStorage(max_size=1000)) .. GENERATED FROM PYTHON SOURCE LINES 143-147 Populating the buffer can be done via the :meth:`~torchrl.data.ReplayBuffer.add` (single element) or :meth:`~torchrl.data.ReplayBuffer.extend` (multiple elements) methods. Using the data we just collected, we initialize and populate the buffer in one go: .. GENERATED FROM PYTHON SOURCE LINES 147-150 .. code-block:: Python indices = buffer.extend(data) .. GENERATED FROM PYTHON SOURCE LINES 151-153 We can check that the buffer now has the same number of elements than what we got from the collector: .. GENERATED FROM PYTHON SOURCE LINES 153-156 .. code-block:: Python assert len(buffer) == collector.frames_per_batch .. GENERATED FROM PYTHON SOURCE LINES 157-162 The only thing left to know is how to gather data from the buffer. Naturally, this relies on the :meth:`~torchrl.data.ReplayBuffer.sample` method. Because we did not specify that sampling had to be done without repetitions, it is not guaranteed that the samples gathered from our buffer will be unique: .. GENERATED FROM PYTHON SOURCE LINES 162-166 .. code-block:: Python sample = buffer.sample(batch_size=30) print(sample) .. rst-class:: sphx-glr-script-out .. code-block:: none TensorDict( fields={ action: Tensor(shape=torch.Size([30, 2]), device=cpu, dtype=torch.int64, is_shared=False), collector: TensorDict( fields={ traj_ids: Tensor(shape=torch.Size([30]), device=cpu, dtype=torch.int64, is_shared=False)}, batch_size=torch.Size([30]), device=cpu, is_shared=False), done: Tensor(shape=torch.Size([30, 1]), device=cpu, dtype=torch.bool, is_shared=False), next: TensorDict( fields={ done: Tensor(shape=torch.Size([30, 1]), device=cpu, dtype=torch.bool, is_shared=False), observation: Tensor(shape=torch.Size([30, 4]), device=cpu, dtype=torch.float32, is_shared=False), reward: Tensor(shape=torch.Size([30, 1]), device=cpu, dtype=torch.float32, is_shared=False), terminated: Tensor(shape=torch.Size([30, 1]), device=cpu, dtype=torch.bool, is_shared=False), truncated: Tensor(shape=torch.Size([30, 1]), device=cpu, dtype=torch.bool, is_shared=False)}, batch_size=torch.Size([30]), device=cpu, is_shared=False), observation: Tensor(shape=torch.Size([30, 4]), device=cpu, dtype=torch.float32, is_shared=False), terminated: Tensor(shape=torch.Size([30, 1]), device=cpu, dtype=torch.bool, is_shared=False), truncated: Tensor(shape=torch.Size([30, 1]), device=cpu, dtype=torch.bool, is_shared=False)}, batch_size=torch.Size([30]), device=cpu, is_shared=False) .. GENERATED FROM PYTHON SOURCE LINES 167-189 Again, our sample looks exactly the same as the data we gathered from the collector! Next steps ---------- - You can have look at other multirpocessed collectors such as :class:`~torchrl.collectors.collectors.MultiSyncDataCollector` or :class:`~torchrl.collectors.collectors.MultiaSyncDataCollector`. - TorchRL also offers distributed collectors if you have multiple nodes to use for inference. Check them out in the :ref:`API reference `. - Check the dedicated :ref:`Replay Buffer tutorial ` to know more about the options you have when building a buffer, or the :ref:`API reference ` which covers all the features in details. Replay buffers have countless features such as multithreaded sampling, prioritized experience replay, and many more... - We left out the capacity of replay buffers to be iterated over for simplicity. Try it out for yourself: build a buffer and indicate its batch-size in the constructor, then try to iterate over it. This is equivalent to calling ``rb.sample()`` within a loop! .. rst-class:: sphx-glr-timing **Total running time of the script:** (0 minutes 18.780 seconds) **Estimated memory usage:** 28 MB .. _sphx_glr_download_tutorials_getting-started-3.py: .. only:: html .. container:: sphx-glr-footer sphx-glr-footer-example .. container:: sphx-glr-download sphx-glr-download-jupyter :download:`Download Jupyter notebook: getting-started-3.ipynb ` .. container:: sphx-glr-download sphx-glr-download-python :download:`Download Python source code: getting-started-3.py ` .. only:: html .. rst-class:: sphx-glr-signature `Gallery generated by Sphinx-Gallery `_