.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "tutorials/getting-started-0.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end <sphx_glr_download_tutorials_getting-started-0.py>`
        to download the full example code.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_tutorials_getting-started-0.py:


Get started with Environments, TED and transforms
=================================================

**Author**: `Vincent Moens <https://github.com/vmoens>`_

.. _gs_env_ted:

.. note:: To run this tutorial in a notebook, add an installation cell
  at the beginning containing:

    .. code-block::

        !pip install tensordict
        !pip install torchrl

.. GENERATED FROM PYTHON SOURCE LINES 21-63

Welcome to the getting started tutorials!

Below is the list of the topics we will be covering.

- :ref:`Environments, TED and transforms <gs_env_ted>`;
- :ref:`TorchRL's modules <gs_modules>`;
- :ref:`Losses and optimization <gs_optim>`;
- :ref:`Data collection and storage <gs_storage>`;
- :ref:`TorchRL's logging API <gs_logging>`.

If you are in a hurry, you can jump straight away to the last tutorial,
:ref:`Your own first training loop <gs_first_training>`, from where you can
backtrack every other "Getting Started" tutorial if things are not clear or
if you want to learn more about a specific topic!

Environments in RL
------------------

The standard RL (Reinforcement Learning) training loop involves a model,
also known as a policy, which is trained to accomplish a task within a
specific environment. Often, this environment is a simulator that accepts
actions as input and produces an observation along with some metadata as
output.

In this document, we will explore the environment API of TorchRL: we will
learn how to create an environment, interact with it, and understand the
data format it uses.

Creating an environment
-----------------------

In essence, TorchRL does not directly provide environments, but instead
offers wrappers for other libraries that encapsulate the simulators. The
:mod:`~torchrl.envs` module can be viewed as a provider for a generic
environment API, as well as a central hub for simulation backends like
`gym <https://arxiv.org/abs/1606.01540>`_ (:class:`~torchrl.envs.GymEnv`),
`Brax <https://arxiv.org/abs/2106.13281>`_ (:class:`~torchrl.envs.BraxEnv`)
or `DeepMind Control Suite <https://arxiv.org/abs/1801.00690>`_
(:class:`~torchrl.envs.DMControlEnv`).

Creating your environment is typically as straightforward as the underlying
backend API allows. Here's an example using gym:

.. GENERATED FROM PYTHON SOURCE LINES 63-68

.. code-block:: Python


    from torchrl.envs import GymEnv

    env = GymEnv("Pendulum-v1")


.. GENERATED FROM PYTHON SOURCE LINES 69-89

Running an environment
----------------------

Environments in TorchRL have two crucial methods:
:meth:`~torchrl.envs.EnvBase.reset`, which initiates
an episode, and :meth:`~torchrl.envs.EnvBase.step`, which executes an
action selected by the actor.
In TorchRL, environment methods read and write
:class:`~tensordict.TensorDict` instances.
Essentially, :class:`~tensordict.TensorDict` is a generic key-based data
carrier for tensors.
The benefit of using TensorDict over plain tensors is that it enables us to
handle simple and complex data structures interchangeably. As our function
signatures are very generic, it eliminates the challenge of accommodating
different data formats. In simpler terms, after this brief tutorial,
you will be capable of operating on both simple and highly complex
environments, as their user-facing API is identical and simple!

Let's put the environment into action and see what a tensordict instance
looks like:

.. GENERATED FROM PYTHON SOURCE LINES 90-94

.. code-block:: Python


    reset = env.reset()
    print(reset)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    TensorDict(
        fields={
            done: Tensor(shape=torch.Size([1]), device=cpu, dtype=torch.bool, is_shared=False),
            observation: Tensor(shape=torch.Size([3]), device=cpu, dtype=torch.float32, is_shared=False),
            terminated: Tensor(shape=torch.Size([1]), device=cpu, dtype=torch.bool, is_shared=False),
            truncated: Tensor(shape=torch.Size([1]), device=cpu, dtype=torch.bool, is_shared=False)},
        batch_size=torch.Size([]),
        device=None,
        is_shared=False)


.. GENERATED FROM PYTHON SOURCE LINES 95-96

Now let's take a random action in the action space. First, sample the action:

.. GENERATED FROM PYTHON SOURCE LINES 96-99

.. code-block:: Python

    reset_with_action = env.rand_action(reset)
    print(reset_with_action)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    TensorDict(
        fields={
            action: Tensor(shape=torch.Size([1]), device=cpu, dtype=torch.float32, is_shared=False),
            done: Tensor(shape=torch.Size([1]), device=cpu, dtype=torch.bool, is_shared=False),
            observation: Tensor(shape=torch.Size([3]), device=cpu, dtype=torch.float32, is_shared=False),
            terminated: Tensor(shape=torch.Size([1]), device=cpu, dtype=torch.bool, is_shared=False),
            truncated: Tensor(shape=torch.Size([1]), device=cpu, dtype=torch.bool, is_shared=False)},
        batch_size=torch.Size([]),
        device=None,
        is_shared=False)


.. GENERATED FROM PYTHON SOURCE LINES 100-105

This tensordict has the same structure as the one obtained from
:meth:`~torchrl.envs.EnvBase` with an additional ``"action"`` entry.
You can access the action easily, like you would do with a regular
dictionary:


.. GENERATED FROM PYTHON SOURCE LINES 105-108

.. code-block:: Python


    print(reset_with_action["action"])


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    tensor([1.5947])


.. GENERATED FROM PYTHON SOURCE LINES 109-113

We now need to pass this action tp the environment.
We'll be passing the entire tensordict to the ``step`` method, since there
might be more than one tensor to be read in more advanced cases like
Multi-Agent RL or stateless environments:

.. GENERATED FROM PYTHON SOURCE LINES 113-117

.. code-block:: Python


    stepped_data = env.step(reset_with_action)
    print(stepped_data)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    TensorDict(
        fields={
            action: Tensor(shape=torch.Size([1]), device=cpu, dtype=torch.float32, is_shared=False),
            done: Tensor(shape=torch.Size([1]), device=cpu, dtype=torch.bool, is_shared=False),
            next: TensorDict(
                fields={
                    done: Tensor(shape=torch.Size([1]), device=cpu, dtype=torch.bool, is_shared=False),
                    observation: Tensor(shape=torch.Size([3]), device=cpu, dtype=torch.float32, is_shared=False),
                    reward: Tensor(shape=torch.Size([1]), device=cpu, dtype=torch.float32, is_shared=False),
                    terminated: Tensor(shape=torch.Size([1]), device=cpu, dtype=torch.bool, is_shared=False),
                    truncated: Tensor(shape=torch.Size([1]), device=cpu, dtype=torch.bool, is_shared=False)},
                batch_size=torch.Size([]),
                device=None,
                is_shared=False),
            observation: Tensor(shape=torch.Size([3]), device=cpu, dtype=torch.float32, is_shared=False),
            terminated: Tensor(shape=torch.Size([1]), device=cpu, dtype=torch.bool, is_shared=False),
            truncated: Tensor(shape=torch.Size([1]), device=cpu, dtype=torch.bool, is_shared=False)},
        batch_size=torch.Size([]),
        device=None,
        is_shared=False)


.. GENERATED FROM PYTHON SOURCE LINES 118-134

Again, this new tensordict is identical to the previous one except for the
fact that it has a ``"next"`` entry (itself a tensordict!) containing the
observation, reward and done state resulting from
our action.

We call this format TED, for
:ref:`TorchRL Episode Data format <TED-format>`. It is
the ubiquitous way of representing data in the library, both dynamically like
here, or statically with offline datasets.

The last bit of information you need to run a rollout in the environment is
how to bring that ``"next"`` entry at the root to perform the next step.
TorchRL provides a dedicated :func:`~torchrl.envs.utils.step_mdp` function
that does just that: it filters out the information you won't need and
delivers a data structure corresponding to your observation after a step in
the Markov Decision Process, or MDP.

.. GENERATED FROM PYTHON SOURCE LINES 134-140

.. code-block:: Python


    from torchrl.envs import step_mdp

    data = step_mdp(stepped_data)
    print(data)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    TensorDict(
        fields={
            done: Tensor(shape=torch.Size([1]), device=cpu, dtype=torch.bool, is_shared=False),
            observation: Tensor(shape=torch.Size([3]), device=cpu, dtype=torch.float32, is_shared=False),
            terminated: Tensor(shape=torch.Size([1]), device=cpu, dtype=torch.bool, is_shared=False),
            truncated: Tensor(shape=torch.Size([1]), device=cpu, dtype=torch.bool, is_shared=False)},
        batch_size=torch.Size([]),
        device=None,
        is_shared=False)


.. GENERATED FROM PYTHON SOURCE LINES 141-151

Environment rollouts
--------------------

.. _gs_env_ted_rollout:

Writing down those three steps (computing an action, making a step,
moving in the MDP) can be a bit tedious and repetitive. Fortunately,
TorchRL provides a nice :meth:`~torchrl.envs.EnvBase.rollout` function that
allows you to run them in a closed loop at will:


.. GENERATED FROM PYTHON SOURCE LINES 151-155

.. code-block:: Python


    rollout = env.rollout(max_steps=10)
    print(rollout)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    TensorDict(
        fields={
            action: Tensor(shape=torch.Size([10, 1]), device=cpu, dtype=torch.float32, is_shared=False),
            done: Tensor(shape=torch.Size([10, 1]), device=cpu, dtype=torch.bool, is_shared=False),
            next: TensorDict(
                fields={
                    done: Tensor(shape=torch.Size([10, 1]), device=cpu, dtype=torch.bool, is_shared=False),
                    observation: Tensor(shape=torch.Size([10, 3]), device=cpu, dtype=torch.float32, is_shared=False),
                    reward: Tensor(shape=torch.Size([10, 1]), device=cpu, dtype=torch.float32, is_shared=False),
                    terminated: Tensor(shape=torch.Size([10, 1]), device=cpu, dtype=torch.bool, is_shared=False),
                    truncated: Tensor(shape=torch.Size([10, 1]), device=cpu, dtype=torch.bool, is_shared=False)},
                batch_size=torch.Size([10]),
                device=None,
                is_shared=False),
            observation: Tensor(shape=torch.Size([10, 3]), device=cpu, dtype=torch.float32, is_shared=False),
            terminated: Tensor(shape=torch.Size([10, 1]), device=cpu, dtype=torch.bool, is_shared=False),
            truncated: Tensor(shape=torch.Size([10, 1]), device=cpu, dtype=torch.bool, is_shared=False)},
        batch_size=torch.Size([10]),
        device=None,
        is_shared=False)


.. GENERATED FROM PYTHON SOURCE LINES 156-161

This data looks pretty much like the ``stepped_data`` above with the
exception of its batch-size, which now equates the number of steps we
provided through the ``max_steps`` argument. The magic of tensordict
doesn't end there: if you're interested in a single transition of this
environment, you can index the tensordict like you would index a tensor:

.. GENERATED FROM PYTHON SOURCE LINES 161-165

.. code-block:: Python


    transition = rollout[3]
    print(transition)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    TensorDict(
        fields={
            action: Tensor(shape=torch.Size([1]), device=cpu, dtype=torch.float32, is_shared=False),
            done: Tensor(shape=torch.Size([1]), device=cpu, dtype=torch.bool, is_shared=False),
            next: TensorDict(
                fields={
                    done: Tensor(shape=torch.Size([1]), device=cpu, dtype=torch.bool, is_shared=False),
                    observation: Tensor(shape=torch.Size([3]), device=cpu, dtype=torch.float32, is_shared=False),
                    reward: Tensor(shape=torch.Size([1]), device=cpu, dtype=torch.float32, is_shared=False),
                    terminated: Tensor(shape=torch.Size([1]), device=cpu, dtype=torch.bool, is_shared=False),
                    truncated: Tensor(shape=torch.Size([1]), device=cpu, dtype=torch.bool, is_shared=False)},
                batch_size=torch.Size([]),
                device=None,
                is_shared=False),
            observation: Tensor(shape=torch.Size([3]), device=cpu, dtype=torch.float32, is_shared=False),
            terminated: Tensor(shape=torch.Size([1]), device=cpu, dtype=torch.bool, is_shared=False),
            truncated: Tensor(shape=torch.Size([1]), device=cpu, dtype=torch.bool, is_shared=False)},
        batch_size=torch.Size([]),
        device=None,
        is_shared=False)


.. GENERATED FROM PYTHON SOURCE LINES 166-200

:class:`~tensordict.TensorDict` will automatically check if the index you
provided is a key (in which case we index along the key-dimension) or a
spatial index like here.

Executed as such (without a policy), the ``rollout`` method may seem rather
useless: it just runs random actions. If a policy is available, it can
be passed to the method and used to collect data.

Nevertheless, it can useful to run a naive, policyless rollout at first to
check what is to be expected from an environment at a glance.

To appreciate the versatility of TorchRL's API, consider the fact that the
rollout method is universally applicable. It functions across **all** use
cases, whether you're working with a single environment like this one,
multiple copies across various processes, a multi-agent environment, or even
a stateless version of it!


Transforming an environment
---------------------------

Most of the time, you'll want to modify the output of the environment to
better suit your requirements. For example, you might want to monitor the
number of steps executed since the last reset, resize images, or stack
consecutive observations together.

In this section, we'll examine a simple transform, the
:class:`~torchrl.envs.transforms.StepCounter` transform.
The complete list of transforms can be found
:ref:`here <transforms>`.

The transform is integrated with the environment through a
:class:`~torchrl.envs.transforms.TransformedEnv`:


.. GENERATED FROM PYTHON SOURCE LINES 200-207

.. code-block:: Python


    from torchrl.envs import StepCounter, TransformedEnv

    transformed_env = TransformedEnv(env, StepCounter(max_steps=10))
    rollout = transformed_env.rollout(max_steps=100)
    print(rollout)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    TensorDict(
        fields={
            action: Tensor(shape=torch.Size([10, 1]), device=cpu, dtype=torch.float32, is_shared=False),
            done: Tensor(shape=torch.Size([10, 1]), device=cpu, dtype=torch.bool, is_shared=False),
            next: TensorDict(
                fields={
                    done: Tensor(shape=torch.Size([10, 1]), device=cpu, dtype=torch.bool, is_shared=False),
                    observation: Tensor(shape=torch.Size([10, 3]), device=cpu, dtype=torch.float32, is_shared=False),
                    reward: Tensor(shape=torch.Size([10, 1]), device=cpu, dtype=torch.float32, is_shared=False),
                    step_count: Tensor(shape=torch.Size([10, 1]), device=cpu, dtype=torch.int64, is_shared=False),
                    terminated: Tensor(shape=torch.Size([10, 1]), device=cpu, dtype=torch.bool, is_shared=False),
                    truncated: Tensor(shape=torch.Size([10, 1]), device=cpu, dtype=torch.bool, is_shared=False)},
                batch_size=torch.Size([10]),
                device=None,
                is_shared=False),
            observation: Tensor(shape=torch.Size([10, 3]), device=cpu, dtype=torch.float32, is_shared=False),
            step_count: Tensor(shape=torch.Size([10, 1]), device=cpu, dtype=torch.int64, is_shared=False),
            terminated: Tensor(shape=torch.Size([10, 1]), device=cpu, dtype=torch.bool, is_shared=False),
            truncated: Tensor(shape=torch.Size([10, 1]), device=cpu, dtype=torch.bool, is_shared=False)},
        batch_size=torch.Size([10]),
        device=None,
        is_shared=False)


.. GENERATED FROM PYTHON SOURCE LINES 208-215

As you can see, our environment now has one more entry, ``"step_count"`` that
tracks the number of steps since the last reset.
Given that we passed the optional
argument ``max_steps=10`` to the transform constructor, we also truncated the
trajectory after 10 steps (not completing a full rollout of 100 steps like
we asked with the ``rollout`` call). We can see that the trajectory was
truncated by looking at the truncated entry:

.. GENERATED FROM PYTHON SOURCE LINES 215-218

.. code-block:: Python


    print(rollout["next", "truncated"])


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    tensor([[False],
            [False],
            [False],
            [False],
            [False],
            [False],
            [False],
            [False],
            [False],
            [ True]])


.. GENERATED FROM PYTHON SOURCE LINES 219-252

This is all for this short introduction to TorchRL's environment API!

Next steps
----------

To explore further what TorchRL's environments can do, go and check:

- The :meth:`~torchrl.envs.EnvBase.step_and_maybe_reset` method that packs
  together :meth:`~torchrl.envs.EnvBase.step`,
  :func:`~torchrl.envs.utils.step_mdp` and
  :meth:`~torchrl.envs.EnvBase.reset`.
- Some environments like :class:`~torchrl.envs.GymEnv` support rendering
  through the ``from_pixels`` argument. Check the class docstrings to know
  more!
- The batched environments, in particular :class:`~torchrl.envs.ParallelEnv`
  which allows you to run multiple copies of one same (or different!)
  environments on multiple processes.
- Design your own environment with the
  :ref:`Pendulum tutorial <pendulum_tuto>` and learn about specs and
  stateless environments.
- See the more in-depth tutorial about environments
  :ref:`in the dedicated tutorial <envs_tuto>`;
- Check the
  :ref:`multi-agent  environment API <MARL-environment-API>`
  if you're interested in MARL;
- TorchRL has many tools to interact with the Gym API such as
  a way to register TorchRL envs in the Gym register through
  :meth:`~torchrl.envs.EnvBase.register_gym`, an API to read
  the info dictionaries through
  :meth:`~torchrl.envs.EnvBase.set_info_dict_reader` or a way
  to control the gym backend thanks to
  :func:`~torchrl.envs.set_gym_backend`.


.. rst-class:: sphx-glr-timing

   **Total running time of the script:** (0 minutes 46.408 seconds)

**Estimated memory usage:**  320 MB


.. _sphx_glr_download_tutorials_getting-started-0.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: getting-started-0.ipynb <getting-started-0.ipynb>`

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: getting-started-0.py <getting-started-0.py>`

    .. container:: sphx-glr-download sphx-glr-download-zip

      :download:`Download zipped: getting-started-0.zip <getting-started-0.zip>`


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_