.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "tutorials/torchrl_envs.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end <sphx_glr_download_tutorials_torchrl_envs.py>`
        to download the full example code.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_tutorials_torchrl_envs.py:


TorchRL envs
============

**Author**: `Vincent Moens <https://github.com/vmoens>`_

.. _envs_tuto:

.. GENERATED FROM PYTHON SOURCE LINES 12-37

Environments play a crucial role in RL settings, often somewhat similar to
datasets in supervised and unsupervised settings. The RL community has
become quite familiar with the OpenAI gym API, which offers a flexible way of
building environments, initializing them and interacting with them. However,
many other libraries exist, and the way one interacts with them can be quite
different from what is expected with *gym*.

Let us start by describing how TorchRL interacts with gym, which will serve
as an introduction to other frameworks.

Gym environments
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
To run this part of the tutorial, you will need to have a recent version of
the gym library installed, as well as the atari suite. You can get this
installed by installing the following packages:

  .. code-block:: bash

    $ pip install gym atari-py ale-py gym[accept-rom-license] pygame

To unify all frameworks, TorchRL environments are built inside the
``__init__`` method with a private method called ``_build_env`` that
passes the arguments and keyword arguments to the root library builder.

With gym, it means that building an environment is as easy as:

.. GENERATED FROM PYTHON SOURCE LINES 38-44

.. code-block:: Python


    import torch
    from matplotlib import pyplot as plt
    from tensordict import TensorDict


.. GENERATED FROM PYTHON SOURCE LINES 69-74

.. code-block:: Python


    from torchrl.envs.libs.gym import GymEnv

    env = GymEnv("Pendulum-v1")


.. GENERATED FROM PYTHON SOURCE LINES 75-77

The list of available environments can be accessed through this command:


.. GENERATED FROM PYTHON SOURCE LINES 77-80

.. code-block:: Python


    list(GymEnv.available_envs)[:10]


.. GENERATED FROM PYTHON SOURCE LINES 81-89

Env Specs
------------------------------

Like other frameworks, TorchRL envs have attributes that describe the
spaces of the observations, actions, done signals and rewards. Because more
than one observation is often retrieved, we expect the observation spec
to be of type ``CompositeSpec``.
Reward and action specs do not have this restriction:

.. GENERATED FROM PYTHON SOURCE LINES 89-94

.. code-block:: Python


    print("Env observation_spec: \n", env.observation_spec)
    print("Env action_spec: \n", env.action_spec)
    print("Env reward_spec: \n", env.reward_spec)


.. GENERATED FROM PYTHON SOURCE LINES 95-99

These specs come with a series of useful tools: one can check whether a
sample is in the defined space. We can also use a heuristic to project
a sample onto the space if it lies outside of it, and generate random
(possibly uniformly distributed) numbers in that space:

.. GENERATED FROM PYTHON SOURCE LINES 99-104

.. code-block:: Python


    action = torch.ones(1) * 3
    print("action is in bounds?\n", bool(env.action_spec.is_in(action)))
    print("projected action: \n", env.action_spec.project(action))


.. GENERATED FROM PYTHON SOURCE LINES 105-108

.. code-block:: Python


    print("random action: \n", env.action_spec.rand())
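
To make these three tools concrete, here is a minimal sketch in plain Python of
what a bounded spec does. The ``ToyBoundedSpec`` class and its scalar bounds are
hypothetical and only mimic the behavior described above: the actual TorchRL
specs operate on tensors and may rely on a different projection heuristic.

.. code-block:: Python

    import random


    class ToyBoundedSpec:
        """Hypothetical scalar stand-in for a bounded spec (not the TorchRL class)."""

        def __init__(self, low, high):
            self.low = low
            self.high = high

        def is_in(self, value):
            # True when the sample lies inside the closed interval [low, high]
            return self.low <= value <= self.high

        def project(self, value):
            # Assumed heuristic: clamp an out-of-bounds sample to the nearest bound
            return min(max(value, self.low), self.high)

        def rand(self):
            # Draw a uniformly distributed sample inside the bounds
            return random.uniform(self.low, self.high)


    toy_spec = ToyBoundedSpec(-2.0, 2.0)
    print(toy_spec.is_in(3.0))  # False
    print(toy_spec.project(3.0))  # 2.0
    print(toy_spec.is_in(toy_spec.rand()))  # True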


.. GENERATED FROM PYTHON SOURCE LINES 109-117

Among these specs, the ``done_spec`` deserves special attention. In TorchRL,
all environments write end-of-trajectory signals of at least two types:
``"terminated"`` (indicating that the Markov Decision Process has reached
a final state - the *episode* is finished) and ``"done"``, indicating that
this is the last step of a *trajectory* (but not necessarily the end of
the task). In general, a ``"done"`` entry that is ``True`` when ``"terminated"``
is ``False`` is caused by a ``"truncated"`` signal. Gym environments account for
these three signals:

.. GENERATED FROM PYTHON SOURCE LINES 117-120

.. code-block:: Python


    print(env.done_spec)
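
The relationship between the three signals can be summarized with a small
sketch. The ``combine_done`` helper is hypothetical and simply encodes the rule
stated above with plain booleans, whereas TorchRL stores these signals as
boolean tensors.

.. code-block:: Python

    def combine_done(terminated, truncated):
        # Assumption taken from the text above: a trajectory ends ("done") when
        # the episode terminated or when it was truncated
        return terminated or truncated


    # "done" is True while "terminated" is False: the trajectory was truncated
    print(combine_done(terminated=False, truncated=True))  # True
    # Neither signal is set: the trajectory goes on
    print(combine_done(terminated=False, truncated=False))  # False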


.. GENERATED FROM PYTHON SOURCE LINES 121-136

Envs also come with an ``env.state_spec`` attribute of type
``CompositeSpec`` which contains all the specs that are inputs to the env
but are not the action.
For stateful envs (e.g. gym) this will be empty most of the time.
With stateless environments (e.g. Brax) this should also include a
representation of the previous state, or any other input to the environment
(including inputs at reset time).
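
The difference between the two kinds of environments can be illustrated with a
toy counter. Both ``StatefulCounter`` and ``stateless_counter_step`` are
hypothetical names used for this sketch only: the first keeps its state
internally, the second receives the previous state as an explicit input, which
is the kind of entry ``state_spec`` would describe.

.. code-block:: Python

    class StatefulCounter:
        """Keeps its state internally, like a typical gym environment."""

        def __init__(self):
            self.count = 0

        def step(self, action):
            self.count += action
            return self.count


    def stateless_counter_step(state, action):
        """Receives the previous state as an input and returns the new one."""
        return state + action


    stateful = StatefulCounter()
    stateful.step(2)
    stateful_result = stateful.step(3)

    state = 0
    state = stateless_counter_step(state, 2)
    state = stateless_counter_step(state, 3)

    print(stateful_result, state)  # 5 5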

Seeding, resetting and steps
------------------------------
The basic operations on an environment are (1) ``set_seed``, (2) ``reset``
and (3) ``step``.

Let's see how these methods work with TorchRL:

.. GENERATED FROM PYTHON SOURCE LINES 136-142

.. code-block:: Python


    torch.manual_seed(0)  # make sure that all torch code is also reproducible
    env.set_seed(0)
    reset_data = env.reset()
    print("reset data", reset_data)


.. GENERATED FROM PYTHON SOURCE LINES 143-145

We can now execute a step in the environment. Since we don't have a policy,
we can just generate a random action:

.. GENERATED FROM PYTHON SOURCE LINES 145-153

.. code-block:: Python


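    from tensordict.nn import TensorDictModule

    # Wrap the random action sampler in a module that writes an "action" entry
    policy = TensorDictModule(env.action_spec.rand, in_keys=[], out_keys=["action"])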


    policy(reset_data)
    tensordict_out = env.step(reset_data)


.. GENERATED FROM PYTHON SOURCE LINES 154-155

By default, the tensordict returned by ``step`` is the same as the input...

.. GENERATED FROM PYTHON SOURCE LINES 155-158

.. code-block:: Python


    assert tensordict_out is reset_data


.. GENERATED FROM PYTHON SOURCE LINES 159-160

... but with new keys

.. GENERATED FROM PYTHON SOURCE LINES 160-163

.. code-block:: Python


    tensordict_out


.. GENERATED FROM PYTHON SOURCE LINES 164-166

What we just did (a random step using ``action_spec.rand()``) can also be
done via a simple shortcut:

.. GENERATED FROM PYTHON SOURCE LINES 166-169

.. code-block:: Python


    env.rand_step()


.. GENERATED FROM PYTHON SOURCE LINES 170-176

The new key ``("next", "observation")`` (like all keys under the ``"next"``
tensordict) has a special role in TorchRL: it indicates that it comes
after the key with the same name but without the prefix.

We provide a function ``step_mdp`` that executes a step in the tensordict:
it returns a new tensordict updated such that *t <- t'*:

.. GENERATED FROM PYTHON SOURCE LINES 176-190

.. code-block:: Python


    from torchrl.envs.utils import step_mdp

    tensordict_out.set("some other key", torch.randn(1))
    tensordict_tprime = step_mdp(tensordict_out)

    print(tensordict_tprime)
    print(
        (
            tensordict_tprime.get("observation")
            == tensordict_out.get(("next", "observation"))
        ).all()
    )


.. GENERATED FROM PYTHON SOURCE LINES 191-196

We can observe that ``step_mdp`` has removed all the time-dependent
key-value pairs, but not ``"some other key"``. Also, the new
``"observation"`` matches the previous ``("next", "observation")`` entry.
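
The bookkeeping performed by ``step_mdp`` can be pictured with plain
dictionaries. The ``toy_step_mdp`` function below is a simplified, hypothetical
illustration: it assumes that the action and the reward are dropped and that
every other ``"next"`` entry is promoted to the root, which may differ from the
options exposed by the real function.

.. code-block:: Python

    def toy_step_mdp(data):
        """Simplified, dict-based illustration of moving from t to t'."""
        next_data = data["next"]
        out = {}
        for key, value in data.items():
            # Drop the action, the "next" entry and anything "next" overrides
            if key in ("next", "action") or key in next_data:
                continue
            out[key] = value
        for key, value in next_data.items():
            # Assumed here: the reward is not carried over to the next step
            if key != "reward":
                out[key] = value
        return out


    toy_data = {
        "observation": 0.1,
        "action": 1.0,
        "some other key": 42,
        "next": {"observation": 0.2, "reward": -1.0, "done": False},
    }
    toy_next = toy_step_mdp(toy_data)
    print(toy_next)
    print(toy_next["observation"] == toy_data["next"]["observation"])  # True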

Finally, note that the ``env.reset`` method also accepts a tensordict to update:

.. GENERATED FROM PYTHON SOURCE LINES 196-201

.. code-block:: Python


    data = TensorDict()
    assert env.reset(data) is data
    data


.. GENERATED FROM PYTHON SOURCE LINES 202-206

Rollouts
------------------------------
The generic environment class provided by TorchRL allows you to run rollouts
easily for a given number of steps:

.. GENERATED FROM PYTHON SOURCE LINES 206-210

.. code-block:: Python


    tensordict_rollout = env.rollout(max_steps=20, policy=policy)
    print(tensordict_rollout)


.. GENERATED FROM PYTHON SOURCE LINES 211-214

The resulting tensordict has a ``batch_size`` of ``[20]``, which is the
length of the trajectory. We can check that the observations match their
next values:

.. GENERATED FROM PYTHON SOURCE LINES 214-221

.. code-block:: Python


    (
        tensordict_rollout.get("observation")[1:]
        == tensordict_rollout.get(("next", "observation"))[:-1]
    ).all()
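
Conceptually, a rollout alternates between querying the policy, stepping the
environment and promoting the next observation to the current one. The sketch
below reproduces this loop with plain Python lists. ``toy_rollout``, the toy
dynamics and the constant policy are all hypothetical, and the sketch ignores
done signals for brevity.

.. code-block:: Python

    def toy_rollout(step_fn, first_obs, policy, max_steps):
        """Collect (observation, next observation) pairs for max_steps steps."""
        observations, next_observations = [], []
        obs = first_obs
        for _ in range(max_steps):
            next_obs = step_fn(obs, policy(obs))
            observations.append(obs)
            next_observations.append(next_obs)
            # The next observation becomes the current one (the step_mdp role)
            obs = next_obs
        return observations, next_observations


    obs_list, next_obs_list = toy_rollout(
        step_fn=lambda obs, action: obs + action,  # hypothetical toy dynamics
        first_obs=0,
        policy=lambda obs: 1,  # hypothetical constant policy
        max_steps=20,
    )
    print(len(obs_list))  # 20
    print(obs_list[1:] == next_obs_list[:-1])  # True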


.. GENERATED FROM PYTHON SOURCE LINES 222-232

``frame_skip``
------------------------------
In some instances, it is useful to use a ``frame_skip`` argument to use the
same action for several consecutive frames.

The resulting tensordict will contain only the last frame observed in the
sequence, but the rewards will be summed over the number of frames.

If the environment reaches a done state during this process, it'll stop
and return the result of the truncated chain.

.. GENERATED FROM PYTHON SOURCE LINES 232-236

.. code-block:: Python


    env = GymEnv("Pendulum-v1", frame_skip=4)
    env.reset()
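
The behavior described above can be sketched without any RL library. The
``step_with_frame_skip`` helper and the toy dynamics are hypothetical and
only illustrate the logic: repeat the action, sum the rewards, keep the last
state and stop early when a done state is reached.

.. code-block:: Python

    def step_with_frame_skip(step_fn, state, action, frame_skip):
        """Repeat `action` for up to `frame_skip` frames and sum the rewards."""
        total_reward = 0.0
        done = False
        for _ in range(frame_skip):
            state, reward, done = step_fn(state, action)
            total_reward += reward
            if done:
                # Stop early and return the result of the truncated chain
                break
        return state, total_reward, done


    def toy_step(state, action):
        # Hypothetical dynamics: reward of 1 per frame, done once state reaches 6
        new_state = state + action
        return new_state, 1.0, new_state >= 6


    first = step_with_frame_skip(toy_step, state=0, action=1, frame_skip=4)
    print(first)  # (4, 4.0, False)
    second = step_with_frame_skip(toy_step, state=first[0], action=1, frame_skip=4)
    print(second)  # (6, 2.0, True)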


.. GENERATED FROM PYTHON SOURCE LINES 237-242

Rendering
------------------------------
Rendering plays an important role in many RL settings, and this is why the
generic environment class from torchrl provides a ``from_pixels`` keyword
argument that allows the user to quickly ask for image-based environments:

.. GENERATED FROM PYTHON SOURCE LINES 242-245

.. code-block:: Python


    env = GymEnv("Pendulum-v1", from_pixels=True)


.. GENERATED FROM PYTHON SOURCE LINES 246-250

.. code-block:: Python


    data = env.reset()
    env.close()


.. GENERATED FROM PYTHON SOURCE LINES 251-254

.. code-block:: Python


    plt.imshow(data.get("pixels").numpy())


.. GENERATED FROM PYTHON SOURCE LINES 255-256

Let's have a look at what the tensordict contains:

.. GENERATED FROM PYTHON SOURCE LINES 256-259

.. code-block:: Python


    data


.. GENERATED FROM PYTHON SOURCE LINES 260-268

We still have a ``"state"`` entry that describes what ``"observation"`` used
to describe in the previous case. The naming difference comes from the fact
that gym now returns a dictionary, and TorchRL takes the names from the
dictionary if it exists; otherwise it names the step output
``"observation"``. In short, this is due to inconsistencies in the object
type returned by the gym environment ``step`` method.

One can also discard this supplementary output by asking for the pixels only:

.. GENERATED FROM PYTHON SOURCE LINES 268-273

.. code-block:: Python


    env = GymEnv("Pendulum-v1", from_pixels=True, pixels_only=True)
    env.reset()
    env.close()


.. GENERATED FROM PYTHON SOURCE LINES 274-275

Some environments only come in an image-based format:

.. GENERATED FROM PYTHON SOURCE LINES 275-281

.. code-block:: Python


    env = GymEnv("ALE/Pong-v5")
    print("from pixels: ", env.from_pixels)
    print("data: ", env.reset())
    env.close()


.. GENERATED FROM PYTHON SOURCE LINES 282-289

DeepMind Control environments
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
To run this part of the tutorial, make sure you have installed dm_control:

  .. code-block:: bash

    $ pip install dm_control

We also provide a wrapper for the DM Control suite. Again, building an
environment is easy: first let's look at what environments can be accessed.
The ``available_envs`` attribute now returns a dict of envs and possible tasks:

.. GENERATED FROM PYTHON SOURCE LINES 289-292

.. code-block:: Python


    from matplotlib import pyplot as plt


.. GENERATED FROM PYTHON SOURCE LINES 293-298

.. code-block:: Python


    from torchrl.envs.libs.dm_control import DMControlEnv

    DMControlEnv.available_envs


.. GENERATED FROM PYTHON SOURCE LINES 299-305

.. code-block:: Python


    env = DMControlEnv("acrobot", "swingup")
    data = env.reset()
    print("result of reset: ", data)
    env.close()


.. GENERATED FROM PYTHON SOURCE LINES 306-307

Of course we can also use pixel-based environments:

.. GENERATED FROM PYTHON SOURCE LINES 307-314

.. code-block:: Python


    env = DMControlEnv("acrobot", "swingup", from_pixels=True, pixels_only=True)
    data = env.reset()
    print("result of reset: ", data)
    plt.imshow(data.get("pixels").numpy())
    env.close()


.. GENERATED FROM PYTHON SOURCE LINES 315-336

Transforming envs
^^^^^^^^^^^^^^^^^
It is common to pre-process the output of an environment before having it
read by the policy or stored in a buffer.

In many instances, the RL community has adopted a wrapping scheme of the type

  .. code-block:: Python

    env_transformed = wrapper1(wrapper2(env))

to transform environments. This has numerous advantages: it makes accessing
the environment specs obvious (the outer wrapper is the source of truth for
the external world), and it makes it easy to interact with vectorized
environments. However, it also makes it hard to access inner environments:
say one wants to remove a wrapper (e.g. ``wrapper2``) from the chain,
this operation requires us to collect

  .. code-block:: Python

    env0 = env.env.env
    env_transformed_bis = wrapper1(env0)

TorchRL takes the stance of using sequences of transforms instead, as is
done in other PyTorch domain libraries (e.g. ``torchvision``). This
approach is also similar to the way distributions are transformed in
``torch.distributions``, where a ``TransformedDistribution`` object is
built around a ``base_dist`` distribution and (a sequence of) ``transforms``.

.. GENERATED FROM PYTHON SOURCE LINES 336-347

.. code-block:: Python


    from torchrl.envs.transforms import ToTensorImage, TransformedEnv

    # ToTensorImage transforms a numpy-like image into a tensor one
    env = DMControlEnv("acrobot", "swingup", from_pixels=True, pixels_only=True)
    print("reset before transform: ", env.reset())

    env = TransformedEnv(env, ToTensorImage())
    print("reset after transform: ", env.reset())
    env.close()


.. GENERATED FROM PYTHON SOURCE LINES 348-349

To compose transforms, simply use the ``Compose`` class:

.. GENERATED FROM PYTHON SOURCE LINES 349-356

.. code-block:: Python


    from torchrl.envs.transforms import Compose, Resize

    env = DMControlEnv("acrobot", "swingup", from_pixels=True, pixels_only=True)
    env = TransformedEnv(env, Compose(ToTensorImage(), Resize(32, 32)))
    env.reset()
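
The benefit of a flat sequence of transforms over nested wrappers can be seen
in a small library-free sketch. ``ToyCompose`` and the two toy transforms are
hypothetical stand-ins, not the TorchRL classes: the point is only that the
chain is an ordinary list that can be inspected and edited.

.. code-block:: Python

    class ToyCompose:
        """Hypothetical stand-in applying a flat sequence of transforms in order."""

        def __init__(self, *transforms):
            self.transforms = list(transforms)

        def append(self, transform):
            self.transforms.append(transform)

        def __call__(self, value):
            for transform in self.transforms:
                value = transform(value)
            return value


    def double(x):
        return 2 * x


    def add_one(x):
        return x + 1


    pipeline = ToyCompose(double, add_one)
    print(pipeline(3))  # (3 * 2) + 1 = 7

    # Removing a transform is a plain list operation, no unwrapping is needed
    pipeline.transforms.remove(double)
    print(pipeline(3))  # 3 + 1 = 4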


.. GENERATED FROM PYTHON SOURCE LINES 357-358

Transforms can also be added one at a time:

.. GENERATED FROM PYTHON SOURCE LINES 358-364

.. code-block:: Python


    from torchrl.envs.transforms import GrayScale

    env.append_transform(GrayScale())
    env.reset()


.. GENERATED FROM PYTHON SOURCE LINES 365-366

As expected, the metadata get updated too:

.. GENERATED FROM PYTHON SOURCE LINES 366-370

.. code-block:: Python


    print("original obs spec: ", env.base_env.observation_spec)
    print("current obs spec: ", env.observation_spec)


.. GENERATED FROM PYTHON SOURCE LINES 371-372

We can also concatenate tensors if needed:

.. GENERATED FROM PYTHON SOURCE LINES 372-384

.. code-block:: Python


    from torchrl.envs.transforms import CatTensors

    env = DMControlEnv("acrobot", "swingup")
    print("keys before concat: ", env.reset())

    env = TransformedEnv(
        env,
        CatTensors(in_keys=["orientations", "velocity"], out_key="observation"),
    )
    print("keys after concat: ", env.reset())


.. GENERATED FROM PYTHON SOURCE LINES 385-403

This feature makes it easy to modify the set of transforms applied to an
environment's inputs and outputs. In fact, transforms are run both before and
after a step is executed: for the pre-step pass, the ``in_keys_inv`` list of
keys will be passed to the ``_inv_apply_transform`` method. An example of
such a transform would be to convert floating-point actions (output from
a neural network) to the double dtype (required by the wrapped environment).
After the step is executed, the ``_apply_transform`` method will be
executed on the keys indicated by the ``in_keys`` list of keys.

Another interesting feature of the environment transforms is that they
allow the user to retrieve the equivalent of ``env.env`` in the wrapped
case, or in other words the parent environment. The parent environment can
be retrieved by calling ``transform.parent``: the returned environment
will consist of a ``TransformedEnv`` with all the transforms up to
(but not including) the current transform. This is used for instance in
the ``NoopResetEnv`` case, which, when reset, resets the parent environment
before executing a certain number of steps at random in that environment.

.. GENERATED FROM PYTHON SOURCE LINES 403-415

.. code-block:: Python


    env = DMControlEnv("acrobot", "swingup")
    env = TransformedEnv(env)
    env.append_transform(
        CatTensors(in_keys=["orientations", "velocity"], out_key="observation")
    )
    env.append_transform(GrayScale())

    print("env: \n", env)
    print("GrayScale transform parent env: \n", env.transform[1].parent)
    print("CatTensors transform parent env: \n", env.transform[0].parent)
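
The pre-step and post-step passes mentioned earlier can be pictured as follows.
This is a hypothetical, library-free sketch: ``ToyScaleTransform`` and
``transformed_step`` are illustrative names, with ``inv`` playing the role of
the inverse pass applied to the action and ``forward`` the role of the pass
applied to the step output.

.. code-block:: Python

    class ToyScaleTransform:
        """Hypothetical transform with a forward and an inverse direction."""

        def __init__(self, scale):
            self.scale = scale

        def inv(self, action):
            # Pre-step pass: map the policy output to what the base env expects
            return action * self.scale

        def forward(self, observation):
            # Post-step pass: map the env output to what the policy expects
            return observation / self.scale


    def transformed_step(base_step, transform, action):
        raw_observation = base_step(transform.inv(action))
        return transform.forward(raw_observation)


    # Hypothetical base env: the observation is the received action plus 2
    result = transformed_step(lambda a: a + 2.0, ToyScaleTransform(4.0), 0.5)
    print(result)  # (0.5 * 4.0 + 2.0) / 4.0 = 1.0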


.. GENERATED FROM PYTHON SOURCE LINES 416-430

Environment device
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Transforms can work on device, which can bring a significant speedup when
operations are moderately or highly computationally demanding. These include
``ToTensorImage``, ``Resize``, ``GrayScale`` etc.

One could legitimately ask what that implies on the wrapped environment
side. Very little for regular environments: the operations will still happen
on the device where they're supposed to happen. The environment device
attribute in TorchRL indicates on which device the incoming data is expected
to be and on which device the output data will be. Casting from and to that
device is the responsibility of the TorchRL environment class. The big
advantages of storing data on GPU are (1) the speedup of transforms mentioned
above and (2) sharing data amongst workers in multiprocessing settings.

.. GENERATED FROM PYTHON SOURCE LINES 430-443

.. code-block:: Python


    from torchrl.envs.transforms import CatTensors, GrayScale, TransformedEnv

    env = DMControlEnv("acrobot", "swingup")
    env = TransformedEnv(env)
    env.append_transform(
        CatTensors(in_keys=["orientations", "velocity"], out_key="observation")
    )

    if torch.cuda.is_available():
        env.to("cuda:0")
        env.reset()


.. GENERATED FROM PYTHON SOURCE LINES 444-451

Running environments in parallel
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TorchRL provides utilities to run environments in parallel. It is expected
that the various environments read and return tensors of similar shapes and
dtypes (but one could design masking functions to make this possible in case
those tensors differ in shapes). Creating such environments is quite easy.
Let us look at the simplest case:

.. GENERATED FROM PYTHON SOURCE LINES 451-464

.. code-block:: Python


    from torchrl.envs import ParallelEnv


    def env_make():
        return GymEnv("Pendulum-v1")


    parallel_env = ParallelEnv(3, env_make)  # -> creates 3 envs in parallel
    parallel_env = ParallelEnv(
        3, [env_make, env_make, env_make]
    )  # similar to the previous command


.. GENERATED FROM PYTHON SOURCE LINES 465-474

The ``SerialEnv`` class is similar to the ``ParallelEnv`` except for the
fact that environments are run sequentially. This is mostly useful for
debugging purposes.

``ParallelEnv`` instances are created in lazy mode: the environment will
start running only when called. This allows us to move ``ParallelEnv``
objects from process to process without worrying too much about running
processes. A ``ParallelEnv`` can be started by calling ``start``, ``reset``
or simply by calling ``step`` (if ``reset`` does not need to be called first).

.. GENERATED FROM PYTHON SOURCE LINES 474-477

.. code-block:: Python


    parallel_env.reset()


.. GENERATED FROM PYTHON SOURCE LINES 478-481

One can check that the parallel environment has the right batch size.
Conventionally, the first part of the ``batch_size`` indicates the batch,
the second the time frame. Let's check that with the ``rollout`` method:

.. GENERATED FROM PYTHON SOURCE LINES 481-484

.. code-block:: Python


    parallel_env.rollout(max_steps=20)


.. GENERATED FROM PYTHON SOURCE LINES 485-492

Closing parallel environments
------------------------------
**Important**: before closing a program, it is important to close the
parallel environment. In general, even with regular environments, it is good
practice to end a function with a call to ``close``. In some instances,
TorchRL will throw an error if this is not done (and often it will be at the
end of a program, when the environment goes out of scope!)

.. GENERATED FROM PYTHON SOURCE LINES 492-495

.. code-block:: Python


    parallel_env.close()


.. GENERATED FROM PYTHON SOURCE LINES 496-506

Seeding
------------------------------
When seeding a parallel environment, the difficulty we face is that we don't
want to provide the same seed to all environments. The heuristic used by
TorchRL is that we produce a deterministic chain of seeds given the input
seed in a - so to say - Markovian way, such that it can be reconstructed
from any of its elements. All ``set_seed`` methods will return the next seed to
be used, such that one can easily keep the chain going given the last seed.
This is useful when several collectors all contain a ``ParallelEnv``
instance and we want each of the sub-sub-environments to have a different seed.

.. GENERATED FROM PYTHON SOURCE LINES 506-512

.. code-block:: Python


    out_seed = parallel_env.set_seed(10)
    print(out_seed)

    del parallel_env
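
The idea of a seed chain can be illustrated with a few lines of plain Python.
The update rule in ``toy_next_seed`` is a hypothetical choice made for this
sketch and is not the one TorchRL uses. What matters is that each seed depends
only on the previous one, so the chain can be resumed from the value returned
by the previous call.

.. code-block:: Python

    def toy_next_seed(seed):
        # Hypothetical update rule (not the one TorchRL uses): each seed
        # depends only on the previous one
        return (seed * 1103515245 + 12345) % (2**31)


    def toy_set_seeds(num_envs, seed):
        """Give each sub-environment its own seed, return the next seed to use."""
        seeds = []
        for _ in range(num_envs):
            seeds.append(seed)
            seed = toy_next_seed(seed)
        return seeds, seed


    # Two groups of 3 environments seeded one after the other...
    first_seeds, next_seed = toy_set_seeds(3, seed=10)
    second_seeds, _ = toy_set_seeds(3, seed=next_seed)
    # ...receive the same seeds as a single group of 6 environments
    all_seeds, _ = toy_set_seeds(6, seed=10)
    print(first_seeds + second_seeds == all_seeds)  # True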


.. GENERATED FROM PYTHON SOURCE LINES 513-518

Accessing environment attributes
---------------------------------
It sometimes occurs that a wrapped environment has an attribute that is of
interest. First, note that the TorchRL environment wrapper contains the
tooling to access this attribute. Here's an example:

.. GENERATED FROM PYTHON SOURCE LINES 518-521

.. code-block:: Python


    from time import sleep


.. GENERATED FROM PYTHON SOURCE LINES 522-535

.. code-block:: Python


    from uuid import uuid1


    def env_make():
        env = GymEnv("Pendulum-v1")
        env._env.foo = f"bar_{uuid1()}"
        env._env.get_something = lambda r: r + 1
        return env


    env = env_make()


.. GENERATED FROM PYTHON SOURCE LINES 536-540

.. code-block:: Python


    # Goes through env._env
    env.foo
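
This kind of attribute forwarding is commonly implemented with ``__getattr__``.
The sketch below shows the general Python pattern with hypothetical
``ToyWrapper`` and ``ToyBaseEnv`` classes. It is an assumption about the
mechanism and not a copy of the TorchRL implementation.

.. code-block:: Python

    class ToyWrapper:
        """Hypothetical wrapper that forwards unknown attributes to `_env`."""

        def __init__(self, env):
            self._env = env

        def __getattr__(self, name):
            # Called only when normal lookup fails on the wrapper itself
            if name == "_env":
                # Avoid infinite recursion if `_env` has not been set yet
                raise AttributeError(name)
            return getattr(self._env, name)


    class ToyBaseEnv:
        foo = "bar"

        def get_something(self, r):
            return r + 1


    wrapped = ToyWrapper(ToyBaseEnv())
    print(wrapped.foo)  # bar
    print(wrapped.get_something(0))  # 1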


.. GENERATED FROM PYTHON SOURCE LINES 541-551

.. code-block:: Python


    parallel_env = ParallelEnv(3, env_make)  # -> creates 3 envs in parallel

    # env has not been started --> error:
    try:
        parallel_env.foo
    except RuntimeError:
        print("Aargh what did I do!")
        sleep(2)  # make sure we don't get ahead of ourselves


.. GENERATED FROM PYTHON SOURCE LINES 552-558

.. code-block:: Python


    if parallel_env.is_closed:
        parallel_env.start()
    foo_list = parallel_env.foo
    foo_list  # needs to be instantiated, for instance using list


.. GENERATED FROM PYTHON SOURCE LINES 559-562

.. code-block:: Python


    list(foo_list)


.. GENERATED FROM PYTHON SOURCE LINES 563-564

Similarly, methods can also be accessed:

.. GENERATED FROM PYTHON SOURCE LINES 564-568

.. code-block:: Python


    something = parallel_env.get_something(0)
    print(something)


.. GENERATED FROM PYTHON SOURCE LINES 569-573

.. code-block:: Python


    parallel_env.close()
    del parallel_env


.. GENERATED FROM PYTHON SOURCE LINES 574-578

kwargs for parallel environments
---------------------------------
One may want to provide kwargs to the various environments. This can be
achieved either at construction time or afterwards:

.. GENERATED FROM PYTHON SOURCE LINES 580-609

.. code-block:: Python


    from torchrl.envs import ParallelEnv


    def env_make(env_name):
        env = TransformedEnv(
            GymEnv(env_name, from_pixels=True, pixels_only=True),
            Compose(ToTensorImage(), Resize(64, 64)),
        )
        return env


    parallel_env = ParallelEnv(
        2,
        [env_make, env_make],
        create_env_kwargs=[{"env_name": "ALE/AirRaid-v5"}, {"env_name": "ALE/Pong-v5"}],
    )
    data = parallel_env.reset()

    plt.figure()
    plt.subplot(121)
    plt.imshow(data[0].get("pixels").permute(1, 2, 0).numpy())
    plt.subplot(122)
    plt.imshow(data[1].get("pixels").permute(1, 2, 0).numpy())
    parallel_env.close()
    del parallel_env

    from matplotlib import pyplot as plt


.. GENERATED FROM PYTHON SOURCE LINES 610-617

Transforming parallel environments
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
There are two equivalent ways of transforming parallel environments: in each
process separately, or on the main process. It is even possible to do both.
One should therefore think carefully about the transform design to leverage
the device capabilities (e.g. transforms on cuda devices) and vectorize
operations on the main process if possible.

.. GENERATED FROM PYTHON SOURCE LINES 617-653

.. code-block:: Python


    from torchrl.envs import (
        Compose,
        GrayScale,
        ParallelEnv,
        Resize,
        ToTensorImage,
        TransformedEnv,
    )


    def env_make(env_name):
        env = TransformedEnv(
            GymEnv(env_name, from_pixels=True, pixels_only=True),
            Compose(ToTensorImage(), Resize(64, 64)),
        )  # transforms on remote processes
        return env


    parallel_env = ParallelEnv(
        2,
        [env_make, env_make],
        create_env_kwargs=[{"env_name": "ALE/AirRaid-v5"}, {"env_name": "ALE/Pong-v5"}],
    )
    parallel_env = TransformedEnv(parallel_env, GrayScale())  # transforms on main process
    data = parallel_env.reset()

    print("grayscale data: ", data)
    plt.figure()
    plt.subplot(121)
    plt.imshow(data[0].get("pixels").permute(1, 2, 0).numpy())
    plt.subplot(122)
    plt.imshow(data[1].get("pixels").permute(1, 2, 0).numpy())
    parallel_env.close()
    del parallel_env


.. GENERATED FROM PYTHON SOURCE LINES 654-674

VecNorm
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
In RL, we commonly face the problem of normalizing data before inputting
them into a model. Sometimes, we can get a good approximation of the
normalizing statistics from data gathered in the environment with, say, a
random policy (or demonstrations). It might, however, be advisable to
normalize the data "on-the-fly", updating the normalizing constants
progressively to what has been observed so far. This is particularly
useful when we expect the normalizing statistics to change following
changes in performance in the task, or when the environment is evolving
due to external factors.

**Caution**: this feature should be used with caution with off-policy
learning, as old data will be "deprecated" due to its normalization with
previously valid normalizing statistics. In on-policy settings too, this
feature makes learning non-steady and may have unexpected effects. We
therefore advise users to rely on this feature with caution and compare
it with data normalized with a fixed version of the normalizing constants.

In a regular setting, using VecNorm is quite easy:

.. GENERATED FROM PYTHON SOURCE LINES 674-684

.. code-block:: Python


    from torchrl.envs.libs.gym import GymEnv
    from torchrl.envs.transforms import TransformedEnv, VecNorm

    env = TransformedEnv(GymEnv("Pendulum-v1"), VecNorm())
    data = env.rollout(max_steps=100)

    print("mean:", data.get("observation").mean(0))  # Approx 0
    print("std:", data.get("observation").std(0))  # Approx 1
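
The "on-the-fly" update of the normalizing constants can be sketched with a
scalar running mean and variance. ``ToyRunningNorm`` is a hypothetical class
that uses Welford-style updates without any decay. It illustrates the
principle only and is not the ``VecNorm`` implementation.

.. code-block:: Python

    class ToyRunningNorm:
        """Hypothetical scalar running normalizer (Welford-style updates)."""

        def __init__(self, eps=1e-8):
            self.count = 0
            self.mean = 0.0
            self.m2 = 0.0  # running sum of squared deviations from the mean
            self.eps = eps

        def update(self, value):
            self.count += 1
            delta = value - self.mean
            self.mean += delta / self.count
            self.m2 += delta * (value - self.mean)

        def normalize(self, value):
            variance = self.m2 / self.count if self.count > 0 else 1.0
            return (value - self.mean) / (variance + self.eps) ** 0.5


    norm = ToyRunningNorm()
    for observation in [1.0, 2.0, 3.0, 4.0]:
        norm.update(observation)

    print(norm.count, norm.mean)  # 4 2.5
    print(norm.normalize(2.5))  # 0.0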


.. GENERATED FROM PYTHON SOURCE LINES 685-691

In **parallel envs** things are slightly more complicated, as we need to
share the running statistics amongst the processes. We created a class
``EnvCreator`` that is responsible for looking at an environment creation
method, retrieving tensordicts to share amongst processes in the environment
class, and pointing each process to the right common, shared data
once created:

.. GENERATED FROM PYTHON SOURCE LINES 691-710

.. code-block:: Python


    from torchrl.envs import EnvCreator, ParallelEnv
    from torchrl.envs.libs.gym import GymEnv
    from torchrl.envs.transforms import TransformedEnv, VecNorm

    make_env = EnvCreator(lambda: TransformedEnv(GymEnv("CartPole-v1"), VecNorm(decay=1.0)))
    env = ParallelEnv(3, make_env)
    print("env state dict:")
    sd = TensorDict(make_env.state_dict())
    print(sd)
    # Zeroes all tensors
    sd *= 0

    data = env.rollout(max_steps=5)

    print("data: ", data)
    # Merge the env and time dimensions, keep the observation dimension
    print("mean:", data.get("observation").flatten(0, 1).mean(0))  # Approx 0
    print("std:", data.get("observation").flatten(0, 1).std(0))  # Approx 1


.. GENERATED FROM PYTHON SOURCE LINES 711-716

The count is slightly higher than the number of steps (since we
did not use any decay). The difference between the two is due to the fact
that ``ParallelEnv`` creates a dummy environment to initialize the shared
``TensorDict`` that is used to collect data from the dispatched environments.
This small difference will usually be absorbed throughout training.

.. GENERATED FROM PYTHON SOURCE LINES 716-724

.. code-block:: Python


    print(
        "update counts: ",
        make_env.state_dict()["_extra_state"]["observation_count"],
    )

    env.close()
    del env


.. _sphx_glr_download_tutorials_torchrl_envs.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: torchrl_envs.ipynb <torchrl_envs.ipynb>`

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: torchrl_envs.py <torchrl_envs.py>`

    .. container:: sphx-glr-download sphx-glr-download-zip

      :download:`Download zipped: torchrl_envs.zip <torchrl_envs.zip>`


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_