.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "tutorials/pretrained_models.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end <sphx_glr_download_tutorials_pretrained_models.py>`
        to download the full example code.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_tutorials_pretrained_models.py:


Using pretrained models
=======================

This tutorial explains how to use pretrained models in TorchRL.

.. GENERATED FROM PYTHON SOURCE LINES 7-15

At the end of this tutorial, you will be capable of using pretrained models
for efficient image representation, and of fine-tuning them.

TorchRL provides pretrained models that can be used either as transforms or as
components of the policy. As the semantics are the same, they can be used
interchangeably in one context or the other. In this tutorial, we will be using
R3M (https://arxiv.org/abs/2203.12601), but other models (e.g. VIP) will work
equally well.

.. GENERATED FROM PYTHON SOURCE LINES 15-31

.. code-block:: Python

    import multiprocessing

    import torch.cuda
    from tensordict.nn import TensorDictSequential
    from torch import nn
    from torchrl.envs import R3MTransform, TransformedEnv
    from torchrl.envs.libs.gym import GymEnv
    from torchrl.modules import Actor

    # Use the GPU only when it is available and the start method is safe for CUDA.
    is_fork = multiprocessing.get_start_method() == "fork"
    device = (
        torch.device(0)
        if torch.cuda.is_available() and not is_fork
        else torch.device("cpu")
    )

.. GENERATED FROM PYTHON SOURCE LINES 52-56

Let us first create an environment. For the sake of simplicity, we will be
using a common gym environment. In practice, this will work in more
challenging, embodied AI contexts (e.g. have a look at our Habitat wrappers).

.. GENERATED FROM PYTHON SOURCE LINES 56-58

.. code-block:: Python

    base_env = GymEnv("Ant-v4", from_pixels=True, device=device)

.. GENERATED FROM PYTHON SOURCE LINES 59-66

Let us fetch our pretrained model. We ask for the pretrained version of the
model through the ``download=True`` flag; by default, this is turned off.
Next, we will append our transform to the environment. In practice, each
batch of collected data will go through the transform and be mapped onto an
``"r3m_vec"`` entry in the output tensordict. Our policy, consisting of a
single-layer MLP, will then read this vector and compute the corresponding
action.

.. GENERATED FROM PYTHON SOURCE LINES 66-79

.. code-block:: Python

    r3m = R3MTransform(
        "resnet50",
        in_keys=["pixels"],
        download=True,
    )
    env_transformed = TransformedEnv(base_env, r3m)
    net = nn.Sequential(
        nn.LazyLinear(128, device=device),
        nn.Tanh(),
        nn.Linear(128, base_env.action_spec.shape[-1], device=device),
    )
    policy = Actor(net, in_keys=["r3m_vec"])

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Downloading: "https://pytorch.s3.amazonaws.com/models/rl/r3m/r3m_50.pt" to /root/.cache/torch/hub/checkpoints/r3m_50.pt
      0%|          | 0.00/374M [00:00<?, ?B/s]
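With the environment, the transform and the policy in place, it is worth
checking that the pieces fit together before training anything. The snippet
below is a minimal sketch, not part of the generated example; it only assumes
the objects defined above and the standard ``rollout`` method of TorchRL
environments.

.. code-block:: Python

    # Minimal sketch: run the policy for a few steps in the transformed
    # environment. The rollout length (3) is arbitrary. The first forward
    # pass also materializes the LazyLinear layer of the policy network.
    rollout = env_transformed.rollout(3, policy)
    # The resulting tensordict carries the "r3m_vec" entries read by the actor.
    print(rollout)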
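Since transforms and policy components share the same semantics, the same
pretrained weights can also live inside the policy, which is what fine-tuning
requires: gradients must flow through the R3M encoder. The following is a
hedged sketch, assuming that ``R3MTransform`` (reading ``"pixels"`` and
writing ``"r3m_vec"``) can be chained with other modules through
``TensorDictSequential``; it illustrates the idea rather than reproducing the
original example verbatim.

.. code-block:: Python

    # Hedged sketch: embed the R3M encoder in the policy so that its weights
    # can receive gradients and be fine-tuned end-to-end. We assume the
    # transform behaves like a TensorDictModule with in_keys=["pixels"] and
    # out_keys=["r3m_vec"].
    r3m_module = R3MTransform("resnet50", in_keys=["pixels"], download=True)
    policy_with_r3m = TensorDictSequential(r3m_module, policy)

    # The combined policy now reads raw pixels, so it runs directly in the
    # untransformed environment:
    rollout = base_env.rollout(3, policy_with_r3m)

Conversely, to use R3M purely as a fixed feature extractor, keep it on the
environment side as above, where its output is simply treated as part of the
observation.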
.. _sphx_glr_download_tutorials_pretrained_models.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: pretrained_models.py <pretrained_models.py>`

    .. container:: sphx-glr-download sphx-glr-download-zip

      :download:`Download zipped: pretrained_models.zip <pretrained_models.zip>`

.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_