.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "tutorials/llm_browser.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end <sphx_glr_download_tutorials_llm_browser.py>`
        to download the full example code.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_tutorials_llm_browser.py:


TorchRL LLM: Building Tool-Enabled Environments
===============================================

**Author**: `Vincent Moens <https://github.com/vmoens>`_

.. _llm_tools:

This tutorial demonstrates how to build and compose LLM environments with tool capabilities
in TorchRL. We'll show how to create a complete environment that can execute tools,
format responses, and handle interactions between the LLM and external tools.

The tutorial uses web browsing as a concrete example, but the concepts apply to any
tool integration in TorchRL's LLM framework.

Main takeaways:

- Understanding TorchRL's LLM environment composition
- Creating and appending tool transforms
- Formatting tool responses and LLM interactions
- Handling tool execution and state management

Prerequisites: Basic familiarity with TorchRL's environment concepts.

.. GENERATED FROM PYTHON SOURCE LINES 27-48

Installation
------------

First, install TorchRL with LLM support. If you're running this in a Jupyter
notebook, you can install the packages using:

.. code-block:: bash

    %pip install "torchrl[llm]"    # Install TorchRL with all LLM dependencies

The ``torchrl[llm]`` extra includes all necessary dependencies for LLM functionality,
including transformers, vllm, and playwright for browser automation.

After installation, you'll need to set up the browser automation components:

.. code-block:: bash

    !playwright install            # Install browser binaries

Note: The ``!`` and ``%pip`` prefixes are specific to Jupyter notebooks. In a regular
terminal, run the same commands without the prefixes (``pip install ...`` and
``playwright install``).

.. GENERATED FROM PYTHON SOURCE LINES 50-62

Environment Setup
-----------------

TorchRL's LLM interface is built around composable environments and transforms.
The key components are:

1. A base environment (ChatEnv)
2. Tool execution transforms
3. Data loading transforms
4. Reward computation transforms

Let's import the necessary components and set up our environment.

.. GENERATED FROM PYTHON SOURCE LINES 62-79

.. code-block:: Python


    from __future__ import annotations

    import warnings

    import torch

    from tensordict import set_list_to_stack, TensorDict
    from torchrl import torchrl_logger
    from torchrl.data import CompositeSpec, Unbounded
    from torchrl.envs import Transform
    from torchrl.envs.llm import ChatEnv
    from torchrl.envs.llm.transforms.browser import BrowserTransform
    from transformers import AutoTokenizer

    warnings.filterwarnings("ignore")


.. GENERATED FROM PYTHON SOURCE LINES 80-86

Step 1: Basic Environment Configuration
---------------------------------------

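We'll create a ``ChatEnv`` and configure it with browser automation capabilities.
First, we enable list-to-stack conversion for TensorDict, so that a Python list
assigned to a tensordict is interpreted as a batch of entries stacked along the
first dimension. This is required for proper batch handling in LLM environments.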

.. GENERATED FROM PYTHON SOURCE LINES 86-90

.. code-block:: Python


    # Enable list-to-stack conversion for TensorDict
    set_list_to_stack(True).set()


.. GENERATED FROM PYTHON SOURCE LINES 91-93

Now we'll create the tokenizer and base environment. The environment requires
a batch size, even if we're only running a single instance.

.. GENERATED FROM PYTHON SOURCE LINES 93-105

.. code-block:: Python


    tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-7B-Instruct")
    env = ChatEnv(
        batch_size=(1,),
        tokenizer=tokenizer,
        apply_template=True,
        system_prompt=(
            "You are a helpful assistant that can use tools to accomplish tasks. "
            "Tools will be executed and their responses will be added to our conversation."
        ),
    )

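To make ``apply_template=True`` more concrete, the sketch below renders a short
conversation in the ChatML-style layout used by Qwen chat models, where each turn is
wrapped in ``<|im_start|>`` and ``<|im_end|>`` markers. The ``render_chatml`` helper is
hypothetical and written only for illustration. In the tutorial, the formatting is
done by the tokenizer's own chat template, whose exact output may differ.

.. code-block:: Python

    def render_chatml(messages, add_generation_prompt=False):
        """Render a list of {"role", "content"} dicts as a ChatML-style string.

        Illustrative only: this is not the tokenizer's real chat template.
        """
        parts = []
        for message in messages:
            parts.append(
                f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
            )
        if add_generation_prompt:
            # Open an assistant turn so that a model would continue from here.
            parts.append("<|im_start|>assistant\n")
        return "".join(parts)


    conversation = [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the capital of France?"},
    ]
    rendered = render_chatml(conversation, add_generation_prompt=True)
    print(rendered)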

.. GENERATED FROM PYTHON SOURCE LINES 106-108

Next, we'll add the browser transform with safety configurations. This transform
enables web browsing capabilities with domain restrictions for security.

.. GENERATED FROM PYTHON SOURCE LINES 108-115

.. code-block:: Python


    browser_transform = BrowserTransform(
        allowed_domains=["google.com", "github.com"],
        headless=False,  # Show the browser window; set to True to run without a display
    )
    env = env.append_transform(browser_transform)

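As a rough illustration of what a domain restriction involves, the sketch below
checks a URL's host against an allow-list using only the standard library. The
``is_allowed`` helper and its matching rule (exact host or subdomain) are assumptions
made for this example. ``BrowserTransform`` may apply a different rule internally.

.. code-block:: Python

    from urllib.parse import urlparse


    def is_allowed(url, allowed_domains):
        """Return True if the URL's host is an allowed domain or one of its subdomains.

        Hypothetical helper: not part of TorchRL.
        """
        host = (urlparse(url).hostname or "").lower()
        return any(
            host == domain or host.endswith("." + domain)
            for domain in allowed_domains
        )


    allowed = ["google.com", "github.com"]
    # A subdomain of an allowed domain is accepted.
    print(is_allowed("https://www.google.com/search?q=torchrl", allowed))
    # A host that merely ends with the same characters is rejected.
    print(is_allowed("https://notgoogle.com", allowed))
    # An allowed domain appearing only in the query string is rejected.
    print(is_allowed("https://example.com/?next=google.com", allowed))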

.. GENERATED FROM PYTHON SOURCE LINES 116-121

We can also design a transform that assigns rewards.
For example, we can parse the output of the browser transform and assign a reward
whenever a specific goal is achieved. In this example, the reward is 2 if the LLM
finds the answer to the question (Paris), 1 if it reaches the desired website
(google.com), and 0 otherwise.

.. GENERATED FROM PYTHON SOURCE LINES 121-188

.. code-block:: Python


    class RewardTransform(Transform):
        """A transform that assigns rewards based on the LLM's responses.

        This transform parses the browser responses in the environment's history and assigns
        rewards based on specific achievements:

        - Finding the correct answer (Paris): reward = 2.0
        - Successfully reaching Google: reward = 1.0
        - Otherwise: reward = 0.0

        """

        def _call(self, tensordict: TensorDict) -> TensorDict:
            """Process the tensordict and assign rewards based on the LLM's response.

            Args:
                tensordict (TensorDict): The tensordict containing the environment state.
                    Must have a "history" key containing the conversation history.

            Returns:
                TensorDict: The tensordict with an added "reward" key containing the
                    computed reward with shape (B, 1) where B is the batch size.
            """
            # ChatEnv has created a history item. We just pick up the last item,
            # and check if `"Paris"` is in the response.
            # We use index 0 because we are in a single-instance environment.
            history = tensordict[0]["history"]
            last_item = history[-1]
            if "Paris" in last_item.content:
                torchrl_logger.info("Found the answer to the question: Paris")
                # Recall that rewards have a trailing singleton dimension.
                tensordict["reward"] = torch.full((1, 1), 2.0)
            # Check if we successfully reached the website
            elif (
                "google.com" in last_item.content
                and "executed successfully" in last_item.content
            ):
                torchrl_logger.info("Reached the website google.com")
                tensordict["reward"] = torch.full((1, 1), 1.0)
            else:
                tensordict["reward"] = torch.full((1, 1), 0.0)
            return tensordict

        def transform_reward_spec(self, reward_spec: CompositeSpec) -> CompositeSpec:
            """Transform the reward spec to include our custom reward.

            This method is required to override the reward spec since the environment
            is initially reward-agnostic.

            Args:
                reward_spec (CompositeSpec): The original reward spec from the environment.

            Returns:
                CompositeSpec: The transformed reward spec with our custom reward definition.
                    The reward will have shape (B, 1) where B is the batch size.
            """
            reward_spec["reward"] = Unbounded(
                shape=reward_spec.shape + (1,), dtype=torch.float32
            )
            return reward_spec


    # We append the reward transform to the environment.
    env = env.append_transform(RewardTransform())

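The reward rule itself does not depend on TorchRL and can be checked in isolation.
The sketch below restates it as a plain function over a single message string. The
``score_response`` name and the sample strings are made up for illustration and are
not actual tool outputs.

.. code-block:: Python

    def score_response(content: str) -> float:
        """Mirror the rule used in ``RewardTransform._call`` for one message."""
        if "Paris" in content:
            # The answer check comes first, so it takes precedence.
            return 2.0
        if "google.com" in content and "executed successfully" in content:
            return 1.0
        return 0.0


    # Hypothetical message contents, for illustration only.
    samples = [
        "The capital of France is Paris.",
        "navigate to https://google.com executed successfully",
        "navigate to https://google.com failed",
    ]
    rewards = [score_response(sample) for sample in samples]
    print(rewards)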

.. GENERATED FROM PYTHON SOURCE LINES 189-194

Step 2: Tool Execution Helper
-----------------------------

To make our interaction with tools more organized, we'll create a helper function
that executes tool actions and displays the results.

.. GENERATED FROM PYTHON SOURCE LINES 194-217

.. code-block:: Python


    def execute_tool_action(
        env: ChatEnv,
        current_state: TensorDict,
        action: str,
        verbose: bool = True,
    ) -> tuple[TensorDict, TensorDict]:
        """Execute a tool action and show the formatted interaction."""
        s = current_state.set("text_response", [action])
        s, s_ = env.step_and_maybe_reset(s)

        if verbose:
            print("\nLLM Action:")
            print("-----------")
            print(action)
            print("\nEnvironment Response:")
            print("--------------------")
            torchrl_logger.info(s_["history"].apply_chat_template(tokenizer=env.tokenizer))

        return s, s_


.. GENERATED FROM PYTHON SOURCE LINES 218-225

Step 3: Starting the Interaction
--------------------------------

Let's begin by initializing the environment with a question and navigating
to a search engine. Note that the tensordict used as input to the environment
must share the same batch size as the environment. The text query is put in a list
of length 1, such that it is compatible with the environment's batch size.

.. GENERATED FROM PYTHON SOURCE LINES 225-233

.. code-block:: Python


    reset = env.reset(
        TensorDict(
            text=["What is the capital of France?"],
            batch_size=(1,),
        )
    )


.. GENERATED FROM PYTHON SOURCE LINES 234-238

Now we'll navigate to Google using the browser transform. The transform
expects actions in a specific JSON format wrapped in tool tags.
In practice, this action would be generated by the LLM, which writes its
response string to the ``"text_response"`` key.

.. GENERATED FROM PYTHON SOURCE LINES 238-253

.. code-block:: Python


    s, s_ = execute_tool_action(
        env,
        reset,
        """
        Let me search for that:
        <tool>browser
        {
            "action": "navigate",
            "url": "https://google.com"
        }
        </tool><|im_end|>
        """,
    )

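To illustrate the action format, the sketch below extracts the tool name and the JSON
payload from a response string with a regular expression. The ``parse_tool_calls``
helper and its pattern are assumptions made for this example. The parsing performed
inside TorchRL may be implemented differently.

.. code-block:: Python

    import json
    import re

    # Tool name, then a JSON object, all enclosed in <tool>...</tool> tags.
    TOOL_PATTERN = re.compile(r"<tool>\s*(\w+)\s*(\{.*?\})\s*</tool>", re.DOTALL)


    def parse_tool_calls(text):
        """Return a list of (tool_name, arguments) pairs found in ``text``.

        Hypothetical helper: not part of TorchRL.
        """
        calls = []
        for name, payload in TOOL_PATTERN.findall(text):
            calls.append((name, json.loads(payload)))
        return calls


    response = """
    Let me search for that:
    <tool>browser
    {
        "action": "navigate",
        "url": "https://google.com"
    }
    </tool><|im_end|>
    """
    calls = parse_tool_calls(response)
    print(calls)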

.. GENERATED FROM PYTHON SOURCE LINES 254-259

Step 4: Performing the Search
-----------------------------

With the browser open, we can now type our query and execute the search.
First, we'll type the search query into Google's search box.

.. GENERATED FROM PYTHON SOURCE LINES 259-275

.. code-block:: Python


    s, s_ = execute_tool_action(
        env,
        s_,
        """
        Let me type the search query:
        <tool>browser
        {
            "action": "type",
            "selector": "[name='q']",
            "text": "What is the capital of France?"
        }
        </tool><|im_end|>
        """,
    )


.. GENERATED FROM PYTHON SOURCE LINES 276-278

Next, we'll click the search button to execute the search. Note how we
use CSS selectors to identify elements on the page.

.. GENERATED FROM PYTHON SOURCE LINES 278-293

.. code-block:: Python


    s, s_ = execute_tool_action(
        env,
        s_,
        """
        Now let me click the search button:
        <tool>browser
        {
            "action": "click",
            "selector": "[name='btnK']"
        }
        </tool><|im_end|>
        """,
    )

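Hand-writing JSON inside triple-quoted strings is easy to get wrong. One option,
sketched below, is to build the action string from a dictionary with ``json.dumps``.
The ``format_tool_call`` helper is hypothetical. It simply reproduces the tag layout
used in the actions of this tutorial, including the trailing ``<|im_end|>`` marker.

.. code-block:: Python

    import json


    def format_tool_call(tool, arguments, preamble="", eos="<|im_end|>"):
        """Build an action string in the ``<tool>name {json}</tool>`` layout.

        Hypothetical helper: not part of TorchRL.
        """
        payload = json.dumps(arguments, indent=4)
        return f"{preamble}\n<tool>{tool}\n{payload}\n</tool>{eos}"


    action = format_tool_call(
        "browser",
        {"action": "click", "selector": "[name='btnK']"},
        preamble="Now let me click the search button:",
    )
    print(action)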

.. GENERATED FROM PYTHON SOURCE LINES 294-299

Step 5: Extracting Results
--------------------------

Finally, we'll extract the search results from the page. The browser transform
can extract both text content and HTML from specified elements.

.. GENERATED FROM PYTHON SOURCE LINES 299-315

.. code-block:: Python


    s, s_ = execute_tool_action(
        env,
        s_,
        """
        Let me extract the results:
        <tool>browser
        {
            "action": "extract",
            "selector": "#search",
            "extract_type": "text"
        }
        </tool><|im_end|>
        """,
    )


.. GENERATED FROM PYTHON SOURCE LINES 316-317

Let's close the environment.

.. GENERATED FROM PYTHON SOURCE LINES 317-319

.. code-block:: Python

    env.close()


.. GENERATED FROM PYTHON SOURCE LINES 320-337

Conclusion
----------

This tutorial showed how to build and compose LLM environments with tool capabilities
in TorchRL. We created a complete environment that executes tools,
formats responses, and handles interactions between the LLM and external tools.

The key concepts are:

1. Understanding TorchRL's LLM environment composition
2. Creating and appending tool transforms
3. Formatting tool responses and LLM interactions
4. Handling tool execution and state management

A natural next step, not covered here, is to replace the hand-written actions with
the output of an LLM wrapper (vLLM, Transformers).

See the :ref:`ref_llms` tutorial for more information on how to build tool-enabled
environments with TorchRL.


.. _sphx_glr_download_tutorials_llm_browser.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: llm_browser.ipynb <llm_browser.ipynb>`

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: llm_browser.py <llm_browser.py>`

    .. container:: sphx-glr-download sphx-glr-download-zip

      :download:`Download zipped: llm_browser.zip <llm_browser.zip>`


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_