.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "tutorials/llm_browser.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end <sphx_glr_download_tutorials_llm_browser.py>`
        to download the full example code.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_tutorials_llm_browser.py:


TorchRL LLM: Building Tool-Enabled Environments
===============================================

**Author**: Vincent Moens

.. _llm_tools:

This tutorial demonstrates how to build and compose LLM environments with tool
capabilities in TorchRL. We'll create a complete environment that can execute
tools, format responses, and handle interactions between the LLM and external
tools.

The tutorial uses web browsing as a concrete example, but the concepts apply to
any tool integration in TorchRL's LLM framework.

Main takeaways:

- Understanding TorchRL's LLM environment composition
- Creating and appending tool transforms
- Formatting tool responses and LLM interactions
- Handling tool execution and state management

Prerequisites: Basic familiarity with TorchRL's environment concepts.

.. GENERATED FROM PYTHON SOURCE LINES 27-48

Installation
------------

First, install TorchRL with LLM support. If you're running this in a Jupyter
notebook, you can install the packages using:

.. code-block:: bash

    %pip install "torchrl[llm]"  # Install TorchRL with all LLM dependencies

The ``torchrl[llm]`` extra includes all necessary dependencies for LLM
functionality, including transformers, vllm, and playwright for browser
automation.

After installation, you'll need to set up the browser automation components:

.. code-block:: bash

    !playwright install  # Install browser binaries

Note: The ``!`` and ``%pip`` prefixes are specific to Jupyter notebooks. In a
regular terminal, use these commands without the prefixes (``pip install ...``
and ``playwright install``).

..
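
Before moving on, it can help to confirm that the optional dependencies are
importable from the current interpreter. The snippet below is a minimal sketch
and not part of the original tutorial: ``missing_modules`` is a hypothetical
helper, and the module names are simply the packages mentioned in the
installation notes.

.. code-block:: Python

    from importlib.util import find_spec


    def missing_modules(names):
        """Return the top-level module names that cannot be found on the import path."""
        return [name for name in names if find_spec(name) is None]


    # Module names assumed from the installation notes of this tutorial.
    missing = missing_modules(["torchrl", "transformers", "playwright"])
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All listed dependencies are importable.")

This only checks that the packages can be located. It does not verify that the
Playwright browser binaries have been downloaded.
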
.. GENERATED FROM PYTHON SOURCE LINES 50-62

Environment Setup
-----------------

TorchRL's LLM interface is built around composable environments and transforms.
The key components are:

1. A base environment (ChatEnv)
2. Tool execution transforms
3. Data loading transforms
4. Reward computation transforms

Let's import the necessary components and set up our environment.

.. GENERATED FROM PYTHON SOURCE LINES 62-79

.. code-block:: Python

    from __future__ import annotations

    import warnings

    import torch
    from tensordict import set_list_to_stack, TensorDict
    from torchrl import torchrl_logger
    from torchrl.data import CompositeSpec, Unbounded
    from torchrl.envs import Transform
    from torchrl.envs.llm import ChatEnv
    from torchrl.envs.llm.transforms.browser import BrowserTransform
    from transformers import AutoTokenizer

    warnings.filterwarnings("ignore")

.. GENERATED FROM PYTHON SOURCE LINES 80-86

Step 1: Basic Environment Configuration
---------------------------------------

We'll create a ChatEnv and configure it with browser automation capabilities.
First, we enable list-to-stack conversion for TensorDict, which is required
for proper batch handling in LLM environments.

.. GENERATED FROM PYTHON SOURCE LINES 86-90

.. code-block:: Python

    # Enable list-to-stack conversion for TensorDict
    set_list_to_stack(True).set()

.. GENERATED FROM PYTHON SOURCE LINES 91-93

Now we'll create the tokenizer and base environment. The environment requires
a batch size, even if we're only running a single instance.

.. GENERATED FROM PYTHON SOURCE LINES 93-105

.. code-block:: Python

    tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-7B-Instruct")
    env = ChatEnv(
        batch_size=(1,),
        tokenizer=tokenizer,
        apply_template=True,
        system_prompt=(
            "You are a helpful assistant that can use tools to accomplish tasks. "
            "Tools will be executed and their responses will be added to our conversation."
        ),
    )

.. GENERATED FROM PYTHON SOURCE LINES 106-108

Next, we'll add the browser transform with safety configurations.
This transform enables web browsing capabilities with domain restrictions for security.

.. GENERATED FROM PYTHON SOURCE LINES 108-115

.. code-block:: Python

    browser_transform = BrowserTransform(
        allowed_domains=["google.com", "github.com"],
        headless=False,  # False shows the browser window; set to True to run headless
    )
    env = env.append_transform(browser_transform)

.. GENERATED FROM PYTHON SOURCE LINES 116-121

We can also design a transform that assigns rewards in the environment.
For example, we can parse the result of the browser transform and assign a reward
whenever specific goals are achieved. In this simple example, we assign a reward of 2 if
the LLM finds the answer to the question (Paris), a reward of 1 if it reaches the desired
website, and a reward of 0 otherwise.

.. GENERATED FROM PYTHON SOURCE LINES 121-188

.. code-block:: Python

    class RewardTransform(Transform):
        """A transform that assigns rewards based on the LLM's responses.

        This transform parses the browser responses in the environment's history and assigns
        rewards based on specific achievements:

        - Finding the correct answer (Paris): reward = 2.0
        - Successfully reaching Google: reward = 1.0
        - Otherwise: reward = 0.0

        """

        def _call(self, tensordict: TensorDict) -> TensorDict:
            """Process the tensordict and assign rewards based on the LLM's response.

            Args:
                tensordict (TensorDict): The tensordict containing the environment state.
                    Must have a "history" key containing the conversation history.

            Returns:
                TensorDict: The tensordict with an added "reward" key containing the
                    computed reward with shape (B, 1) where B is the batch size.
            """
            # ChatEnv has created a history item. We just pick up the last item,
            # and check if `"Paris"` is in the response.
            # We use index 0 because we are in a single-instance environment.
            history = tensordict[0]["history"]
            last_item = history[-1]
            if "Paris" in last_item.content:
                torchrl_logger.info("Found the answer to the question: Paris")
                # Recall that rewards have a trailing singleton dimension.
                tensordict["reward"] = torch.full((1, 1), 2.0)
            # Check if we successfully reached the website
            elif (
                "google.com" in last_item.content
                and "executed successfully" in last_item.content
            ):
                torchrl_logger.info("Reached the website google.com")
                tensordict["reward"] = torch.full((1, 1), 1.0)
            else:
                tensordict["reward"] = torch.full((1, 1), 0.0)
            return tensordict

        def transform_reward_spec(self, reward_spec: CompositeSpec) -> CompositeSpec:
            """Transform the reward spec to include our custom reward.

            This method is required to override the reward spec since the environment
            is initially reward-agnostic.

            Args:
                reward_spec (CompositeSpec): The original reward spec from the environment.

            Returns:
                CompositeSpec: The transformed reward spec with our custom reward definition.
                    The reward will have shape (B, 1) where B is the batch size.
            """
            reward_spec["reward"] = Unbounded(
                shape=reward_spec.shape + (1,), dtype=torch.float32
            )
            return reward_spec


    # We append the reward transform to the environment.
    env = env.append_transform(RewardTransform())

.. GENERATED FROM PYTHON SOURCE LINES 189-194

Step 2: Tool Execution Helper
-----------------------------

To make our interaction with tools more organized, we'll create a helper function
that executes tool actions and displays the results.

.. GENERATED FROM PYTHON SOURCE LINES 194-217

.. code-block:: Python

    def execute_tool_action(
        env: ChatEnv,
        current_state: TensorDict,
        action: str,
        verbose: bool = True,
    ) -> tuple[TensorDict, TensorDict]:
        """Execute a tool action and show the formatted interaction."""
        s = current_state.set("text_response", [action])
        s, s_ = env.step_and_maybe_reset(s)

        if verbose:
            print("\nLLM Action:")
            print("-----------")
            print(action)
            print("\nEnvironment Response:")
            print("--------------------")
            torchrl_logger.info(s_["history"].apply_chat_template(tokenizer=env.tokenizer))

        return s, s_

..
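
The reward rule used by ``RewardTransform`` can be checked in isolation, without
a browser or a tokenizer. The sketch below restates the three tiers as a plain
function. ``score_response`` is a hypothetical helper introduced here for
illustration, and the sample strings are made up rather than real browser output.

.. code-block:: Python

    def score_response(content: str) -> float:
        """Mirror the reward tiers described for ``RewardTransform``."""
        # Highest tier: the answer itself appears in the response.
        if "Paris" in content:
            return 2.0
        # Middle tier: the target site was reached and the tool call succeeded.
        if "google.com" in content and "executed successfully" in content:
            return 1.0
        # Otherwise no reward.
        return 0.0


    # Illustrative inputs only, not actual environment responses.
    print(score_response("The capital of France is Paris."))
    print(score_response("navigate https://google.com executed successfully"))
    print(score_response("Tool call failed"))

Because the ``"Paris"`` check comes first, a response that both mentions
google.com and contains the answer receives the higher reward.
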
.. GENERATED FROM PYTHON SOURCE LINES 218-225

Step 3: Starting the Interaction
--------------------------------

Let's begin by initializing the environment with a question and navigating to a search engine.
Note that the tensordict used as input to the environment must share the same batch size as
the environment. The text query is wrapped in a list of length 1 so that it matches the
environment's batch size.

.. GENERATED FROM PYTHON SOURCE LINES 225-233

.. code-block:: Python

    reset = env.reset(
        TensorDict(
            text=["What is the capital of France?"],
            batch_size=(1,),
        )
    )

.. GENERATED FROM PYTHON SOURCE LINES 234-238

Now we'll navigate to Google using the browser transform. The transform expects actions in a
specific JSON format wrapped in tool tags. In practice, this action would be produced by the
LLM, which writes its response string to the ``"text_response"`` key.

.. GENERATED FROM PYTHON SOURCE LINES 238-253

.. code-block:: Python

    s, s_ = execute_tool_action(
        env,
        reset,
        """
        Let me search for that:
        <tool>browser
        {
            "action": "navigate",
            "url": "https://google.com"
        }
        </tool><|im_end|>
        """,
    )

.. GENERATED FROM PYTHON SOURCE LINES 254-259

Step 4: Performing the Search
-----------------------------

With the browser open, we can now type our query and execute the search.
First, we'll type the search query into Google's search box.

.. GENERATED FROM PYTHON SOURCE LINES 259-275

.. code-block:: Python

    s, s_ = execute_tool_action(
        env,
        s_,
        """
        Let me type the search query:
        <tool>browser
        {
            "action": "type",
            "selector": "[name='q']",
            "text": "What is the capital of France?"
        }
        </tool><|im_end|>
        """,
    )

.. GENERATED FROM PYTHON SOURCE LINES 276-278

Next, we'll click the search button to execute the search. Note how we use CSS selectors
to identify elements on the page.

.. GENERATED FROM PYTHON SOURCE LINES 278-293

.. code-block:: Python

    s, s_ = execute_tool_action(
        env,
        s_,
        """
        Now let me click the search button:
        <tool>browser
        {
            "action": "click",
            "selector": "[name='btnK']"
        }
        </tool><|im_end|>
        """,
    )

..
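
Hand-written action strings are easy to get wrong, especially the JSON quoting.
The sketch below builds such a string programmatically. It rests on assumptions:
``format_browser_call`` is a hypothetical helper, and the ``<tool>`` tag spelling
and the trailing ``<|im_end|>`` token reflect the format this tutorial appears to
use rather than a documented specification.

.. code-block:: Python

    import json


    def format_browser_call(preamble: str, **payload) -> str:
        """Build an action string: free text, then a JSON payload inside tool tags."""
        # json.dumps handles quoting and escaping of the payload values.
        body = json.dumps(payload, indent=4)
        # Tag layout is an assumption about what the browser transform parses.
        return f"{preamble}\n<tool>browser\n{body}\n</tool><|im_end|>"


    action = format_browser_call(
        "Now let me click the search button:",
        action="click",
        selector="[name='btnK']",
    )
    print(action)

A string built this way could be passed as the ``action`` argument of
``execute_tool_action`` in place of a hand-written literal.
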
.. GENERATED FROM PYTHON SOURCE LINES 294-299

Step 5: Extracting Results
--------------------------

Finally, we'll extract the search results from the page. The browser transform can
extract either text content or HTML from specified elements.

.. GENERATED FROM PYTHON SOURCE LINES 299-315

.. code-block:: Python

    s, s_ = execute_tool_action(
        env,
        s_,
        """
        Let me extract the results:
        <tool>browser
        {
            "action": "extract",
            "selector": "#search",
            "extract_type": "text"
        }
        </tool><|im_end|>
        """,
    )

.. GENERATED FROM PYTHON SOURCE LINES 316-317

Let's close the environment.

.. GENERATED FROM PYTHON SOURCE LINES 317-319

.. code-block:: Python

    env.close()

.. GENERATED FROM PYTHON SOURCE LINES 320-337

Conclusion
----------

This tutorial demonstrated how to build and compose LLM environments with tool capabilities
in TorchRL. We created a complete environment that can execute tools, format responses, and
handle interactions between the LLM and external tools.

The key concepts are:

1. Understanding TorchRL's LLM environment composition
2. Creating and appending tool transforms
3. Formatting tool responses and LLM interactions
4. Handling tool execution and state management
5. Integrating with LLM wrappers (vLLM, Transformers)

See the :ref:`ref_llms` tutorial for more information on how to build tool-enabled
environments with TorchRL.


.. _sphx_glr_download_tutorials_llm_browser.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: llm_browser.ipynb <llm_browser.ipynb>`

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: llm_browser.py <llm_browser.py>`

    .. container:: sphx-glr-download sphx-glr-download-zip

      :download:`Download zipped: llm_browser.zip <llm_browser.zip>`


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_