.. _finetune_llama_label:

=======================
Finetune your First LLM
=======================

This guide will walk you through the process of launching your first finetuning
job using TorchTune.

.. grid:: 2

    .. grid-item-card:: :octicon:`mortar-board;1em;` What you will learn

      * How to download a model and convert it to a format compatible with TorchTune
      * How to modify a recipe's parameters
      * How to finetune a model

    .. grid-item-card:: :octicon:`list-unordered;1em;` Prerequisites

      * Be familiar with the :ref:`overview of TorchTune`
      * Make sure to :ref:`install TorchTune`

.. _download_llama_label:

Downloading a model
-------------------

First, you need to download a model. TorchTune supports an integration with the `Hugging Face Hub `_, a collection of the latest and greatest model weights.

For this tutorial, you're going to use the `Llama2 model from Meta `_. Llama2 is a "gated model", meaning that you need to be granted access in order to download the weights. Follow `these instructions `_ on the official Meta page hosted on Hugging Face to complete this process. (This should take less than 5 minutes.) To verify that you have access, go to the `model page `_. You should be able to see the model files. If not, you may need to accept the agreement to complete the signup process.

Once you have authorization, you will need to authenticate with Hugging Face Hub. The easiest way to do so is to provide an access token to the download script. You can find your token `here `_.

Then, it's as simple as:

.. code-block:: bash

  tune download \
  meta-llama/Llama-2-7b \
  --output-dir /tmp/llama2 \
  --hf-token <ACCESS TOKEN>

This command will also download the model tokenizer and some other helpful files such as a Responsible Use guide.

.. note::
  You can also download the model directly through the Llama2 repository.
  See `this page `_ for more details.

Selecting a recipe
------------------

Recipes are the primary entry points for TorchTune users.
These can be thought of as end-to-end pipelines for training and optionally evaluating LLMs.
Each recipe consists of three components:

* **Configurable parameters**, specified through YAML configs, command-line overrides and dataclasses
* **Recipe class**, core logic needed for training, exposed to users through a set of APIs
* **Recipe script**, puts everything together including parsing and validating configs, setting up the environment, and correctly using the recipe class

To see all available recipes and for more information on how to select the right recipe, see the :ref:`recipe_deepdive` tutorial.

For this tutorial, you'll be using the :ref:`basic full finetuning recipe`.

Modifying a config
------------------

YAML configs hold most of the important information needed for running your recipe.
You can set hyperparameters, specify metric loggers like `WandB `_, select a new dataset, and more.
For a list of all currently supported datasets, see :ref:`datasets`.

To modify an existing recipe config, you can use the :code:`tune` CLI to copy it to your local directory.
Alternatively, you can visit the specific :ref:`recipe page` and copy/paste the config from there.

There's already a config called :code:`llama2/7B_full` that uses the popular
`Alpaca instruction dataset `_.
This is a good place to start, so let's copy it!

.. code-block:: bash

  tune cp llama2/7B_full custom_config.yaml

Now you can update the custom YAML config to point to your model and tokenizer. While you're at it,
you can make some other changes, like setting the random seed to make replication easier,
lowering the epochs to 1 so you can see results sooner, and updating the learning rate.

.. code-block:: yaml

  # Tokenizer (downloaded into /tmp/llama2 together with the model weights)
  tokenizer:
    _component_: torchtune.models.llama2.llama2_tokenizer
    path: /tmp/llama2/tokenizer.model

  # Dataset
  dataset:
    _component_: torchtune.datasets.alpaca_dataset
  seed: 42
  shuffle: True

  # Model Arguments
  model:
    _component_: torchtune.models.llama2.llama2_7b

  checkpointer:
    _component_: torchtune.utils.FullModelMetaCheckpointer
    checkpoint_dir: /tmp/llama2
    checkpoint_files: [consolidated.00.pth]
    recipe_checkpoint: null
    output_dir: /tmp/llama2
    model_type: LLAMA2
  resume_from_checkpoint: False

  # Fine-tuning arguments
  batch_size: 2
  epochs: 1
  optimizer:
    _component_: torch.optim.SGD
    lr: 1e-5
  loss:
    _component_: torch.nn.CrossEntropyLoss

  output_dir: /tmp/alpaca-llama2-finetune

  device: cuda
  dtype: bf16

  enable_activation_checkpointing: True

Training a model
----------------

Now that you have a model in the proper format and a config that suits your needs, let's get training!

Just like all the other steps, you will be using the :code:`tune` CLI tool to launch your finetuning run.
To make it easier for users already familiar with the PyTorch ecosystem, TorchTune integrates with
`torchrun `_, so the torchrun flags :code:`--nnodes` and :code:`--nproc_per_node` can be passed to :code:`tune run`.
To launch a distributed run on two GPUs, run:

.. code-block:: bash

  tune run --nnodes 1 --nproc_per_node 2 full_finetune_distributed --config custom_config.yaml

You should see output right away, with the loss going down as training progresses, indicating that your model is training successfully.

.. code-block:: text

  Writing logs to /tmp/alpaca-llama2-finetune/log_1707246452.txt
  Setting manual seed to local seed 42. Local seed is seed + rank = 42 + 0
  Model is initialized.
  FSDP and Activation Checkpointing are enabled.
  Tokenizer is initialized from file.
  Optimizer is initialized.
  Loss is initialized.
  Dataset and Sampler are initialized.
  1|1|Loss: 1.7553404569625854:   0%|          | 0/13000 [00:03

` at the full fine-tuning recipe and understand the config better.
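
The log output above reports ``Local seed is seed + rank = 42 + 0``, which suggests that each process offsets the configured :code:`seed` by its rank. The following is a minimal sketch of that idea and of why a fixed seed makes runs easier to replicate. It is not TorchTune's implementation: :code:`local_seed` and :code:`sample_stream` are hypothetical helper names, and Python's standard :code:`random` module stands in for the random number generators a real training run would seed.

.. code-block:: python

  import random


  def local_seed(seed: int, rank: int) -> int:
      """Offset the shared seed by the process rank (hypothetical helper)."""
      return seed + rank


  def sample_stream(seed: int, rank: int, n: int = 3) -> list:
      """Draw n pseudo-random numbers from a generator seeded per rank."""
      rng = random.Random(local_seed(seed, rank))
      return [rng.random() for _ in range(n)]


  # Two ranks, matching the two-GPU launch above (seed: 42 in the config).
  rank0 = sample_stream(42, rank=0)
  rank1 = sample_stream(42, rank=1)

  # The same seed and rank reproduce the same stream on every run ...
  assert rank0 == sample_stream(42, rank=0)
  # ... while different ranks draw different streams.
  assert rank0 != rank1

Under this assumption, rerunning with the same :code:`seed` and the same number of processes repeats each rank's random choices, while the per-rank offset keeps the ranks from drawing identical streams.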