Deploying LLMs to ExecuTorch
ExecuTorch is designed to support all types of machine learning models, and LLMs are no exception. This section demonstrates how to use ExecuTorch to run state-of-the-art LLMs on-device, out of the box and with strong performance, using our provided export LLM APIs, acceleration backends, quantization libraries, tokenizers, and more.
We encourage you to use this project as a starting point and adapt it to your specific needs, which may include creating your own versions of the tokenizer, sampler, acceleration backends, and other components. We hope this project serves as a useful guide in your journey with LLMs and ExecuTorch.
Prerequisites
To follow this guide, you’ll need to install ExecuTorch. Please see Setting Up ExecuTorch.
Next steps
Deploying LLMs to ExecuTorch boils down to a two-step process: (1) exporting the LLM to a .pte file, and (2) running the .pte file using our C++ APIs or Swift/Java bindings.
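To make the shape of that process concrete, here is a minimal Python sketch of both steps. This is not the dedicated export LLM flow; it applies the generic ExecuTorch export APIs to a toy module, and the names TinyModel and model.pte are placeholders. The Python Runtime API used for step (2) is a convenient way to validate a .pte file on a host machine; on-device deployment would go through the C++ APIs or Swift/Java bindings mentioned above.

```python
from pathlib import Path

import torch
from executorch.exir import to_edge
from executorch.runtime import Runtime

# Placeholder module standing in for a real LLM. The export LLM APIs
# wrap this same flow with LLM-specific handling (checkpoints,
# tokenizers, quantization, backend delegation).
class TinyModel(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(16, 16)

    def forward(self, x):
        return self.linear(x)

model = TinyModel().eval()
example_inputs = (torch.randn(1, 16),)

# Step 1: capture the model graph, lower it to an ExecuTorch program,
# and serialize it to a .pte file.
exported = torch.export.export(model, example_inputs)
executorch_program = to_edge(exported).to_executorch()
with open("model.pte", "wb") as f:
    f.write(executorch_program.buffer)

# Step 2: load and run the .pte file (here via the Python Runtime API
# for quick host-side validation).
runtime = Runtime.get()
program = runtime.load_program(Path("model.pte"))
method = program.load_method("forward")
outputs = method.execute([torch.randn(1, 16)])
print(outputs)
```

The export LLM APIs, acceleration backends, and tokenizers covered in the rest of this section build on this same export-then-run pipeline.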