Quick Start Pathway#

This pathway is for engineers who want to get a model running on a device as quickly as possible. It assumes you are familiar with PyTorch model development and have some prior exposure to mobile or edge deployment concepts. Steps are kept concise and link directly to the most actionable documentation.

Estimated time to first inference: 15–30 minutes.


Choose Your Scenario#

Select the scenario that most closely matches what you are trying to accomplish right now.

🚀 I have a PyTorch model and want to run it on device

Fastest path: Export → Run

  1. Install: pip install executorch

  2. Export with Getting Started with ExecuTorch (Exporting section)

  3. Run with Python runtime or deploy to Android / iOS

Time: ~15 min

📦 I want to use a pre-exported model

Fastest path: Download → Run

Pre-exported .pte files for Llama 3.2, MobileNet, and other models are available on HuggingFace ExecuTorch Community.

Skip export entirely and go directly to the runtime section of Getting Started with ExecuTorch.

Time: ~10 min

🤗 I have a HuggingFace model

Fastest path: Optimum ExecuTorch

Use the optimum-executorch CLI for a one-command export of HuggingFace models.

See Exporting LLMs with HuggingFace’s Optimum ExecuTorch for installation and usage.

Time: ~20 min

🦙 I want to run Llama on my phone

Fastest path: Llama on ExecuTorch

Follow the Llama on ExecuTorch guide for the complete Llama export and deployment workflow, including quantization and platform-specific setup.

Time: ~45 min (model download included)


The 5-Minute Setup#

If you have not yet installed ExecuTorch, run the following in a Python 3.10–3.13 virtual environment:

pip install executorch

Then verify the installation with a minimal export:

import torch
from executorch.exir import to_edge_transform_and_lower
from executorch.backends.xnnpack.partition.xnnpack_partitioner import XnnpackPartitioner

# Define a simple model
class Add(torch.nn.Module):
    def forward(self, x, y):
        return x + y

model = Add()
sample_inputs = (torch.ones(1), torch.ones(1))

et_program = to_edge_transform_and_lower(
    torch.export.export(model, sample_inputs),
    partitioner=[XnnpackPartitioner()]
).to_executorch()

with open("add.pte", "wb") as f:
    f.write(et_program.buffer)

print("Export successful: add.pte created")

If this runs without error, your environment is correctly configured.


Quick Reference: Export Cheat Sheet#

| Task | Code / Command |
| --- | --- |
| Install ExecuTorch | `pip install executorch` |
| Export with XNNPACK (mobile CPU) | `to_edge_transform_and_lower(torch.export.export(model, inputs), partitioner=[XnnpackPartitioner()])` |
| Export with Core ML (iOS) | Replace `XnnpackPartitioner` with `CoreMLPartitioner`; see Core ML Backend |
| Export with Qualcomm (Android NPU) | See Qualcomm AI Engine Backend for QNN SDK setup and partitioner usage |
| Run from Python | `Runtime.get().load_program("model.pte").load_method("forward").execute([input])` |
| Run from C++ | See Running an ExecuTorch Model Using the Module Extension in C++ for the high-level Module API |
| Export an LLM | `python -m executorch.examples.models.llama.export_llm ...`; see Exporting LLMs |


Platform Quick Start Guides#

Jump directly to the platform-specific setup guide for your target.

- **Android Quick Start**: Gradle dependency, Java Module API, and XNNPACK / Vulkan / Qualcomm backend selection for Android.
- **iOS Quick Start**: Swift Package Manager setup, Objective-C runtime API, and Core ML / MPS / XNNPACK backend selection for iOS.
- **Desktop / Linux / macOS**: Python runtime, C++ CMake integration, and XNNPACK / Core ML / MPS backends for desktop platforms.
- **Embedded Systems**: Bare-metal and RTOS deployment; Arm Ethos-U, Cadence, NXP, and other embedded backends.

Backend Selection Guide#

Choosing the right backend has the largest impact on performance. Use this table to select the appropriate backend for your hardware.

Backend Selection by Platform and Hardware#

| Platform | Hardware Target | Backend | Documentation |
| --- | --- | --- | --- |
| Android | CPU (Arm/x86) | XNNPACK | XNNPACK Backend |
| Android | GPU (Vulkan) | Vulkan | Vulkan Backend |
| Android | Qualcomm NPU/DSP | QNN | Qualcomm AI Engine Backend |
| Android | MediaTek APU | MediaTek | MediaTek Backend |
| iOS / macOS | Neural Engine / GPU | Core ML | Core ML Backend |
| iOS / macOS | Metal GPU | MPS | MPS Backend |
| iOS / macOS | CPU (Arm) | XNNPACK | XNNPACK Backend |
| Desktop | Intel CPU/GPU/NPU | OpenVINO | Building and Running ExecuTorch with OpenVINO Backend |
| Desktop | Apple Silicon | Core ML / MPS | Core ML Backend / MPS Backend |
| Embedded | Arm Cortex-M / Ethos-U | Arm Ethos-U | Arm Ethos-U Backend |
| Embedded | Cadence DSP | Cadence | Cadence Xtensa Backend |
| Embedded | NXP eIQ Neutron | NXP | NXP eIQ Neutron Backend |


Troubleshooting Quick Fixes#

| Symptom | Quick Fix |
| --- | --- |
| `ImportError: No module named executorch` | Run `pip install executorch` in your active virtual environment |
| Export fails with a `torch._dynamo` error | Ensure your model is export-compatible; see Exporting to ExecuTorch |
| `.pte` file runs but produces wrong output | Use Developer Tools Usage Tutorials to compare intermediate activations |
| Android Gradle sync fails | Check that `executorch_version` in `build.gradle.kts` matches your installed version |
| iOS build fails with missing xcframework | Verify the Swift PM branch name matches your ExecuTorch version (format: `swiftpm-X.Y.Z`) |


Going Deeper#

Once your model is running, explore these topics to optimize performance and expand capabilities.