XNNPACK Backend#
The XNNPACK delegate is the ExecuTorch solution for CPU execution on mobile devices. XNNPACK is a library that provides optimized kernels for machine learning operators on Arm and x86 CPUs.
Features#
Wide operator support on Arm and x86 CPUs, available on any modern mobile phone.
Support for a wide variety of quantization schemes and quantized operators.
Supports fp32 and fp16 activations.
Supports 8-bit quantization.
Target Requirements#
ARM64 on Android, iOS, macOS, Linux, and Windows.
ARMv7 (with NEON) on Android.
ARMv6 (with VFPv2) on Linux.
x86 and x86-64 (up to AVX512) on Windows, Linux, and Android.
Development Requirements#
The XNNPACK delegate does not introduce any development system requirements beyond those required by the core ExecuTorch runtime.
Using the XNNPACK Backend#
To target the XNNPACK backend during the export and lowering process, pass an instance of the XnnpackPartitioner to to_edge_transform_and_lower. The example below demonstrates this process using the MobileNet V2 model from torchvision.
import torch
import torchvision.models as models
from torchvision.models.mobilenetv2 import MobileNet_V2_Weights
from executorch.backends.xnnpack.partition.xnnpack_partitioner import XnnpackPartitioner
from executorch.exir import to_edge_transform_and_lower
mobilenet_v2 = models.mobilenetv2.mobilenet_v2(weights=MobileNet_V2_Weights.DEFAULT).eval()
sample_inputs = (torch.randn(1, 3, 224, 224), )
et_program = to_edge_transform_and_lower(
    torch.export.export(mobilenet_v2, sample_inputs),
    partitioner=[XnnpackPartitioner()],
).to_executorch()

with open("mv2_xnnpack.pte", "wb") as file:
    et_program.write_to_file(file)
See Partitioner API for a reference on available partitioner options.
Quantization#
The XNNPACK delegate can also be used as a backend to execute symmetrically quantized models. See XNNPACK Quantization for more information on available quantization schemes and APIs.
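As a rough illustration of what symmetric quantization means (this is plain Python arithmetic, not the ExecuTorch or XNNPACK quantization API; see the linked docs for the actual workflow), a symmetric 8-bit scheme maps floats to signed integers using a single scale and a zero point of 0:

```python
# Illustrative sketch of symmetric 8-bit quantization arithmetic.
# Not the ExecuTorch/XNNPACK API; see XNNPACK Quantization for the
# actual prepare/convert flow.

def quantize_symmetric(values, num_bits=8):
    """Map floats to signed integers with a symmetric scale (zero point = 0)."""
    qmax = 2 ** (num_bits - 1) - 1                 # 127 for 8-bit
    scale = max(abs(v) for v in values) / qmax or 1.0
    quantized = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return quantized, scale

def dequantize(quantized, scale):
    return [q * scale for q in quantized]

weights = [0.5, -1.27, 0.03, 1.0]
q, scale = quantize_symmetric(weights)
approx = dequantize(q, scale)
# Each dequantized value is within half a quantization step of the original.
assert all(abs(a - w) <= scale / 2 + 1e-9 for a, w in zip(approx, weights))
```

In the real flow, scales are chosen per tensor or per channel during calibration, and the quantized operators run directly on int8 data in XNNPACK's kernels.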
Runtime Integration#
To run the model on-device, use the standard ExecuTorch runtime APIs.
The XNNPACK delegate is included by default in the published Android, iOS, and pip packages. When building from source, pass -DEXECUTORCH_BUILD_XNNPACK=ON
when configuring the CMake build to compile the XNNPACK backend. See Running on Device for more information.
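A from-source configure step might look like the following sketch; only the -DEXECUTORCH_BUILD_XNNPACK=ON flag is prescribed here, and the source/build paths and build type are placeholders:

```shell
# Hypothetical configure invocation; adjust paths and options
# to your checkout and toolchain.
cmake -S executorch -B cmake-out \
  -DEXECUTORCH_BUILD_XNNPACK=ON \
  -DCMAKE_BUILD_TYPE=Release
```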
To link against the backend, add the executorch_backends CMake target as a build dependency, or link directly against libxnnpack_backend. Due to the use of static registration, it may be necessary to link with whole-archive. This can typically be done by passing "$<LINK_LIBRARY:WHOLE_ARCHIVE,xnnpack_backend>" to target_link_libraries.
# CMakeLists.txt
add_subdirectory("executorch")
...
target_link_libraries(
    my_target
    PRIVATE
        executorch
        executorch_backends
        ...
)
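Alternatively, when linking the backend library directly rather than through the umbrella executorch_backends target, the whole-archive generator expression mentioned above can be applied. A sketch (my_target is a placeholder name, as in the snippet above):

```cmake
# Variant: link the XNNPACK backend library directly, forcing
# whole-archive so its static registration code is retained.
target_link_libraries(
    my_target
    PRIVATE
        executorch
        "$<LINK_LIBRARY:WHOLE_ARCHIVE,xnnpack_backend>"
)
```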
No additional steps are necessary to use the backend beyond linking the target. Any XNNPACK-delegated .pte file will automatically run on the registered backend.
Reference#
→Troubleshooting — Debug common issues.
→Partitioner API — Partitioner options and supported operators.
→Quantization — Supported quantization schemes.
→Architecture and Internals — XNNPACK backend internals.