XNNPACK Backend#
The XNNPACK delegate is the ExecuTorch solution for CPU execution on mobile devices. XNNPACK is a library that provides optimized kernels for machine learning operators on Arm and x86 CPUs.
Features#
Wide operator support on Arm and x86 CPUs, available on any modern mobile phone.
Support for a wide variety of quantization schemes and quantized operators.
Supports fp32 and fp16 activations.
Supports 8-bit quantization.
Target Requirements#
ARM64 on Android, iOS, macOS, Linux, and Windows.
ARMv7 (with NEON) on Android.
ARMv6 (with VFPv2) on Linux.
x86 and x86-64 (up to AVX512) on Windows, Linux, and Android.
Development Requirements#
The XNNPACK delegate does not introduce any development system requirements beyond those required by the core ExecuTorch runtime.
Using the XNNPACK Backend#
To target the XNNPACK backend during the export and lowering process, pass an instance of the XnnpackPartitioner to to_edge_transform_and_lower. The example below demonstrates this process using the MobileNet V2 model from torchvision.
import torch
import torchvision.models as models
from torchvision.models.mobilenetv2 import MobileNet_V2_Weights
from executorch.backends.xnnpack.partition.xnnpack_partitioner import XnnpackPartitioner
from executorch.exir import to_edge_transform_and_lower
mobilenet_v2 = models.mobilenetv2.mobilenet_v2(weights=MobileNet_V2_Weights.DEFAULT).eval()
sample_inputs = (torch.randn(1, 3, 224, 224), )
et_program = to_edge_transform_and_lower(
    torch.export.export(mobilenet_v2, sample_inputs),
    partitioner=[XnnpackPartitioner()],
).to_executorch()

with open("mv2_xnnpack.pte", "wb") as file:
    et_program.write_to_file(file)
See Partitioner API for a reference on available partitioner options.
Quantization#
The XNNPACK delegate can also be used as a backend to execute symmetrically quantized models. See XNNPACK Quantization for more information on available quantization schemes and APIs.
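As a rough illustration of what symmetric quantization means (this is plain Python arithmetic, not the ExecuTorch or XNNPACK quantization API; see the linked docs for the actual workflow), a symmetric 8-bit scheme maps floats to signed integers using a single scale and a zero point of 0:

```python
# Illustrative sketch of symmetric 8-bit quantization arithmetic.
# Not the ExecuTorch/XNNPACK API; see XNNPACK Quantization for the
# actual prepare/convert flow.

def quantize_symmetric(values, num_bits=8):
    """Map floats to signed integers with a symmetric scale (zero point = 0)."""
    qmax = 2 ** (num_bits - 1) - 1                 # 127 for 8-bit
    scale = max(abs(v) for v in values) / qmax or 1.0
    quantized = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return quantized, scale

def dequantize(quantized, scale):
    return [q * scale for q in quantized]

weights = [0.5, -1.27, 0.03, 1.0]
q, scale = quantize_symmetric(weights)
approx = dequantize(q, scale)
# Each dequantized value is within half a quantization step of the original.
assert all(abs(a - w) <= scale / 2 + 1e-9 for a, w in zip(approx, weights))
```

In the real flow, scales are chosen per tensor or per channel during calibration, and the quantized operators run directly on int8 data in XNNPACK's kernels.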
Runtime Integration#
To run the model on-device, use the standard ExecuTorch runtime APIs.
The XNNPACK delegate is included by default in the published Android, iOS, and pip packages. When building from source, pass -DEXECUTORCH_BUILD_XNNPACK=ON
when configuring the CMake build to compile the XNNPACK backend. See Running on Device for more information.
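A from-source configure step might look like the following sketch; only the -DEXECUTORCH_BUILD_XNNPACK=ON flag is prescribed here, and the source/build paths and build type are placeholders:

```shell
# Hypothetical configure invocation; adjust paths and options
# to your checkout and toolchain.
cmake -S executorch -B cmake-out \
  -DEXECUTORCH_BUILD_XNNPACK=ON \
  -DCMAKE_BUILD_TYPE=Release
```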
To link against the backend, add the executorch_backends CMake target as a build dependency, or link directly against libxnnpack_backend. Due to the use of static registration, it may be necessary to link with whole-archive. This can typically be done by passing "$<LINK_LIBRARY:WHOLE_ARCHIVE,xnnpack_backend>" to target_link_libraries.
# CMakeLists.txt
add_subdirectory("executorch")
...
target_link_libraries(
    my_target
    PRIVATE
        executorch
        executorch_backends
        ...
)
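Alternatively, when linking the backend library directly rather than through the umbrella executorch_backends target, the whole-archive generator expression mentioned above can be applied. A sketch (my_target is a placeholder name, as in the snippet above):

```cmake
# Variant: link the XNNPACK backend library directly, forcing
# whole-archive so its static registration code is retained.
target_link_libraries(
    my_target
    PRIVATE
        executorch
        "$<LINK_LIBRARY:WHOLE_ARCHIVE,xnnpack_backend>"
)
```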
No additional steps are necessary to use the backend beyond linking the target. Any XNNPACK-delegated .pte file will automatically run on the registered backend.
Reference#
→Troubleshooting — Debug common issues.
→Partitioner API — Partitioner options and supported operators.
→Quantization — Supported quantization schemes.
→Architecture and Internals — XNNPACK backend internals.