NXP eIQ Neutron Backend#
This manual page is dedicated to introduction of using the ExecuTorch with NXP eIQ Neutron Backend. NXP offers accelerated machine learning models inference on edge devices. To learn more about NXP’s machine learning acceleration platform, please refer to the official NXP website.
Features#
ExecuTorch v1.0 supports running machine learning models on selected NXP chips (for now only i.MXRT700). Among currently supported machine learning models are:
Convolution-based neutral networks
Full support for MobileNetV2 and CifarNet
Prerequisites (Hardware and Software)#
In order to successfully build ExecuTorch project and convert models for NXP eIQ Neutron Backend you will need a computer running Linux.
If you want to test the runtime, you’ll also need:
Hardware with NXP’s i.MXRT700 chip or a testing board like MIMXRT700-AVK
Using NXP backend#
To test converting a neural network model for inference on NXP eIQ Neutron Backend, you can use our example script:
# cd to the root of executorch repository
./examples/nxp/aot_neutron_compile.sh [model (cifar10 or mobilenetv2)]
For a quick overview how to convert a custom PyTorch model, take a look at our example python script.
Partitioner API#
The partitioner is defined in NeutronPartitioner
in backends/nxp/neutron_partitioner.py
. It has the following
arguments:
compile_spec
- list of key-value pairs defining compilation. E.g. for specifying platform (i.MXRT700) and Neutron Converter flavor.custom_delegation_options
- custom options for specifying node delegation.
Quantization#
The quantization for Neutron Backend is defined in NeutronQuantizer
in backends/nxp/quantizer/neutron_quantizer.py
.
The quantization follows PT2E workflow, INT8 quantization is supported. Operators are quantized statically, activations
follow affine and weights symmetric per-tensor quantization scheme.
Supported operators#
List of Aten operators supported by Neutron quantizer:
abs
, adaptive_avg_pool2d
, addmm
, add.Tensor
, avg_pool2d
, cat
, conv1d
, conv2d
, dropout
,
flatten.using_ints
, hardtanh
, hardtanh_
, linear
, max_pool2d
, mean.dim
, pad
, permute
, relu
, relu_
,
reshape
, view
, softmax.int
, sigmoid
, tanh
, tanh_
Example#
import torch
from executorch.backends.nxp.quantizer.neutron_quantizer import NeutronQuantizer
from torchao.quantization.pt2e.quantize_pt2e import convert_pt2e, prepare_pt2e
# Prepare your model in Aten dialect
aten_model = get_model_in_aten_dialect()
# Prepare calibration inputs, each tuple is one example, example tuple has items for each model input
calibration_inputs: list[tuple[torch.Tensor, ...]] = get_calibration_inputs()
quantizer = NeutronQuantizer()
m = prepare_pt2e(aten_model, quantizer)
for data in calibration_inputs:
m(*data)
m = convert_pt2e(m)
Runtime Integration#
To learn how to run the converted model on the NXP hardware, use one of our example projects on using ExecuTorch runtime from MCUXpresso IDE example projects list. For more finegrained tutorial, visit this manual page.