Partitioner API#

The Neutron partitioner API configures how the model is delegated to Neutron. Passing a NeutronPartitioner instance with no additional parameters will run as much of the model as possible on the Neutron backend. This is the most common use case.

It has the following arguments:

  • compile_spec - list of key-value pairs defining the compilation,

  • neutron_target_spec - NeutronTargetSpec instance, initialized from an SoC id, e.g. "imxrt700",

  • custom_delegation_options - custom options for controlling node delegation,

  • preserve_ops - list of aten operators that ExecuTorch should not decompose.
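Putting the arguments together, delegation with default settings can be sketched as follows. This is a minimal sketch: the import paths are assumptions based on the ExecuTorch NXP backend layout, and a real model should be statically quantized to int8 before lowering (see the operator table below), so verify the details against your ExecuTorch version.

```python
import torch

# Import paths are assumptions -- check them against your ExecuTorch checkout.
from executorch.backends.nxp.neutron_partitioner import NeutronPartitioner
from executorch.backends.nxp.nxp_backend import generate_neutron_compile_spec
from executorch.exir import to_edge_transform_and_lower


class SmallNet(torch.nn.Module):
    """Toy stand-in for a real (quantized) model."""

    def forward(self, x):
        return torch.relu(x)


exported = torch.export.export(SmallNet().eval(), (torch.randn(1, 8),))

# A compile spec built only from the SoC id delegates as much of the model
# as possible to Neutron -- the common case described above.
compile_spec = generate_neutron_compile_spec("imxrt700")
partitioner = NeutronPartitioner(compile_spec)

edge = to_edge_transform_and_lower(exported, partitioner=[partitioner])
program = edge.to_executorch()
```

Running this end to end requires an ExecuTorch installation with the NXP backend and the Neutron converter toolchain available.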

Compile Spec Options#

To generate the compile spec for the Neutron backend, you can use the generate_neutron_compile_spec function or call NeutronCompileSpecBuilder().neutron_compile_spec() directly. The following fields can be set:

  • config - NXP platform defining the Neutron NPU configuration, e.g. “imxrt700”.

  • extra_flags - Extra flags for the Neutron compiler.

  • operators_not_to_delegate - List of operators that will not be delegated.

  • use_neutron_for_format_conversion - If True, let the eIQ Neutron NPU handle the conversion between channel-first (NCHW) and channel-last (NHWC) data formats. That is, the Neutron backend will insert Transpose ops to ensure that the IO of the ExecuTorch partition delegated to Neutron matches the rest of the program.

  • fetch_constants_to_sram - If True, the Neutron Converter will insert microinstructions to prefetch weights from FLASH to SRAM. Use this when the whole model does not fit into SRAM on Neutron-C devices, such as the i.MX RT700.

  • dump_kernel_selection_code - If True, the Neutron Converter dumps kernel-selection code, which is used by selective kernel registration; see Neutron Firmware Kernel Selection support.
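A customized compile spec might look like the configuration fragment below. The keyword names mirror the fields listed above, but treat them as assumptions and check the signature of generate_neutron_compile_spec in your ExecuTorch version:

```python
# Configuration sketch -- keyword names are assumptions based on the
# field list above; verify against generate_neutron_compile_spec.
from executorch.backends.nxp.nxp_backend import generate_neutron_compile_spec

compile_spec = generate_neutron_compile_spec(
    "imxrt700",                                        # config: target NPU
    operators_not_to_delegate=["aten.tanh.default"],   # keep these on CPU
    use_neutron_for_format_conversion=True,            # NPU handles NCHW<->NHWC
    fetch_constants_to_sram=True,                      # model larger than SRAM
)
```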

Custom Delegation Options#

By default the Neutron backend is defensive, which means it does not delegate operators whose support cannot be decided statically during partitioning. But as the model author you typically have insight into the model, so you can allow opportunistic delegation in some cases. For the list of options, see CustomDelegationOptions.

Operator Support#

Operators are the building blocks of the ML model. See IRs for more information on the PyTorch operator set.

This section lists the Edge operators supported by the Neutron backend. For the detailed constraints of each operator, see the conditions in the is_supported_* functions in the Node converters.

Operator Support#

| Operator | Compute DType | Quantization | Constraints |
| --- | --- | --- | --- |
| aten.abs.default | int8 | static int8 | |
| aten._adaptive_avg_pool2d.default | int8 | static int8 | ceil_mode=False, count_include_pad=False, divisor_override=False |
| aten.addmm.default | int8 | static int8 | 2D tensor only |
| aten.add.Tensor | int8 | static int8 | alpha = 1, input tensors of equal shape |
| aten.avg_pool1d.default | int8 | static int8 | ceil_mode=False, count_include_pad=False, divisor_override=False |
| aten.avg_pool2d.default | int8 | static int8 | ceil_mode=False, count_include_pad=False, divisor_override=False |
| aten.bmm.default | int8 | static int8 | width and channels dim of both args % 8 = 0, 3D tensors only |
| aten.cat.default | int8 | static int8 | input_channels % 8 = 0, output_channels % 8 = 0 |
| aten.clamp.default | int8 | static int8 | bounds = (-1, 1), (0, 1), (0, 6) or (0, None) |
| aten.clone.default | int8 | static int8 | |
| aten.constant_pad_nd.default | int8 | static int8 | H or W padding only |
| aten.convolution.default | int8 | static int8 | 1D or 2D convolution, constant weights, groups=1 or groups=channels_count (depthwise) |
| aten.dim_order_ops._clone_dim_order.default | | | See aten.clone.default |
| aten.div.Tensor | int8 | static int8 | divisor is a static tensor or scalar value; one dimension must satisfy % 8 = 0, or scalar division (all dims = 1) |
| aten.hardtanh.default | int8 | static int8 | supported ranges: <0, 6>, <-1, 1>, <0, 1>, <0, inf> |
| aten.leaky_relu.default | int8 | static int8 | |
| aten.max_pool1d.default | int8 | static int8 | dilation=1, ceil_mode=False, channels % 8 = 0, batch_size=1, stride_h=1 or 2 |
| aten.max_pool2d.default | int8 | static int8 | dilation=1, ceil_mode=False, channels % 8 = 0, batch_size=1, stride_h=1 or 2 |
| aten.max_pool2d_with_indices.default | int8 | static int8 | dilation=1, ceil_mode=False, channels % 8 = 0, batch_size=1, stride_h=1 or 2 |
| aten.mean.dim | int8 | static int8 | 4D tensor only, dims = [-1, -2] or [-2, -1] |
| aten.mul.Tensor | int8 | static int8 | tensor size % 8 = 0 |
| aten.mm.default | int8 | static int8 | 2D tensor only |
| aten.neg.default | int8 | static int8 | |
| aten.permute_copy.default | int8 | static int8 | only specific transpositions supported, see backends/nxp/backend/ir/converter/node_converters/ops_converters/permute_copy_converter.py |
| aten.prelu.default | int8 | static int8 | rank = 4, channels % 8 = 0, flat input size / channels <= 4096 |
| aten.relu.default | int8 | static int8 | |
| aten.sigmoid.default | int8 | static int8 | |
| aten.slice_copy.Tensor | int8 | static int8 | |
| aten._softmax.default | int8 | static int8 | rank > 1, channels % 8 = 0, channels < 2048, flat input size / channels <= 4096, flat input size <= 524288 |
| aten.split.default | N/A | N/A | split -> getitem is transformed to slice, see aten.slice_copy.Tensor |
| aten.split.Tensor | N/A | N/A | split -> getitem is transformed to slice, see aten.slice_copy.Tensor |
| aten.split_with_sizes.default | N/A | N/A | split -> getitem is transformed to slice, see aten.slice_copy.Tensor |
| aten.squeeze.default | int8 | static int8 | |
| aten.squeeze.dim | int8 | static int8 | |
| aten.squeeze.dims | int8 | static int8 | |
| aten.tanh.default | int8 | static int8 | |
| aten.unsqueeze.default | int8 | static int8 | |
| aten.upsample_bilinear2d.vec | int8 | static int8 | channels % 8 = 0, H_scale = W_scale = 2 or 4 |
| aten.upsample_nearest2d.vec | int8 | static int8 | channels % 8 = 0, H_scale = W_scale = 2 or 4 |
| aten.view_copy.default | int8 | static int8 | |
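Several of the constraints above are simple shape checks that can be verified in plain Python before export. The helper below is illustrative only, not part of the Neutron API, and assumes that "channels" refers to the last tensor dimension:

```python
def channels_aligned(channels: int, multiple: int = 8) -> bool:
    """Check the 'channels % 8 = 0' constraint shared by several operators."""
    return channels % multiple == 0


def softmax_delegable(shape: tuple[int, ...]) -> bool:
    """Mirror the aten._softmax.default constraints from the table above.

    Assumes channels is the last dimension (an illustrative simplification).
    """
    if len(shape) <= 1:  # rank > 1 required
        return False
    channels = shape[-1]
    flat = 1
    for d in shape:
        flat *= d
    return (
        channels_aligned(channels)
        and channels < 2048
        and flat // channels <= 4096
        and flat <= 524288
    )


print(softmax_delegable((1, 64, 16)))  # True: aligned, small enough
print(softmax_delegable((1, 10)))      # False: 10 % 8 != 0
```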