Partitioner API
The Neutron partitioner API configures how the model is delegated to Neutron. Passing a NeutronPartitioner instance with no additional parameters runs as much of the model as possible on the Neutron backend. This is the most common use case.
It has the following arguments:
compile_spec - list of key-value pairs defining the compilation.
custom_delegation_options - custom options for specifying node delegation.
Compile Spec Options
To generate the Compile Spec for the Neutron backend, you can use the generate_neutron_compile_spec function or call NeutronCompileSpecBuilder().neutron_compile_spec() directly. The following fields can be set:
config - NXP platform defining the Neutron NPU configuration, e.g. “imxrt700”.
extra_flags - Extra flags for the Neutron compiler.
operators_not_to_delegate - List of operators that will not be delegated.
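A minimal sketch of how the pieces above might fit together. The helper name build_neutron_partitioner and both import paths are assumptions, not taken from this page; check them against your ExecuTorch release, which may also require further arguments to generate_neutron_compile_spec. The imports sit inside the function so that defining the helper does not itself require ExecuTorch to be installed.

```python
def build_neutron_partitioner(target="imxrt700"):
    """Return a NeutronPartitioner for the given NXP platform (hypothetical helper)."""
    # Assumed import locations; verify against the installed ExecuTorch version.
    from executorch.backends.nxp.neutron_partitioner import NeutronPartitioner
    from executorch.backends.nxp.nxp_backend import generate_neutron_compile_spec

    # Only the `config` field is set here; extra_flags and
    # operators_not_to_delegate are left at their defaults.
    compile_spec = generate_neutron_compile_spec(target)

    # With no custom_delegation_options, the partitioner delegates as much
    # of the model as it can to the Neutron backend.
    return NeutronPartitioner(compile_spec)
```

The returned object would then be passed in the partitioner list when lowering the exported program.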
Custom Delegation Options
By default, the Neutron backend is defensive, which means it does not delegate operators whose support cannot be decided statically during partitioning. As the model author, however, you typically have insight into the model and can allow opportunistic delegation in some cases. For the list of options, see CustomDelegationOptions.
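The difference between defensive and opportunistic delegation can be illustrated with a small, purely conceptual sketch. The function should_delegate and its parameters are hypothetical and are not part of the Neutron backend or of CustomDelegationOptions; the sketch only models the decision rule described above.

```python
from typing import Optional


def should_delegate(static_check: Optional[bool], allow_opportunistic: bool = False) -> bool:
    """Decide whether a node is delegated (illustrative model only).

    static_check is True or False when support can be decided during
    partitioning, and None when it cannot be decided statically.
    """
    if static_check is None:
        # Defensive default: an undecidable node stays on the CPU unless
        # the model author explicitly opts in.
        return allow_opportunistic
    # A statically decided result is always respected.
    return static_check


print(should_delegate(None))                            # defensive default
print(should_delegate(None, allow_opportunistic=True))  # author opted in
```

Opting in never overrides a check that was decided statically as unsupported; it only affects the undecidable cases.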
Operator Support
Operators are the building blocks of the ML model. See IRs for more information on the PyTorch operator set.
This section lists the Edge operators supported by the Neutron backend.
For detailed operator constraints, see the conditions in the is_supported_* functions in the Node converters.
| Operator | Compute DType | Quantization | Constraints |
|---|---|---|---|
| aten.abs.default | int8 | static int8 | |
| aten._adaptive_avg_pool2d.default | int8 | static int8 | ceil_mode=False, count_include_pad=False, divisor_override=False |
| aten.addmm.default | int8 | static int8 | 2D tensor only |
| aten.add.Tensor | int8 | static int8 | alpha = 1, input tensors of equal shape |
| aten.avg_pool1d.default | int8 | static int8 | ceil_mode=False, count_include_pad=False, divisor_override=False |
| aten.avg_pool2d.default | int8 | static int8 | ceil_mode=False, count_include_pad=False, divisor_override=False |
| aten.cat.default | int8 | static int8 | input_channels % 8 = 0, output_channels % 8 = 0 |
| aten.clamp.default | int8 | static int8 | Bounds = (-1, 1) or (0, 1) or (0, 6) or (0, None) |
| aten.clone.default | int8 | static int8 | |
| aten.constant_pad_nd.default | int8 | static int8 | H or W padding only |
| aten.convolution.default | int8 | static int8 | 1D or 2D convolution, constant weights, groups=1 or groups=channels_count (depthwise) |
| aten.div.Tensor | int8 | static int8 | divisor - static tensor or scalar value, one dimension must satisfy %8 = 0 or scalar division (all dims = 1) |
| aten.hardtanh.default | int8 | static int8 | supported ranges: <0,6>, <-1,1>, <0,1>, <0,inf> |
| aten.leaky_relu.default | int8 | static int8 | |
| aten.max_pool1d.default | int8 | static int8 | dilation=1, ceil_mode=False, channels%8=0, batch_size=1, stride_h=1 or 2 |
| aten.max_pool2d.default | int8 | static int8 | dilation=1, ceil_mode=False, channels%8=0, batch_size=1, stride_h=1 or 2 |
| aten.max_pool2d_with_indices.default | int8 | static int8 | dilation=1, ceil_mode=False, channels%8=0, batch_size=1, stride_h=1 or 2 |
| aten.mean.dim | int8 | static int8 | 4D tensor only, dims = [-1,-2] or [-2,-1] |
| aten.mul.Tensor | int8 | static int8 | tensor-size % 8 = 0 |
| aten.mm.default | int8 | static int8 | 2D tensor only |
| aten.neg.default | int8 | static int8 | |
| aten.relu.default | int8 | static int8 | |
| aten.sigmoid.default | int8 | static int8 | |
| aten.slice_copy.Tensor | int8 | static int8 | |
| aten.softmax.default | int8 | static int8 | rank > 1, channels % 8 = 0, channels < 2048, flat input size / channels <= 4096, flat input size <= 524288 |
| aten.squeeze.default | int8 | static int8 | |
| aten.squeeze.dim | int8 | static int8 | |
| aten.squeeze.dims | int8 | static int8 | |
| aten.tanh.default | int8 | static int8 | |
| aten.unsqueeze.default | int8 | static int8 | |
| aten.upsample_bilinear2d.vec | int8 | static int8 | channels % 8 = 0, H_scale = W_scale = 2 or 4 |
| aten.upsample_nearest2d.vec | int8 | static int8 | channels % 8 = 0, H_scale = W_scale = 2 or 4 |
| aten.view_copy.default | int8 | static int8 | |
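To make the constraint column more concrete, the sketch below encodes two rows of the table (aten.softmax.default and aten.clamp.default) as plain Python predicates. It is one reading of the table, not the backend's is_supported_* code. The function names are hypothetical, and the channel count is passed explicitly because the table does not say which dimension holds the channels.

```python
from math import prod

# Bounds listed for aten.clamp.default; None means "no upper bound".
CLAMP_BOUNDS = {(-1, 1), (0, 1), (0, 6), (0, None)}


def softmax_constraints_ok(shape, channels):
    """Check the softmax row: rank, channel alignment, and size limits."""
    flat = prod(shape)  # flat input size = total number of elements
    return (
        len(shape) > 1
        and channels > 0
        and channels % 8 == 0
        and channels < 2048
        and flat / channels <= 4096
        and flat <= 524288
    )


def clamp_bounds_ok(lower, upper):
    """Check the clamp row: only the listed (min, max) pairs qualify."""
    return (lower, upper) in CLAMP_BOUNDS


print(softmax_constraints_ok((1, 64, 16), channels=16))
print(clamp_bounds_ok(0, 6))
```

A pre-check like this can help spot, before lowering, why a particular node is likely to stay on the CPU.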