Partitioner API#

The XNNPACK partitioner API controls how the model is delegated to XNNPACK. Passing an XnnpackPartitioner instance with no additional parameters runs as much of the model as possible on the XNNPACK backend; this is the most common use case. For advanced use cases, the constructor exposes the following options (a usage sketch follows the list):

  • configs: Control which operators are delegated to XNNPACK. By default, all available operators are delegated. See ../config/__init__.py for an exhaustive list of available operator configs.

  • config_precisions: Filter operators by data type. By default, all precisions are delegated. Accepts one or more of ConfigPrecisionType.FP32, ConfigPrecisionType.STATIC_QUANT, or ConfigPrecisionType.DYNAMIC_QUANT. See ConfigPrecisionType.

  • per_op_mode: If true, emit individual delegate calls for every operator. This is an advanced option intended to reduce memory overhead in some contexts at the cost of a small amount of runtime overhead. Defaults to false.

  • verbose: If true, print additional information during lowering.
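The sketch below shows both the default and an advanced configuration, assuming the standard to_edge_transform_and_lower flow; the toy Model class and sample inputs are placeholders, and import paths should be verified against your installed ExecuTorch release.

```python
import torch
from executorch.backends.xnnpack.partition.config.xnnpack_config import ConfigPrecisionType
from executorch.backends.xnnpack.partition.xnnpack_partitioner import XnnpackPartitioner
from executorch.exir import to_edge_transform_and_lower


class Model(torch.nn.Module):  # placeholder model for illustration
    def forward(self, x):
        return torch.nn.functional.relu(x + 1.0)


model = Model().eval()
sample_inputs = (torch.randn(1, 8),)

# Common case: delegate as much of the model as possible to XNNPACK.
partitioner = XnnpackPartitioner()

# Advanced case (assumption: list form accepted for config_precisions):
# delegate only fp32 operators and emit one delegate call per operator,
# using the config_precisions and per_op_mode options described above.
# partitioner = XnnpackPartitioner(
#     config_precisions=[ConfigPrecisionType.FP32],
#     per_op_mode=True,
# )

et_program = to_edge_transform_and_lower(
    torch.export.export(model, sample_inputs),
    partitioner=[partitioner],
).to_executorch()
```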

Operator Support#

This section lists the operators supported by the XNNPACK backend. Operators are the building blocks of an ML model. See IRs for more information on the PyTorch operator set.

All operators support dynamic input shapes unless otherwise noted.
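As a hedged illustration of how dynamic input shapes reach the backend, the sketch below exports a toy model with a dynamic batch dimension via torch.export; the Model class and the "batch" dimension name are placeholders, not part of the XNNPACK API.

```python
import torch
from torch.export import Dim, export


class Model(torch.nn.Module):  # placeholder model for illustration
    def forward(self, x):
        return torch.nn.functional.softmax(x, dim=-1)


model = Model().eval()
sample_inputs = (torch.randn(2, 16),)

# Mark the leading (batch) dimension as dynamic. Operators noted in the table
# below as not supporting dynamic shapes may not be delegated when the model
# is exported this way.
dynamic_shapes = {"x": {0: Dim("batch", min=1, max=64)}}
exported = export(model, sample_inputs, dynamic_shapes=dynamic_shapes)
```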

| Operator | Compute DType | Quantization | Constraints |
|---|---|---|---|
| _to_dim_order_copy | fp16, fp32 | | no dtype conversion |
| abs | fp16, fp32 | | |
| add | fp16, fp32 | PT2E: static int8 | alpha=1 |
| avg_pool2d | fp16, fp32 | PT2E: static int8 | ceil_mode=False, count_include_pad=False, divisor_override=pooling_region |
| bmm | fp16, fp32 | | |
| cat | fp16, fp32 | PT2E: static int8 | |
| ceil | fp16, fp32 | | |
| clamp | fp16, fp32 | | |
| constant_pad_nd | fp16, fp32 | | no negative padding values |
| conv1d | fp16, fp32 | PT2E: static or dynamic int8 activations, 8-bit weights (symmetric, per-tensor or per-channel) | constant weights |
| conv2d | fp16, fp32 | PT2E: static or dynamic int8 activations, 8-bit weights (symmetric, per-tensor or per-channel) | constant weights |
| dequantize_per_tensor | fp16, fp32 | | |
| div | fp16, fp32 | | |
| elu | fp16, fp32 | | |
| exp | fp16, fp32 | | |
| floor | fp16, fp32 | | |
| gelu | fp16, fp32 | | |
| hardswish | fp16, fp32 | | |
| hardtanh | fp16, fp32 | | |
| leaky_relu | fp16, fp32 | | |
| linear | fp16, fp32 | PT2E: static or dynamic int8 activations, 8-bit weights (symmetric, per-tensor or per-channel); quantize_: 8-bit dynamic activations, 4-bit groupwise weights | constant weights |
| log | fp16, fp32 | | |
| max_pool2d | fp16, fp32 | | stride ≤ kernel_size; ceil_mode only for static shapes |
| maximum | fp16, fp32 | | |
| mean | fp16, fp32 | | 4D tensors only; dims=[-2,-1] or [-1,-2] |
| minimum | fp16, fp32 | | |
| mul | fp16, fp32 | PT2E: static int8 | |
| neg | fp16, fp32 | | |
| permute_copy | fp16, fp32 | | |
| pow | fp16, fp32 | | power=2 only |
| quantize_per_tensor | fp16, fp32 | | |
| relu | fp16, fp32 | | |
| rsqrt | fp16, fp32 | | |
| sigmoid | fp16, fp32 | | |
| slice_copy | fp16, fp32 | | no zero-dim tensors, no dynamic shapes |
| softmax | fp16, fp32 | | dim must be last dimension |
| sqrt | fp16, fp32 | | |
| sub | fp16, fp32 | | alpha=1 |
| tanh | fp16, fp32 | | |
| upsample_bilinear2d | fp16, fp32 | | no dynamic output sizes |