NXP eIQ Neutron Quantization

The eIQ Neutron NPU requires that all delegated operators be quantized. To quantize a PyTorch model for the Neutron backend, use the NeutronQuantizer from backends/nxp/quantizer/neutron_quantizer.py. The NeutronQuantizer is preconfigured to quantize the model with a quantization scheme supported by the eIQ Neutron NPU.

Supported Quantization Schemes

The Neutron delegate supports the following quantization schemes:

Static quantization with 8-bit symmetric weights and 8-bit asymmetric activations at per-tensor granularity, applied through the PT2E quantization flow.
- The following operators are currently supported:
  - aten.abs.default
  - aten.adaptive_avg_pool2d.default
  - aten.addmm.default
  - aten.add.Tensor
  - aten.avg_pool2d.default
  - aten.cat.default
  - aten.conv1d.default
  - aten.conv2d.default
  - aten.dropout.default
  - aten.flatten.using_ints
  - aten.hardtanh.default
  - aten.hardtanh_.default
  - aten.linear.default
  - aten.max_pool2d.default
  - aten.mean.dim
  - aten.mul.Tensor
  - aten.pad.default
  - aten.permute.default
  - aten.relu.default and aten.relu_.default
  - aten.reshape.default
  - aten.view.default
  - aten.softmax.int
  - aten.tanh.default and aten.tanh_.default
  - aten.sigmoid.default
  - aten.slice_copy.Tensor
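
To make the scheme above concrete, the following is a minimal pure-Python sketch of per-tensor symmetric and asymmetric 8-bit quantization. The function names (`symmetric_qparams`, `asymmetric_qparams`, `quantize`, `dequantize`) are hypothetical and exist only for illustration; the exact integer ranges and rounding applied by the NeutronQuantizer may differ.

```python
def symmetric_qparams(values, qmax=127):
    """Per-tensor symmetric int8: zero point fixed at 0, scale from max |x|."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    return scale, 0


def asymmetric_qparams(values, qmin=-128, qmax=127):
    """Per-tensor asymmetric int8: scale and zero point from the [min, max] range."""
    lo = min(min(values), 0.0)  # extend the range to include 0.0
    hi = max(max(values), 0.0)
    scale = (hi - lo) / (qmax - qmin) if hi > lo else 1.0
    zero_point = round(qmin - lo / scale)
    return scale, max(qmin, min(qmax, zero_point))


def quantize(values, scale, zero_point, qmin=-128, qmax=127):
    """Map floats to int8 codes, clamping to [qmin, qmax]."""
    return [max(qmin, min(qmax, round(v / scale) + zero_point)) for v in values]


def dequantize(q_values, scale, zero_point):
    """Map int8 codes back to approximate float values."""
    return [(q - zero_point) * scale for q in q_values]


# One (scale, zero_point) pair is shared by the whole tensor (per-tensor granularity).
w_scale, w_zp = symmetric_qparams([-0.5, 0.25, 1.0])   # weights
a_scale, a_zp = asymmetric_qparams([0.0, 1.0, 2.55])   # activations
print(quantize([0.0, 1.0, 2.55], a_scale, a_zp))
```

The symmetric variant keeps the zero point at 0, which is why it is typically used for weights, while the asymmetric variant spends the full integer range on one-sided activation ranges such as the output of a ReLU.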

Static 8-bit Quantization Using the PT2E Flow

To perform 8-bit quantization with the PT2E flow, complete the following steps before exporting the model to edge:

1. Create an instance of the NeutronQuantizer class.
2. Use torch.export.export to export the model to ATen Dialect.
3. Call prepare_pt2e with the instance of the NeutronQuantizer to annotate the model with observers for quantization.
4. As static quantization is required, run the prepared model with representative samples to calibrate the quantized tensor activation ranges.
5. Call convert_pt2e to quantize the model.
6. Export and lower the model using the standard flow.
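
The calibration step can be pictured with a toy example. The sketch below is not the PT2E implementation: `MinMaxObserver`, `calibrate`, and `toy_model` are hypothetical stand-ins that only illustrate how running representative samples through a prepared model lets observers record the activation range later used to derive quantization parameters.

```python
class MinMaxObserver:
    """Toy stand-in for an observer: tracks the running min and max it has seen."""

    def __init__(self):
        self.min_val = float("inf")
        self.max_val = float("-inf")

    def __call__(self, values):
        self.min_val = min(self.min_val, min(values))
        self.max_val = max(self.max_val, max(values))
        return values  # observers pass data through unchanged


def calibrate(prepared_model, calibration_samples):
    """Run every representative sample through the prepared model."""
    count = 0
    for sample in calibration_samples:
        prepared_model(*sample)  # each sample is a tuple of model inputs
        count += 1
    return count


observer = MinMaxObserver()


def toy_model(x):
    # Hypothetical "prepared" model: one layer followed by an observer.
    return observer([2.0 * v for v in x])


samples = [([0.1, -0.4],), ([1.5, 0.2],), ([-0.9, 0.3],)]
num_seen = calibrate(toy_model, samples)
print(num_seen, observer.min_val, observer.max_val)
```

Because the recorded range depends entirely on the data seen, calibration samples should resemble real inputs; random tensors exercise the flow but are unlikely to produce useful ranges. With a real PyTorch model, the loop is usually wrapped in `torch.no_grad()`.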

The output of convert_pt2e is a PyTorch model which can be exported and lowered using the normal flow. As it is a regular PyTorch model, it can also be used to evaluate the accuracy of the quantized model using standard PyTorch techniques.
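
One simple accuracy check is to measure how often the quantized model predicts the same class as the original model. The sketch below is framework-agnostic and uses hypothetical helper names (`top1`, `top1_agreement`); it assumes each model is a callable returning a flat sequence of class scores, which for PyTorch tensors could be obtained with something like `output[0].tolist()`.

```python
def top1(scores):
    """Index of the largest score in a flat sequence of class scores."""
    return max(range(len(scores)), key=lambda i: scores[i])


def top1_agreement(float_model, quantized_model, samples):
    """Fraction of samples on which both models predict the same class."""
    matches = 0
    total = 0
    for sample in samples:
        if top1(float_model(*sample)) == top1(quantized_model(*sample)):
            matches += 1
        total += 1
    return matches / total if total else 0.0


# Toy stand-ins: the "quantized" model rounds scores to a coarse grid.
def float_fn(x):
    return x


def quant_fn(x):
    return [round(v * 4) / 4 for v in x]


eval_samples = [([0.1, 0.9, 0.3],), ([0.52, 0.6, 0.1],)]
agreement = top1_agreement(float_fn, quant_fn, eval_samples)
print(agreement)
```

Agreement with the float model is only a proxy. When labeled data is available, comparing each model's accuracy against the ground truth gives a more direct picture.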

To quantize the model, you can use the PT2E workflow:

import torch
import torchvision.models as models
from torchvision.models.mobilenetv2 import MobileNet_V2_Weights
from executorch.backends.nxp.quantizer.neutron_quantizer import NeutronQuantizer
from executorch.backends.nxp.backend.neutron_target_spec import NeutronTargetSpec
from executorch.backends.nxp.neutron_partitioner import NeutronPartitioner
from executorch.backends.nxp.nxp_backend import generate_neutron_compile_spec
from executorch.exir import to_edge_transform_and_lower
from torchao.quantization.pt2e.quantize_pt2e import convert_pt2e, prepare_pt2e

model = models.mobilenetv2.mobilenet_v2(weights=MobileNet_V2_Weights.DEFAULT).eval()
sample_inputs = (torch.randn(1, 3, 224, 224), )

target_spec = NeutronTargetSpec(target="imxrt700", converter_flavor="SDK_25_12")
quantizer = NeutronQuantizer(target_spec) # (1)

training_ep = torch.export.export(model, sample_inputs).module() # (2)
prepared_model = prepare_pt2e(training_ep, quantizer) # (3)

for cal_sample in [torch.randn(1, 3, 224, 224)]: # Replace with representative model inputs
    prepared_model(cal_sample) # (4) Calibrate

quantized_model = convert_pt2e(prepared_model) # (5)

compile_spec = generate_neutron_compile_spec(
    "imxrt700",
    operators_not_to_delegate=None,
    neutron_converter_flavor="SDK_25_12",
)

et_program = to_edge_transform_and_lower( # (6)
    torch.export.export(quantized_model, sample_inputs),
    partitioner=[NeutronPartitioner(compile_spec=compile_spec)],
).to_executorch()

Alternatively, use the predefined post-training quantization function provided by the NXP backend:

from executorch.backends.nxp.quantizer.neutron_quantizer import NeutronQuantizer
from executorch.backends.nxp.backend.neutron_target_spec import NeutronTargetSpec
from executorch.backends.nxp.quantizer.utils import calibrate_and_quantize

...

target_spec = NeutronTargetSpec(target="imxrt700", converter_flavor="SDK_25_12")
quantized_graph_module = calibrate_and_quantize(
    aten_model,
    calibration_inputs,
    NeutronQuantizer(neutron_target_spec=target_spec),
)

See PyTorch 2 Export Post Training Quantization for more information.
