Quantization

The Exynos backend currently supports executing statically quantized 8-bit models.

8-bit quantization with the PT2E quantization flow

To perform 8-bit quantization with the PT2E flow, perform the following steps prior to exporting the model:

  1. Create an instance of the EnnQuantizer class and set the desired quantization behaviour.

  2. Use torch.export.export to obtain a graph module representation of the source model.

  3. Use prepare_pt2e to prepare the model for quantization.

  4. Execute the prepared model with representative samples to calibrate the quantization ranges of the activation tensors.

  5. Use convert_pt2e to quantize the model.

  6. Export and lower the model using the standard export flow.

The output of convert_pt2e is a PyTorch model which can be exported and lowered using the same export flow as non-quantized models. Because it is still a regular PyTorch model, it can also be used to evaluate the accuracy of the quantized model with standard PyTorch techniques (a minimal sanity check is sketched after the example below).

The following example shows how to quantize a MobileNetV2 model using the PT2E quantization flow.

import torch
import torchvision.models as models
from torchvision.models.mobilenetv2 import MobileNet_V2_Weights

from executorch.backends.samsung.partition.enn_partitioner import EnnPartitioner
from executorch.backends.samsung.quantizer.quantizer import EnnQuantizer, Precision

from executorch.exir import to_edge_transform_and_lower
from torchao.quantization.pt2e.quantize_pt2e import convert_pt2e, prepare_pt2e

model = models.mobilenetv2.mobilenet_v2(weights=MobileNet_V2_Weights.DEFAULT).eval()
sample_inputs = (torch.randn(1, 3, 224, 224), )

# Currently, "A8W8" is the only supported precision mode
precision = "A8W8"
is_per_channel = True
is_qat = False

quantizer = EnnQuantizer()
quantizer.set_quant_params(precision, is_per_channel, is_qat) # (1)

exported_model = torch.export.export(model, sample_inputs).module() # (2)
prepared_model = prepare_pt2e(exported_model, quantizer) # (3)

for cal_sample in [torch.randn(1, 3, 224, 224)]: # Replace with representative model inputs
    prepared_model(cal_sample) # (4) Calibrate

quantized_model = convert_pt2e(prepared_model) # (5)

et_program = to_edge_transform_and_lower( # (6)
    torch.export.export(quantized_model, sample_inputs),
    partitioner=[EnnPartitioner()],
).to_executorch()
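
As noted above, the output of convert_pt2e is a regular PyTorch model, so its accuracy can be checked with ordinary PyTorch code before deployment. The following is a minimal sanity check using the variables from the example above; the cosine-similarity comparison on the random sample input is illustrative only, and a representative validation set should be used for a real evaluation.

# Minimal sanity check (illustrative only): compare the quantized model's
# output with the float reference on the sample input. A value close to 1.0
# suggests the quantized model tracks the float model closely.
with torch.no_grad():
    float_out = model(*sample_inputs)
    quant_out = quantized_model(*sample_inputs)

similarity = torch.nn.functional.cosine_similarity(
    float_out.flatten(), quant_out.flatten(), dim=0
)
print(f"Cosine similarity vs. float model: {similarity.item():.4f}")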

See PyTorch 2 Export Post Training Quantization for more information.
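
Finally, the lowered program can be serialized to a .pte file for execution on the device. A minimal sketch, assuming the et_program from the example above (the file name is arbitrary):

# Serialize the lowered program so it can be loaded by the ExecuTorch runtime
# on the target device. The file name below is only an example.
with open("mv2_exynos_a8w8.pte", "wb") as f:
    et_program.write_to_file(f)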