Quantization#
The Arm VGF delegate can be used to execute quantized models. To quantize a model so that it is supported by this delegate, use the VgfQuantizer.
Currently, the symmetric int8 configuration defined by executorch.backends.arm.quantizer.arm_quantizer.get_symmetric_quantization_config is the main configuration available for use with the VGF quantizer.
Supported Quantization Schemes#
The quantization schemes supported by the VGF Backend are:
8-bit symmetric weights with 8-bit asymmetric activations (via the PT2E quantization flow).
Supports both static and dynamic activations
Supports per-channel and per-tensor schemes
Weight-only quantization is not currently supported on the VGF backend.
Partial Quantization#
The VGF backend supports partial quantization, where only parts of the model are quantized while others remain in floating-point. This can be useful for models where certain layers are not well-suited for quantization or when a balance between performance and accuracy is desired.
For every node (op) in the graph, the quantizer looks at the quantization
configuration set for that specific node. If the configuration is set to
None, the node is left in floating-point; if it is provided (not None), the
node is quantized according to that configuration.
With the Quantization API, users can specify the
quantization configurations for specific layers or submodules of the model. The
set_global method is first used to set a default quantization configuration
(could be None as explained above) for all nodes in the model. Then,
configurations for specific layers or submodules can override the global
setting using the set_module_name or set_module_type methods.
Quantization API#
class VgfQuantizer(compile_spec: 'VgfCompileSpec', use_composable_quantizer: 'bool' = False) -> 'None'
Quantizer supported by the Arm VGF backend.
Warning: Setting use_composable_quantizer=True enables an experimental API surface that may change without notice.
Args:
compile_spec (VgfCompileSpec): Backend compile specification for Vgf targets.
use_composable_quantizer (bool): Whether to use the composable quantizer implementation. See https://github.com/pytorch/executorch/issues/17701 for details.
def VgfQuantizer.add_quantizer(self, quantizer: 'Quantizer') -> 'TOSAQuantizer':
Insert a quantizer with the highest precedence.
def VgfQuantizer.set_global(self, quantization_config: 'Optional[QuantizationConfig]') -> 'TOSAQuantizer':
Set quantization_config for submodules not matched by other filters.
Args:
quantization_config (Optional[QuantizationConfig]): Configuration to apply to modules that are not captured by name or type filters.
None indicates no quantization.
def VgfQuantizer.set_io(self, quantization_config: 'Optional[QuantizationConfig]') -> 'TOSAQuantizer':
Set quantization_config for input and output nodes.
Args:
quantization_config (Optional[QuantizationConfig]): Configuration describing activation quantization for model inputs and outputs.
None indicates no quantization.
def VgfQuantizer.set_module_name(self, module_name: 'str', quantization_config: 'Optional[QuantizationConfig]') -> 'TOSAQuantizer':
Set quantization_config for submodules with a given module name.
For example, calling set_module_name("blocks.sub", quantization_config) quantizes supported patterns for that submodule with the provided quantization_config.
Args:
module_name (str): Fully qualified module name to configure.
quantization_config (Optional[QuantizationConfig]): Configuration applied to the named submodule.
None indicates no quantization.
def VgfQuantizer.set_module_type(self, module_type: 'Callable', quantization_config: 'Optional[QuantizationConfig]') -> 'TOSAQuantizer':
Set quantization_config for submodules with a given module type.
For example, calling set_module_type(Softmax, quantization_config) quantizes supported patterns in each Softmax instance with the provided quantization_config.
Args:
module_type (Callable): Type whose submodules should use the provided quantization configuration.
quantization_config (Optional[QuantizationConfig]): Configuration to apply to submodules of the given type.
None indicates no quantization.
def VgfQuantizer.set_node_finder(self, quantization_config: 'Optional[QuantizationConfig]', node_finder: 'NodeFinder') -> 'TOSAQuantizer':
Set quantization_config for nodes matched by a custom NodeFinder.
Args:
quantization_config (Optional[QuantizationConfig]): Configuration describing quantization settings for nodes matched by the provided NodeFinder.
None indicates no quantization.
def VgfQuantizer.set_node_name(self, node_name: 'str', quantization_config: 'Optional[QuantizationConfig]') -> 'TOSAQuantizer':
Set quantization config for a specific node name.
def VgfQuantizer.set_node_target(self, node_target: 'OpOverload', quantization_config: 'Optional[QuantizationConfig]') -> 'TOSAQuantizer':
Set quantization config for a specific operator target.
def VgfQuantizer.transform_for_annotation(self, model: 'GraphModule') -> 'GraphModule':
Transform the graph to prepare it for quantization annotation.
Decomposes all operators where required to get correct quantization parameters.
Args:
model (GraphModule): Model whose graph will be transformed.
Returns:
GraphModule: Transformed model prepared for annotation.