Int8DynamicActivationIntxWeightConfig#
- class torchao.quantization.Int8DynamicActivationIntxWeightConfig(weight_dtype: dtype = torch.int8, weight_granularity: Granularity = PerGroup(group_size=32), weight_mapping_type: MappingType = MappingType.SYMMETRIC, weight_scale_dtype: Optional[dtype] = None, act_mapping_type: MappingType = MappingType.ASYMMETRIC, intx_packing_format: IntxPackingFormat = IntxPackingFormat.UNPACKED_TO_INT8, intx_choose_qparams_algorithm: IntxChooseQParamsAlgorithm = IntxChooseQParamsAlgorithm.AFFINE, version: int = 2)[source]#
Configuration for dynamically quantizing activations to torch.int8 and weights to torch.intx, with 1 <= x <= 8. More specifically, activations are dynamically quantized to 8-bits at a per-token granularity with scales/zeros. Weights are quantized with scales/zeros in a groupwise or channelwise manner using the number of bits specified by weight_dtype.
This layout is identical to Int8DynamicActivationInt4WeightConfig when weight_dtype is torch.int4 and other args are the same. However, this layout is more general and supports other weight dtypes.
- Parameters
  - weight_dtype – The dtype to use for weight quantization. Must be torch.intx, where 1 <= x <= 8.
  - weight_granularity – The granularity to use for weight quantization. Must be PerGroup or PerAxis(axis=0).
  - weight_mapping_type – The type of mapping to use for the weight quantization. Must be one of MappingType.ASYMMETRIC or MappingType.SYMMETRIC. MappingType.SYMMETRIC requires ZeroPointDomain.NONE.
  - weight_scale_dtype – The dtype to use for the weight scale.
  - act_mapping_type – The type of mapping to use for the activation quantization. Must be one of MappingType.ASYMMETRIC or MappingType.SYMMETRIC.
  - intx_packing_format – The format to use for the packed weight tensor (version 2 only). unpacked_to_int8 is the default and is intended for export applications like ExecuTorch; opaque_torchao_auto is optimized for CPU performance.
  - intx_choose_qparams_algorithm – The algorithm to use for choosing the quantization parameters.
  - version – The version of the config to use. Only a subset of the above args is valid for a given version; see the note for more details.