IntxWeightOnlyConfig
- class torchao.quantization.IntxWeightOnlyConfig(weight_dtype: dtype = torch.int8, granularity: Granularity = PerAxis(axis=0), mapping_type: MappingType = MappingType.SYMMETRIC, scale_dtype: Optional[dtype] = None, intx_packing_format: IntxPackingFormat = IntxPackingFormat.UNPACKED_TO_INT8, intx_choose_qparams_algorithm: IntxChooseQParamsAlgorithm = IntxChooseQParamsAlgorithm.AFFINE, version: int = 2)[source]
Configuration for quantizing weights to torch.intx, with 1 <= x <= 8. Weights are quantized with scales/zeros in a groupwise or channelwise manner, using the number of bits specified by weight_dtype.
- Parameters
weight_dtype – The dtype to use for weight quantization. Must be torch.intx, where 1 <= x <= 8.
granularity – The granularity to use for weight quantization. Must be PerGroup or PerAxis(axis=0).
mapping_type – The type of mapping to use for weight quantization. Must be MappingType.ASYMMETRIC or MappingType.SYMMETRIC.
scale_dtype – The dtype to use for the weight scale.
intx_packing_format – The format to use for the packed weight tensor (version 2 only).
intx_choose_qparams_algorithm – The algorithm to use for choosing the quantization parameters.
version – The version of the config to use. Only a subset of the above arguments is valid for each version; see the note for details.
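A minimal usage sketch, assuming torchao is installed and that `IntxWeightOnlyConfig`, `quantize_`, and `PerGroup` are importable from the paths shown (import locations may vary across torchao versions):

```python
import torch
import torch.nn as nn

from torchao.quantization import quantize_, IntxWeightOnlyConfig
from torchao.quantization.granularity import PerGroup

# Toy model; weight-only quantization applies to the Linear layers.
model = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 64))

# Quantize weights to 4 bits, with one scale/zero-point per group of
# 32 input channels (groupwise granularity).
config = IntxWeightOnlyConfig(
    weight_dtype=torch.int4,
    granularity=PerGroup(32),
)
quantize_(model, config)
```

Passing `PerAxis(axis=0)` instead of `PerGroup(32)` would use a single scale per output channel (channelwise granularity).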