
IntxWeightOnlyConfig

class torchao.quantization.IntxWeightOnlyConfig(weight_dtype: dtype = torch.int8, granularity: Granularity = PerAxis(axis=0), mapping_type: MappingType = MappingType.SYMMETRIC, scale_dtype: Optional[dtype] = None, intx_packing_format: IntxPackingFormat = IntxPackingFormat.UNPACKED_TO_INT8, intx_choose_qparams_algorithm: IntxChooseQParamsAlgorithm = IntxChooseQParamsAlgorithm.AFFINE, version: int = 2)[source]

Configuration for quantizing weights to torch.intx, where 1 <= x <= 8. Weights are quantized with scales/zeros in a groupwise or channelwise manner, using the number of bits specified by weight_dtype.

Parameters
  • weight_dtype – The dtype to use for weight quantization. Must be torch.intx, where 1 <= x <= 8.

  • granularity – The granularity to use for weight quantization. Must be PerGroup or PerAxis(0).

  • mapping_type – The type of mapping to use for weight quantization. Must be MappingType.ASYMMETRIC or MappingType.SYMMETRIC.

  • scale_dtype – The dtype to use for the weight scale.

  • intx_packing_format – The format to use for the packed weight tensor (version 2 only).

  • intx_choose_qparams_algorithm – The algorithm to use for choosing the quantization parameters.

  • version – The version of the config to use. Only a subset of the above arguments is valid for each version; see the note for details.
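To illustrate the scheme this config selects with mapping_type=MappingType.SYMMETRIC and a PerGroup granularity, here is a minimal plain-Python sketch of symmetric groupwise weight quantization. This is not torchao's implementation: the rounding and qmin/qmax conventions shown are simplifying assumptions, and the helper names are hypothetical.

```python
# Hypothetical sketch of symmetric groupwise quantization (NOT torchao code).
# One scale per group of `group_size` weights; no zero point, since the
# symmetric mapping centers the quantized range on zero.

def quantize_symmetric_groupwise(weights, group_size, bits):
    qmax = 2 ** (bits - 1) - 1   # e.g. 7 for 4-bit
    qmin = -qmax - 1             # e.g. -8 for 4-bit
    quantized, scales = [], []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        # Pick the scale so the largest-magnitude weight lands near qmax.
        scale = max(abs(w) for w in group) / qmax or 1.0
        scales.append(scale)
        quantized.extend(
            min(max(round(w / scale), qmin), qmax) for w in group
        )
    return quantized, scales

def dequantize_groupwise(quantized, scales, group_size):
    # Reconstruct approximate float weights from ints and per-group scales.
    return [q * scales[i // group_size] for i, q in enumerate(quantized)]

w = [0.5, -1.0, 0.25, 0.8, 2.0, -0.1, 0.0, 1.5]
q, s = quantize_symmetric_groupwise(w, group_size=4, bits=4)
w_hat = dequantize_groupwise(q, s, group_size=4)
```

In torchao itself the config is applied to a model via quantize_, e.g. quantize_(model, IntxWeightOnlyConfig(weight_dtype=torch.int4, granularity=PerGroup(32))); the sketch above only shows the per-group arithmetic such a config implies.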