IntxWeightOnlyConfig
- class torchao.quantization.IntxWeightOnlyConfig(weight_dtype: dtype = torch.int8, granularity: Granularity = PerAxis(axis=0), mapping_type: MappingType = MappingType.SYMMETRIC, scale_dtype: Optional[dtype] = None, intx_packing_format: IntxPackingFormat = IntxPackingFormat.UNPACKED_TO_INT8, intx_choose_qparams_algorithm: IntxChooseQParamsAlgorithm = IntxChooseQParamsAlgorithm.AFFINE, version: int = 2)[source]
Configuration for quantizing weights to torch.intx, with 1 <= x <= 8. Weights are quantized with scales/zeros in a groupwise or channelwise manner, using the number of bits specified by weight_dtype.
- Parameters
weight_dtype – The dtype to use for weight quantization. Must be torch.intx, where 1 <= x <= 8.
granularity – The granularity to use for weight quantization. Must be PerGroup or PerAxis(axis=0).
mapping_type – The type of mapping to use for weight quantization. Must be MappingType.ASYMMETRIC or MappingType.SYMMETRIC.
scale_dtype – The dtype to use for the weight scale.
intx_packing_format – The format to use for the packed weight tensor (version 2 only).
intx_choose_qparams_algorithm – The algorithm to use for choosing the quantization parameters.
version – The version of the config to use. Only a subset of the above arguments is valid for each version; see the note for details.
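A minimal usage sketch, assuming torchao is installed and that `IntxWeightOnlyConfig`, `quantize_`, and `PerGroup` are importable from the paths shown (import locations may vary across torchao versions):

```python
import torch
import torch.nn as nn

from torchao.quantization import quantize_, IntxWeightOnlyConfig
from torchao.quantization.granularity import PerGroup

# Toy model; weight-only quantization applies to the Linear layers.
model = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 64))

# Quantize weights to 4 bits, with one scale/zero-point per group of
# 32 input channels (groupwise granularity).
config = IntxWeightOnlyConfig(
    weight_dtype=torch.int4,
    granularity=PerGroup(32),
)
quantize_(model, config)
```

Passing `PerAxis(axis=0)` instead of `PerGroup(32)` would use a single scale per output channel (channelwise granularity).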