Int4WeightOnlyConfig
- class torchao.quantization.Int4WeightOnlyConfig(group_size: int = 128, layout: Optional[TensorCoreTiledLayout] = TensorCoreTiledLayout(inner_k_tiles=8), use_hqq: bool = False, zero_point_domain: Optional[ZeroPointDomain] = ZeroPointDomain.NONE, set_inductor_config: bool = True, preserve_zero: Optional[bool] = None)[source]
Configuration for applying uint4 weight-only asymmetric per-group quantization to linear layers, using the "tensor_core_tiled" layout to speed up inference with the tinygemm kernel.
Note
This config targets the tinygemm int4mm kernels (`torch.ops.aten._weight_int4pack_mm` and `torch.ops.aten._weight_int4pack_mm_for_cpu`). The quantization algorithm differs from more traditional integer quantization in two ways: 1) the zero point lives in the floating-point domain instead of the integer domain (`zero_point_domain=ZeroPointDomain.FLOAT`), and 2) floating-point zero does not have to be exactly representable (`preserve_zero=False` in `choose_qparams_affine`). Please follow the relevant code in `choose_qparams_affine`, `quantize_affine` and `dequantize_affine` to learn how the quantization parameters are chosen and how the tensor is quantized/dequantized for tinygemm; an illustrative sketch follows.
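To make the two points above concrete, here is a minimal, illustrative sketch of per-group uint4 quantization with a floating-point zero point. It is not torchao's actual implementation: the helper names and the mid-point convention are assumptions for illustration, and the real logic lives in `choose_qparams_affine` / `quantize_affine` / `dequantize_affine`.

```python
# Illustrative sketch of tinygemm-style uint4 quantization with a
# floating-point zero point (NOT torchao's actual implementation).
import torch

def quantize_group_float_zp(w: torch.Tensor):
    # w: one group of weights (e.g. group_size values from a row)
    quant_min, quant_max = 0, 15                  # uint4 range
    min_val, max_val = w.min(), w.max()
    scale = (max_val - min_val) / (quant_max - quant_min)
    mid_point = (quant_max + quant_min + 1) / 2   # 8 for uint4
    # The zero point is a float offset, not an integer code, so an
    # exact 0.0 input need not be representable (preserve_zero=False).
    zero_point = min_val + scale * mid_point
    q = torch.clamp(torch.round((w - min_val) / scale), quant_min, quant_max)
    return q.to(torch.uint8), scale, zero_point

def dequantize_group_float_zp(q, scale, zero_point, mid_point=8.0):
    return (q.to(torch.float32) - mid_point) * scale + zero_point

w = torch.randn(32)
q, s, zp = quantize_group_float_zp(w)
w_dq = dequantize_group_float_zp(q, s, zp)
print((w - w_dq).abs().max())  # bounded by scale / 2 per group
```

A practical consequence of `preserve_zero=False` is visible here: an exact 0.0 in the input is generally reconstructed only approximately after the round trip.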
- Parameters:
group_size – controls the granularity of quantization; a smaller group size gives more fine-grained quantization. Choices are [256, 128, 64, 32].
layout – layout type for quantized tensor, default is TensorCoreTiledLayout(inner_k_tiles=8)
use_hqq – whether to use HQQ (half-quadratic quantization) instead of the default quantization mode, default is False
zero_point_domain – data type of zero points, choices are [ZeroPointDomain.FLOAT, ZeroPointDomain.INT, ZeroPointDomain.NONE]
set_inductor_config – if True, adjusts torchinductor settings to recommended values.
preserve_zero – whether to preserve exact zero; default is None, which resolves to True when zero_point_domain is ZeroPointDomain.INT
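For context, a minimal usage sketch below applies the config with torchao's `quantize_` API. The toy model, dtype, and `group_size` choice are illustrative assumptions; the tinygemm CUDA kernel operates on bfloat16 weights on a supported GPU, and kernel availability depends on your torch/torchao build.

```python
# Minimal usage sketch (assumes torchao is installed and a CUDA
# device with the tinygemm int4 kernel is available).
import torch
from torchao.quantization import quantize_, Int4WeightOnlyConfig

model = torch.nn.Sequential(
    torch.nn.Linear(1024, 1024),
).to(torch.bfloat16).cuda()

# Replace each linear layer's weight with uint4 per-group quantized
# weights stored in the tensor_core_tiled layout.
quantize_(model, Int4WeightOnlyConfig(group_size=64))

x = torch.randn(8, 1024, dtype=torch.bfloat16, device="cuda")
with torch.no_grad():
    y = model(x)  # forward pass runs the int4 weight-only path
```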