Int4WeightOnlyConfig
- class torchao.quantization.Int4WeightOnlyConfig(group_size: int = 128, layout: Optional[TensorCoreTiledLayout] = TensorCoreTiledLayout(inner_k_tiles=8), use_hqq: bool = False, zero_point_domain: Optional[ZeroPointDomain] = ZeroPointDomain.NONE, set_inductor_config: bool = True, preserve_zero: Optional[bool] = None, int4_packing_format: Int4PackingFormat = Int4PackingFormat.PLAIN, int4_choose_qparams_algorithm: Int4ChooseQParamsAlgorithm = Int4ChooseQParamsAlgorithm.TINYGEMM, version: int = 2)
Configuration for int4 weight-only quantization. Only groupwise quantization is currently supported. Two versions are available, version 1 and version 2; they are implemented differently but offer the same support. In version 2, targets are distinguished mainly by the int4_packing_format argument; in version 1, mainly by layout.
- Parameters:
group_size – controls the granularity of quantization; a smaller size is more fine-grained. Choices are [256, 128, 64, 32]. Used in both version 1 and version 2
int4_packing_format – the packing format for the int4 tensor, used in version 2 only
int4_choose_qparams_algorithm – variant of the choose qparams algorithm to use for int4; currently supports TINYGEMM (“tinygemm”) and HQQ (“hqq”), used in version 2 only
layout – layout type for quantized tensor, default is TensorCoreTiledLayout(inner_k_tiles=8), used in version 1 only
use_hqq – whether to use hqq or default quantization mode, default is False, used in version 1 only
zero_point_domain – data type of zero points, choices are [ZeroPointDomain.FLOAT, ZeroPointDomain.INT, ZeroPointDomain.NONE], used in version 1 only
set_inductor_config – if True, adjusts torchinductor settings to recommended values, used in both version 1 and version 2
preserve_zero – whether to preserve zero, default is None. Will be set to True if zero_point_domain is ZeroPointDomain.INT. Used in version 1 only
version – version of the config to use, default is 2. Each version accepts only a subset of the arguments above; see the note for details
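To make the role of group_size concrete, here is a minimal pure-Python sketch of groupwise 4-bit quantization. It is an illustration only, not torchao's implementation: it assumes a simple min/max asymmetric scheme with one (scale, minimum) pair per group, and the function names quantize_groupwise_int4, dequantize_groupwise_int4 and max_abs_error are hypothetical. The TINYGEMM and HQQ algorithms choose their parameters differently.

```python
import random


def quantize_groupwise_int4(weights, group_size):
    """Quantize a flat list of floats to 4-bit codes (0..15), one (scale, min) per group.

    Hypothetical helper using plain min/max scaling; not torchao's algorithm.
    """
    if len(weights) % group_size != 0:
        raise ValueError("len(weights) must be divisible by group_size")
    codes, scales, mins = [], [], []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        lo, hi = min(group), max(group)
        # 16 levels span 15 steps; fall back to 1.0 when the group is constant.
        scale = (hi - lo) / 15 or 1.0
        scales.append(scale)
        mins.append(lo)
        codes.extend(min(15, max(0, round((w - lo) / scale))) for w in group)
    return codes, scales, mins


def dequantize_groupwise_int4(codes, scales, mins, group_size):
    """Map 4-bit codes back to floats using each group's scale and minimum."""
    return [
        c * scales[i // group_size] + mins[i // group_size]
        for i, c in enumerate(codes)
    ]


def max_abs_error(weights, group_size):
    """Largest absolute reconstruction error for a given group size."""
    codes, scales, mins = quantize_groupwise_int4(weights, group_size)
    restored = dequantize_groupwise_int4(codes, scales, mins, group_size)
    return max(abs(a - b) for a, b in zip(weights, restored))


random.seed(0)
weights = [random.gauss(0.0, 1.0) for _ in range(1024)]
for gs in (256, 128, 64, 32):
    _, scales, _ = quantize_groupwise_int4(weights, gs)
    # The worst-case error per group is half a quantization step (scale / 2).
    print(gs, "groups:", len(scales), "error bound:", max(scales) / 2)
```

In this sketch, each group's error is bounded by half its step size. Since a smaller group never has a wider value range than the larger group that contains it, the bound can only shrink as group_size decreases, at the cost of storing more scale and minimum values.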
Note
Int4WeightOnlyConfig currently supports both v1 (legacy) and v2.
For v2 (version = 2), only group_size, int4_packing_format, int4_choose_qparams_algorithm and set_inductor_config are valid; all other args will be ignored.
For v1 (version = 1), only group_size, layout, use_hqq, zero_point_domain, preserve_zero and set_inductor_config are valid. We plan to deprecate v1 in torchao 0.15 to make this config less confusing.
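The per-version argument rules in this note can be captured in a small lookup table. The sketch below is a standalone illustration that does not import torchao; the VALID_ARGS table and the args_not_valid_for_version helper are hypothetical names, and only the argument lists themselves come from the note.

```python
# Arguments documented as valid for each config version (taken from the note).
VALID_ARGS = {
    1: {
        "group_size",
        "layout",
        "use_hqq",
        "zero_point_domain",
        "preserve_zero",
        "set_inductor_config",
    },
    2: {
        "group_size",
        "int4_packing_format",
        "int4_choose_qparams_algorithm",
        "set_inductor_config",
    },
}


def args_not_valid_for_version(version, **kwargs):
    """Return the names of passed arguments that the given version does not use.

    Hypothetical helper for illustration; torchao does not expose this function.
    """
    if version not in VALID_ARGS:
        raise ValueError(f"unknown version: {version!r}")
    return sorted(set(kwargs) - VALID_ARGS[version])


# use_hqq is a version 1 argument, so it is flagged for version 2.
print(args_not_valid_for_version(2, group_size=64, use_hqq=True))
# int4_packing_format is a version 2 argument, so it is flagged for version 1.
print(args_not_valid_for_version(1, group_size=64, int4_packing_format="plain"))
```

A check like this could be run before constructing the config to catch arguments that would otherwise have no effect for the chosen version.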