Int4WeightOnlyConfig
- class torchao.quantization.Int4WeightOnlyConfig(group_size: int = 128, set_inductor_config: bool = True, int4_packing_format: Int4PackingFormat = Int4PackingFormat.PLAIN, int4_choose_qparams_algorithm: Int4ChooseQParamsAlgorithm = Int4ChooseQParamsAlgorithm.TINYGEMM, version: int = 2)
Configuration for int4 weight-only quantization. Only groupwise quantization is currently supported. Two config versions are available, version 1 and version 2; they are implemented differently but offer the same feature support. In version 2, different targets are mainly distinguished by the int4_packing_format argument; in version 1, they are mainly distinguished by layout.
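Groupwise quantization means that each contiguous block of group_size weights gets its own quantization parameters, so an outlier only affects the precision of its own group. The pure-Python sketch below illustrates the idea with a simple asymmetric min/max scheme that maps each group to unsigned 4-bit codes 0..15. This scheme is an assumption chosen for clarity; it is not torchao's TINYGEMM or HQQ algorithm, and the function names are hypothetical.

```python
def quantize_groupwise_int4(row, group_size):
    """Quantize one weight row to unsigned 4-bit codes (0..15), one (scale, min) pair per group.

    Illustrative min/max scheme only; not torchao's actual qparams algorithm.
    """
    if group_size <= 0 or len(row) % group_size != 0:
        raise ValueError("row length must be a positive multiple of group_size")
    qmin, qmax = 0, 15
    codes, qparams = [], []
    for start in range(0, len(row), group_size):
        group = row[start:start + group_size]
        lo, hi = min(group), max(group)
        # Guard against a zero scale when every value in the group is identical.
        scale = (hi - lo) / (qmax - qmin) or 1.0
        qparams.append((scale, lo))
        for w in group:
            # Round to the nearest code and clamp into the 4-bit range.
            codes.append(max(qmin, min(qmax, round((w - lo) / scale))))
    return codes, qparams


def dequantize_groupwise_int4(codes, qparams, group_size):
    """Map codes back to floats using each group's own (scale, min) pair."""
    out = []
    for g, (scale, lo) in enumerate(qparams):
        for q in codes[g * group_size:(g + 1) * group_size]:
            out.append(q * scale + lo)
    return out


# Usage: a 16-element row split into two groups of 8.
row = [0.1 * i - 0.8 for i in range(16)]
codes, qparams = quantize_groupwise_int4(row, group_size=8)
recon = dequantize_groupwise_int4(codes, qparams, group_size=8)
print(len(qparams), max(abs(a - b) for a, b in zip(row, recon)))
```

Because each group is rescaled independently, the reconstruction error of any weight is at most half of its own group's scale, which is why a smaller group size tends to preserve accuracy better.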
- Parameters
group_size – controls the granularity of quantization; a smaller group size is more fine-grained. Choices are [256, 128, 64, 32]. Used in both version 1 and version 2.
int4_packing_format – the packing format for the int4 tensor. Used in version 2 only.
int4_choose_qparams_algorithm – the variant of the choose-qparams algorithm to use for int4. Currently TINYGEMM (“tinygemm”) and HQQ (“hqq”) are supported. Used in version 2 only.
set_inductor_config – if True, adjusts torchinductor settings to recommended values. Used in both version 1 and version 2.
version – the version of the config to use. Default is 2.
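Choosing group_size trades accuracy against metadata overhead: every group carries its own quantization parameters. The sketch below estimates the storage of a groupwise int4 weight matrix for each supported group size. It assumes one scale and one zero point per group, each stored as a 2-byte bf16 value, and two 4-bit codes packed per byte; the real footprint in torchao depends on the packing format and may include padding or a different parameter layout. The helper name is hypothetical.

```python
SUPPORTED_GROUP_SIZES = (256, 128, 64, 32)


def int4_weight_bytes(out_features, in_features, group_size,
                      qparam_bytes=2, qparams_per_group=2):
    """Approximate bytes for a groupwise int4 weight (hypothetical helper).

    Assumes packed 4-bit codes plus `qparams_per_group` values of
    `qparam_bytes` each (default: bf16 scale + bf16 zero point) per group.
    """
    if group_size not in SUPPORTED_GROUP_SIZES:
        raise ValueError(f"group_size must be one of {SUPPORTED_GROUP_SIZES}")
    if in_features % group_size != 0:
        raise ValueError("in_features must be divisible by group_size")
    code_bytes = out_features * in_features // 2  # two 4-bit codes per byte
    num_groups = out_features * (in_features // group_size)
    return code_bytes + num_groups * qparams_per_group * qparam_bytes


for gs in SUPPORTED_GROUP_SIZES:
    total = int4_weight_bytes(4096, 4096, gs)
    bits = total * 8 / (4096 * 4096)
    print(f"group_size={gs:>3}: {total} bytes, {bits:.3f} bits/weight")
```

Under these assumptions the overhead is 32 / group_size bits per weight, so moving from the default of 128 down to 32 raises the footprint from about 4.25 to 5 bits per weight.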