Int4WeightOnlyConfig
- class torchao.quantization.Int4WeightOnlyConfig(group_size: int = 128, set_inductor_config: bool = True, int4_packing_format: Int4PackingFormat = Int4PackingFormat.PLAIN, int4_choose_qparams_algorithm: Int4ChooseQParamsAlgorithm = Int4ChooseQParamsAlgorithm.TINYGEMM, int4_tile_packed_ntile: int = 8, version: int = 2)[source]
Configuration for int4 weight-only quantization; only groupwise quantization is supported.
- Parameters:
group_size – controls the granularity of quantization; a smaller group size is more fine-grained. Choices are [256, 128, 64, 32].
int4_packing_format – the packing format for the int4 tensor.
int4_choose_qparams_algorithm – which variant of the choose-qparams algorithm to use for int4; currently TINYGEMM (“tinygemm”) and HQQ (“hqq”) are supported.
set_inductor_config – if True, adjusts torchinductor settings to recommended values.
int4_tile_packed_ntile – ntile size for the TILE_PACKED_TO_4D packing format; the default is 8 on the CUDA platform and 16 on the ROCm platform.
Example:
    import torch.nn as nn

    from torchao.quantization import Int4WeightOnlyConfig, quantize_

    model = nn.Sequential(nn.Linear(2048, 2048, device="cuda"))
    config = Int4WeightOnlyConfig(
        group_size=32,
        int4_packing_format="tile_packed_to_4d",
        int4_choose_qparams_algorithm="hqq",
    )
    quantize_(model, config)
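To build intuition for what the group_size parameter controls, the following is a minimal pure-Python sketch of groupwise asymmetric int4 quantization. It is illustrative only and independent of torchao: the function names (quantize_groupwise, dequantize_groupwise) are hypothetical, and the actual qparams algorithms (tinygemm, HQQ) and packed tensor layouts used by the library differ. The sketch just shows that each contiguous group of group_size weights gets its own scale and zero point, so smaller groups track the local weight range more closely.

```python
# Illustrative sketch of groupwise asymmetric int4 quantization.
# NOT torchao's implementation; names and logic are for explanation only.

def quantize_groupwise(weights, group_size):
    """Quantize a flat list of floats to int4 values in [0, 15],
    with one (scale, zero_point) pair per group of `group_size` weights."""
    qvals, scales, zeros = [], [], []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        lo, hi = min(group), max(group)
        scale = (hi - lo) / 15 or 1.0  # guard against flat groups
        zero = lo
        qvals.extend(round((w - zero) / scale) for w in group)
        scales.append(scale)
        zeros.append(zero)
    return qvals, scales, zeros

def dequantize_groupwise(qvals, scales, zeros, group_size):
    """Reconstruct floats from int4 values using each group's qparams."""
    return [q * scales[i // group_size] + zeros[i // group_size]
            for i, q in enumerate(qvals)]

weights = [0.1, -0.5, 0.25, 0.9, 2.0, 1.5, -1.0, 0.0]
q, s, z = quantize_groupwise(weights, group_size=4)
deq = dequantize_groupwise(q, s, z, group_size=4)
# Reconstruction error is bounded by one quantization step per group.
max_err = max(abs(a - b) for a, b in zip(weights, deq))
```

With group_size=4 here, each group's scale is (max - min) / 15 over just those four weights; with a single global group, the outliers 2.0 and -1.0 would force a coarser step onto every weight, which is exactly the trade-off the group_size choices [256, 128, 64, 32] expose.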