Int4WeightOnlyConfig#

class torchao.quantization.Int4WeightOnlyConfig(group_size: int = 128, set_inductor_config: bool = True, int4_packing_format: Int4PackingFormat = Int4PackingFormat.PLAIN, int4_choose_qparams_algorithm: Int4ChooseQParamsAlgorithm = Int4ChooseQParamsAlgorithm.TINYGEMM, int4_tile_packed_ntile: int = 8, version: int = 2)[source]#

Configuration for int4 weight-only quantization; only groupwise quantization is supported.

Parameters:
  • group_size – controls the granularity of quantization; a smaller group size gives finer-grained quantization. Choices are [256, 128, 64, 32].

  • int4_packing_format – the packing format for the int4 weight tensor

  • int4_choose_qparams_algorithm – which algorithm to use for choosing quantization parameters for int4; currently supports TINYGEMM (“tinygemm”) and HQQ (“hqq”)

  • set_inductor_config – if True, adjusts torchinductor settings to recommended values.

  • int4_tile_packed_ntile – ntile size for the TILE_PACKED_TO_4D format; default is 8 on CUDA and 16 on ROCm

Example:

import torch.nn as nn

from torchao.quantization import Int4WeightOnlyConfig, quantize_

model = nn.Sequential(nn.Linear(2048, 2048, device="cuda"))
config = Int4WeightOnlyConfig(
    group_size=32,
    int4_packing_format="tile_packed_to_4d",
    int4_choose_qparams_algorithm="hqq",
)
quantize_(model, config)
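To build intuition for what group_size controls, here is a minimal pure-Python sketch of groupwise symmetric int4 quantization. This is an illustration only, not torchao's implementation (torchao operates on tensors and supports the algorithms and packing formats listed above); the function names are hypothetical:

```python
# Illustrative sketch (NOT torchao's implementation): each group of
# `group_size` consecutive weights shares one scale, so smaller groups
# track the local weight distribution more closely.
def quantize_groupwise_int4(weights, group_size=32):
    qvals, scales = [], []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        # Symmetric scale: map the group's max magnitude to the
        # positive end of the int4 range [-8, 7].
        scale = max(abs(w) for w in group) / 7 or 1.0
        scales.append(scale)
        qvals.extend(max(-8, min(7, round(w / scale))) for w in group)
    return qvals, scales

def dequantize_groupwise_int4(qvals, scales, group_size=32):
    # Each quantized value is rescaled by its group's shared scale.
    return [q * scales[i // group_size] for i, q in enumerate(qvals)]
```

A smaller group_size means more scales are stored (more metadata overhead) but less quantization error per group, which is the trade-off behind the [256, 128, 64, 32] choices above.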