Int4WeightOnlyConfig

class torchao.quantization.Int4WeightOnlyConfig(group_size: int = 128, set_inductor_config: bool = True, int4_packing_format: Int4PackingFormat = Int4PackingFormat.PLAIN, int4_choose_qparams_algorithm: Int4ChooseQParamsAlgorithm = Int4ChooseQParamsAlgorithm.TINYGEMM, version: int = 2)[source]

Configuration for int4 weight-only quantization. Only groupwise quantization is supported right now. Both version 1 and version 2 are supported; they are implemented differently but cover the same functionality. In version 2, different targets are distinguished mainly by the int4_packing_format argument; in version 1, mainly by the layout.
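
For instance, choosing between the two versions only requires the version argument; a minimal sketch (all other arguments keep the defaults shown in the signature above):

from torchao.quantization import Int4WeightOnlyConfig

# Version 2 (default): the quantization target is selected mainly via
# int4_packing_format (and int4_choose_qparams_algorithm)
config_v2 = Int4WeightOnlyConfig(group_size=128)

# Version 1 (legacy): the target is selected mainly via the layout instead
config_v1 = Int4WeightOnlyConfig(group_size=128, version=1)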

Parameters
  • group_size – controls the granularity of quantization; a smaller group size is more fine-grained. Choices are [256, 128, 64, 32] (see the sketch after this list). Used in both version 1 and version 2.

  • int4_packing_format – the packing format for the int4 tensor. Used in version 2 only.

  • int4_choose_qparams_algorithm – which algorithm to use for choosing the quantization parameters; currently TINYGEMM (“tinygemm”) and HQQ (“hqq”) are supported. Used in version 2 only.

  • set_inductor_config – if True, adjusts torchinductor settings to recommended values. Used in both version 1 and version 2.

  • version – version of the config to use, default is 2
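
To make the group_size granularity concrete, here is a back-of-the-envelope sketch (the shapes are illustrative, not from the library):

out_features, in_features = 1024, 4096   # a Linear weight of shape [1024, 4096]
group_size = 128
# Groupwise quantization computes one (scale, zero_point) pair per group of
# `group_size` consecutive elements along the input dimension.
groups_per_row = in_features // group_size        # 32 groups per output row
num_qparam_pairs = out_features * groups_per_row  # 32768 pairs in total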

Example:

import torch
from torchao.quantization import Int4WeightOnlyConfig, quantize_

# A toy model to quantize; replace with your own
model = torch.nn.Sequential(torch.nn.Linear(128, 128)).to(torch.bfloat16)

# Note: int4_packing_format varies by backend
if torch.cuda.is_available():
    model = model.to("cuda")
    # CUDA: tile-packed layout, with HQQ for choosing the quantization parameters
    config = Int4WeightOnlyConfig(
        group_size=32,
        int4_packing_format="tile_packed_to_4d",
        int4_choose_qparams_algorithm="hqq",
    )
elif torch.xpu.is_available():
    model = model.to("xpu")
    # XPU: use plain_int32 packing
    config = Int4WeightOnlyConfig(group_size=32, int4_packing_format="plain_int32")
else:
    # Other backends: fall back to the default (plain) packing format
    config = Int4WeightOnlyConfig(group_size=32)

quantize_(model, config)
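
As a quick sanity check (a sketch; the exact tensor subclass name depends on the torchao version and the packing format chosen above), you can confirm that quantize_ swapped the Linear weight for a quantized tensor subclass and run a forward pass:

# The Linear weight is no longer a plain torch.Tensor
print(type(model[0].weight))

# Forward pass with the quantized weights
x = torch.randn(2, 128, dtype=torch.bfloat16, device=model[0].weight.device)
with torch.no_grad():
    print(model(x).shape)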