Int4WeightOnlyConfig

class torchao.quantization.Int4WeightOnlyConfig(group_size: int = 128, layout: Optional[TensorCoreTiledLayout] = TensorCoreTiledLayout(inner_k_tiles=8), use_hqq: bool = False, zero_point_domain: Optional[ZeroPointDomain] = ZeroPointDomain.NONE, set_inductor_config: bool = True, preserve_zero: Optional[bool] = None, int4_packing_format: Int4PackingFormat = Int4PackingFormat.PLAIN, int4_choose_qparams_algorithm: Int4ChooseQParamsAlgorithm = Int4ChooseQParamsAlgorithm.TINYGEMM, version: int = 2)

Configuration for int4 weight-only quantization. Only groupwise quantization is currently supported. The config comes in two versions, version 1 and version 2, which are implemented differently but offer the same support. In version 2, targets are distinguished mainly by the int4_packing_format argument; in version 1, mainly by layout.

Parameters:
  • group_size – controls the granularity of quantization; a smaller size is more fine-grained. Choices are [256, 128, 64, 32], used in both version 1 and 2

  • int4_packing_format – the packing format for the int4 tensor, used in version 2 only

  • int4_choose_qparams_algorithm – variant of the choose-qparams algorithm to use for int4; currently supports TINYGEMM (“tinygemm”) and HQQ (“hqq”), used in version 2 only

  • layout – layout type for quantized tensor, default is TensorCoreTiledLayout(inner_k_tiles=8), used in version 1 only

  • use_hqq – whether to use hqq or default quantization mode, default is False, used in version 1 only

  • zero_point_domain – data type of zero points, choices are [ZeroPointDomain.FLOAT, ZeroPointDomain.INT, ZeroPointDomain.NONE], used in version 1 only

  • set_inductor_config – if True, adjusts torchinductor settings to recommended values, used in both version 1 and 2

  • preserve_zero – whether to preserve zero, default is None. Will be set to True if zero_point_domain is ZeroPointDomain.INT, used in version 1 only

  • version – version of the config to use, default is 2. Only a subset of the above args is valid for each version; see the note for details
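To make the group_size parameter concrete, here is a minimal plain-Python sketch of groupwise 4-bit quantization. It is an illustration only and makes several assumptions: the helper names quantize_groupwise and dequantize_groupwise are hypothetical, the simple min/max scaling is a stand-in rather than torchao's TINYGEMM or HQQ algorithm, and no packing or kernel behavior is modeled. It shows only that each group of group_size weights gets its own quantization parameters, so a smaller group_size means more parameter pairs per row.

```python
# Illustrative sketch only: plain-Python groupwise 4-bit quantization.
# This is NOT torchao's implementation or its choose-qparams algorithm.

def quantize_groupwise(row, group_size):
    """Quantize one weight row to 4-bit codes with one (scale, minimum) pair per group."""
    if len(row) % group_size != 0:
        raise ValueError("row length must be a multiple of group_size")
    codes, qparams = [], []
    for start in range(0, len(row), group_size):
        group = row[start:start + group_size]
        lo, hi = min(group), max(group)
        # 4 bits give 16 levels (codes 0..15); fall back to 1.0 for a constant group.
        scale = (hi - lo) / 15 or 1.0
        qparams.append((scale, lo))
        codes.extend(round((w - lo) / scale) for w in group)
    return codes, qparams


def dequantize_groupwise(codes, qparams, group_size):
    """Map 4-bit codes back to floats using the parameters of each code's group."""
    return [
        code * qparams[i // group_size][0] + qparams[i // group_size][1]
        for i, code in enumerate(codes)
    ]


# Deterministic toy weights in [-1, 1]; a real weight row would come from a model.
row = [((i * 37) % 101) / 50.0 - 1.0 for i in range(256)]

for group_size in (256, 128, 64, 32):
    codes, qparams = quantize_groupwise(row, group_size)
    restored = dequantize_groupwise(codes, qparams, group_size)
    max_err = max(abs(a - b) for a, b in zip(row, restored))
    print(f"group_size={group_size} groups={len(qparams)} max_err={max_err:.4f}")
```

The number of (scale, minimum) pairs printed for each setting is the row length divided by group_size, which is the storage cost paid for finer granularity.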

Note

Int4WeightOnlyConfig currently supports both v1 (legacy) and v2.

For v2 (version = 2), only group_size, int4_packing_format, int4_choose_qparams_algorithm and set_inductor_config are valid; all other args will be ignored.

For v1 (version = 1), only group_size, layout, use_hqq, zero_point_domain, preserve_zero and set_inductor_config are valid. We plan to deprecate v1 in torchao 0.15 to make this config less confusing.
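The per-version argument rules in this note can be written down as a small lookup table. The sketch below is a hypothetical helper, not part of torchao: VALID_ARGS and invalid_args are names introduced here for illustration, and the sets simply transcribe the note. It could serve as a sanity check before building a config, under the assumption that the note is the complete list.

```python
# Hypothetical helper, not part of torchao: transcribes the note into a lookup table.
VALID_ARGS = {
    1: {
        "group_size",
        "layout",
        "use_hqq",
        "zero_point_domain",
        "preserve_zero",
        "set_inductor_config",
    },
    2: {
        "group_size",
        "int4_packing_format",
        "int4_choose_qparams_algorithm",
        "set_inductor_config",
    },
}


def invalid_args(version, passed):
    """Return, sorted, the names in `passed` that the note lists as not valid for `version`."""
    if version not in VALID_ARGS:
        raise ValueError(f"unknown version: {version!r}")
    # `version` itself selects the rule set, so it is never reported.
    return sorted(set(passed) - VALID_ARGS[version] - {"version"})


print(invalid_args(2, ["group_size", "use_hqq", "layout"]))   # ['layout', 'use_hqq']
print(invalid_args(1, ["group_size", "int4_packing_format"]))  # ['int4_packing_format']
```

An empty list means every argument passed is one the note lists as valid for that version.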
