GemliteUIntXWeightOnlyConfig

class torchao.quantization.GemliteUIntXWeightOnlyConfig(group_size: Optional[int] = 128, bit_width: int = 4, packing_bitwidth: Optional[int] = None, mode: Optional[str] = 'weight_only', set_inductor_config: bool = True)[source]

Applies weight-only 4- or 8-bit integer quantization and utilizes the gemlite Triton kernel and its associated weight-packing format. This works only for fp16 models. 8-bit quantization is symmetric; 4-bit quantization is asymmetric. A usage sketch follows the parameter list below.

Parameters:
  • group_size – controls the granularity of quantization; a smaller group size gives more fine-grained quantization.

  • bit_width – bit width of the quantized weight; either 4 or 8.

  • packing_bitwidth – bit width of the packed weight; should be 8 or 32. Can affect performance depending on hardware.

  • mode – if set to “dynamic”, activations are quantized at runtime; default is “weight_only” (weight-only quantization).

  • set_inductor_config – if True, adjusts torchinductor settings to recommended values.
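
A minimal usage sketch, assuming torchao with gemlite installed and a CUDA device available. The config is passed to torchao.quantization.quantize_, which replaces the weights of the model's Linear layers in place; the model dimensions and group size here are illustrative, not prescribed by this page.

    import torch
    from torchao.quantization import quantize_, GemliteUIntXWeightOnlyConfig

    # gemlite only supports fp16 models, so convert the model first.
    model = torch.nn.Sequential(torch.nn.Linear(1024, 1024)).half().cuda()

    # 4-bit asymmetric weight-only quantization, grouped in blocks of 64.
    config = GemliteUIntXWeightOnlyConfig(group_size=64, bit_width=4)
    quantize_(model, config)

    # Inference runs through the gemlite Triton kernel on fp16 activations.
    x = torch.randn(1, 1024, dtype=torch.float16, device="cuda")
    y = model(x)

Passing mode="dynamic" instead of the default "weight_only" would additionally quantize activations at runtime, as described in the parameter list above.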
