GemliteUIntXWeightOnlyConfig

class torchao.quantization.GemliteUIntXWeightOnlyConfig(group_size: Optional[int] = 128, bit_width: int = 4, packing_bitwidth: Optional[int] = None, mode: Optional[str] = 'weight_only', set_inductor_config: bool = True)

Applies weight-only 4-bit or 8-bit integer quantization using the gemlite Triton kernels and their associated weight packing format. It supports only fp16 models. 8-bit quantization is symmetric; 4-bit quantization is asymmetric.
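
The difference between symmetric 8-bit and asymmetric 4-bit quantization can be illustrated with a small pure-Python sketch. It does not call gemlite or torchao: the helpers quantize_group_symmetric, quantize_group_asymmetric and quantize_row are hypothetical, and the exact scale/zero-point computation, rounding and storage used by the gemlite kernels may differ. Each group of group_size consecutive weights gets its own scale (and, for 4 bits, its own zero point), and the row is then dequantized to show the reconstruction error.

```python
"""Illustrative group-wise weight quantization (not gemlite's actual kernel)."""
from typing import List, Tuple


def quantize_group_symmetric(values: List[float], bits: int = 8) -> Tuple[List[int], float]:
    # Symmetric: zero point fixed at 0, scale set by the largest magnitude.
    qmax = 2 ** (bits - 1) - 1  # 127 for 8 bits
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    q = [max(-qmax - 1, min(qmax, round(v / scale))) for v in values]
    return q, scale


def quantize_group_asymmetric(values: List[float], bits: int = 4) -> Tuple[List[int], float, float]:
    # Asymmetric: map [min, max] of the group onto the unsigned range [0, 2**bits - 1].
    qmax = 2 ** bits - 1  # 15 for 4 bits
    lo, hi = min(values), max(values)
    scale = (hi - lo) / qmax if hi > lo else 1.0
    zero_point = -lo / scale  # kept as a float in this sketch (an assumption)
    q = [max(0, min(qmax, round(v / scale + zero_point))) for v in values]
    return q, scale, zero_point


def dequantize(q: List[int], scale: float, zero_point: float = 0.0) -> List[float]:
    return [(x - zero_point) * scale for x in q]


def quantize_row(row: List[float], group_size: int, bits: int) -> List[float]:
    """Quantize one weight row group by group and return its dequantized values."""
    out: List[float] = []
    for start in range(0, len(row), group_size):
        group = row[start:start + group_size]
        if bits == 8:
            q, scale = quantize_group_symmetric(group, bits)
            out.extend(dequantize(q, scale))
        else:
            q, scale, zp = quantize_group_asymmetric(group, bits)
            out.extend(dequantize(q, scale, zp))
    return out


row = [0.37, -1.2, 0.05, 2.4, -0.66, 1.1, -2.05, 0.9]
for bits in (8, 4):
    deq = quantize_row(row, group_size=4, bits=bits)
    max_err = max(abs(w - d) for w, d in zip(row, deq))
    print(f"{bits}-bit max abs error: {max_err:.4f}")
```

In this sketch every element's error stays within half a quantization step of its group, so the 8-bit path (255 levels) reconstructs the row more closely than the 4-bit path (16 levels).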

Parameters:

group_size – quantization group size; controls the granularity of quantization. Smaller groups give finer-grained quantization.
bit_width – bit width of the quantized weights (4 or 8).
packing_bitwidth – bit width of the words the quantized weights are packed into; should be 8 or 32. The choice can affect performance depending on the hardware.
mode – quantization mode. The default, “weight_only”, quantizes only the weights; if set to “dynamic”, activations are also quantized at runtime.
set_inductor_config – if True, adjusts torchinductor settings to recommended values.
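
How packing_bitwidth groups quantized values into words can be sketched in pure Python. This assumes a simple layout in which the lowest-index value occupies the lowest bits of each word; pack_uint and unpack_uint are hypothetical helpers, and gemlite's real packed format may order or interleave values differently. With 4-bit weights, an 8-bit word holds 2 values and a 32-bit word holds 8.

```python
from typing import List


def pack_uint(values: List[int], bit_width: int, packing_bitwidth: int) -> List[int]:
    """Pack unsigned bit_width-bit integers into packing_bitwidth-bit words.

    Assumed layout: the lowest-index value goes into the lowest bits of each word.
    """
    if packing_bitwidth % bit_width != 0:
        raise ValueError("packing_bitwidth must be a multiple of bit_width")
    per_word = packing_bitwidth // bit_width
    mask = (1 << bit_width) - 1
    words = []
    for start in range(0, len(values), per_word):
        word = 0
        for i, v in enumerate(values[start:start + per_word]):
            if not 0 <= v <= mask:
                raise ValueError(f"value {v} does not fit in {bit_width} bits")
            word |= v << (i * bit_width)
        words.append(word)
    return words


def unpack_uint(words: List[int], bit_width: int, packing_bitwidth: int, count: int) -> List[int]:
    """Inverse of pack_uint; count drops padding from a partially filled last word."""
    per_word = packing_bitwidth // bit_width
    mask = (1 << bit_width) - 1
    out = []
    for word in words:
        for i in range(per_word):
            out.append((word >> (i * bit_width)) & mask)
    return out[:count]


q = [3, 15, 0, 7, 9, 1, 12, 5]  # eight 4-bit codes
packed8 = pack_uint(q, bit_width=4, packing_bitwidth=8)
packed32 = pack_uint(q, bit_width=4, packing_bitwidth=32)
print(len(packed8), len(packed32))  # prints: 4 1
```

Packing is lossless under either width; only the grouping of bits into words changes.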
