uintx_weight_only
- torchao.quantization.uintx_weight_only(dtype, group_size=64, pack_dim=-1, use_hqq=False)
Applies uintx weight-only asymmetric per-group quantization to linear layers, where x is the number of bits specified by dtype (for example, torch.uint4 gives 4-bit weights).
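As a rough illustration of what asymmetric per-group quantization means, the plain-Python sketch below splits a weight row into groups of group_size elements and maps each group to unsigned n-bit integers with its own scale and zero point, using q = clamp(round(w / scale) + zero_point, 0, 2^n - 1) and w ≈ (q - zero_point) * scale. The helper names quantize_group, dequantize_group and quantize_per_group are hypothetical, and the rounding, clamping, and "include zero in the range" choices are assumptions of this sketch, not necessarily what torchao does internally.

```python
# Conceptual sketch of asymmetric per-group quantization (not torchao's code).
# Helper names are hypothetical; rounding/clamping choices are assumptions.


def quantize_group(values, n_bits):
    """Quantize one group of floats to unsigned n_bits integers."""
    qmax = (1 << n_bits) - 1
    # Assumption: widen the range to include 0 so the zero point fits in [0, qmax].
    lo = min(min(values), 0.0)
    hi = max(max(values), 0.0)
    # One scale and one zero point per group; guard against an all-zero group.
    scale = (hi - lo) / qmax if hi > lo else 1.0
    zero_point = min(max(round(-lo / scale), 0), qmax)
    # q = clamp(round(w / scale) + zero_point, 0, qmax)
    q = [min(max(round(v / scale) + zero_point, 0), qmax) for v in values]
    return q, scale, zero_point


def dequantize_group(q, scale, zero_point):
    """Approximate the original floats: w ~= (q - zero_point) * scale."""
    return [(x - zero_point) * scale for x in q]


def quantize_per_group(row, n_bits, group_size=64):
    """Split a weight row into groups of group_size and quantize each independently."""
    if len(row) % group_size != 0:
        raise ValueError("row length must be divisible by group_size")
    return [
        quantize_group(row[i:i + group_size], n_bits)
        for i in range(0, len(row), group_size)
    ]


# Usage: 4-bit values (as with torch.uint4) and a tiny group size for readability.
row = [0.1, -0.4, 0.8, 0.3, 2.0, 1.5, -1.0, 0.0]
groups = quantize_per_group(row, n_bits=4, group_size=4)
for q, scale, zero_point in groups:
    print(q, round(scale, 4), zero_point)
```

Each group gets its own scale and zero point, so an outlier in one group (such as 2.0 above) does not coarsen the quantization of the other groups; this is why a smaller group_size is more fine-grained.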
- Parameters:
dtype – the target sub-byte dtype, one of torch.uint1 through torch.uint7
group_size – number of consecutive weight elements that share a single scale and zero point; controls the granularity of quantization (a smaller size is more fine-grained). Defaults to 64
pack_dim – the dimension along which the sub-byte values are packed. Defaults to -1 (the last dimension)
use_hqq – if True, quantize the weight with the HQQ (Half-Quadratic Quantization) algorithm; otherwise use the default algorithm. Defaults to False
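To make dtype and pack_dim more concrete, the plain-Python sketch below packs unsigned n-bit integers into bytes along either dimension of a 2-D list. The helpers pack_bits, unpack_bits and pack_2d are hypothetical and use a simple little-endian bitstream; torchao's actual packed storage layout may differ.

```python
# Conceptual sketch of bit-packing sub-byte values along a chosen dimension.
# This is NOT torchao's storage layout; it packs a simple little-endian bitstream.


def pack_bits(values, n_bits):
    """Pack unsigned n_bits integers into bytes, lowest bits first."""
    qmax = (1 << n_bits) - 1
    out, acc, n_acc = bytearray(), 0, 0
    for v in values:
        if not 0 <= v <= qmax:
            raise ValueError(f"{v} does not fit in {n_bits} bits")
        acc |= v << n_acc  # place the value above the bits still pending
        n_acc += n_bits
        while n_acc >= 8:  # flush every complete byte
            out.append(acc & 0xFF)
            acc >>= 8
            n_acc -= 8
    if n_acc:  # flush the final partial byte
        out.append(acc)
    return bytes(out)


def unpack_bits(data, n_bits, count):
    """Inverse of pack_bits: recover count values of n_bits each."""
    mask = (1 << n_bits) - 1
    stream = int.from_bytes(data, "little")
    return [(stream >> (i * n_bits)) & mask for i in range(count)]


def pack_2d(matrix, n_bits, pack_dim=-1):
    """Pack a 2-D list of lists along pack_dim (-1/1: rows, 0/-2: columns)."""
    if pack_dim in (-1, 1):
        lines = matrix
    elif pack_dim in (0, -2):
        lines = [list(col) for col in zip(*matrix)]
    else:
        raise ValueError("pack_dim must be -2, -1, 0 or 1 for a 2-D input")
    return [pack_bits(line, n_bits) for line in lines]


# Usage: a 4 x 8 matrix of 3-bit values (as with torch.uint3).
matrix = [[(r * 3 + c) % 8 for c in range(8)] for r in range(4)]
by_row = pack_2d(matrix, n_bits=3, pack_dim=-1)  # 4 rows, 8 * 3 = 24 bits -> 3 bytes each
by_col = pack_2d(matrix, n_bits=3, pack_dim=0)   # 8 columns, 4 * 3 = 12 bits -> 2 bytes each
```

With 3-bit values, eight of them fit in exactly three bytes instead of the eight bytes a plain uint8 tensor would need; the choice of pack_dim only changes which neighbouring values end up sharing bytes.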