uintx_weight_only

torchao.quantization.uintx_weight_only(dtype, group_size=64, pack_dim=-1, use_hqq=False)

Applies uintx weight-only asymmetric per-group quantization to linear layers, where x is the number of bits (1 to 7) specified by dtype.

Parameters:

dtype – the sub-byte unsigned integer dtype to quantize to, from torch.uint1 to torch.uint7
group_size – the number of weight elements that share one set of quantization parameters; a smaller value gives finer-grained quantization; defaults to 64
pack_dim – the dimension along which the quantized values are packed; defaults to -1
use_hqq – whether to quantize the weight with the hqq algorithm instead of the default algorithm; defaults to False
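
To make "asymmetric per-group quantization" and the role of group_size concrete, the following is a minimal pure-Python sketch. It is not torchao's implementation: the helper names (quantize_group, quantize_per_group, dequantize_per_group) are hypothetical, the group minimum is used as a floating-point offset (an assumption of this sketch), and bit packing along pack_dim is not modeled at all.

```python
def quantize_group(values, num_bits):
    """Quantize one group to unsigned ints in [0, 2**num_bits - 1].

    Asymmetric: the range [min, max] of the group is mapped onto the
    integer range, so the grid need not be centered on zero.
    """
    qmax = (1 << num_bits) - 1
    lo, hi = min(values), max(values)
    # Guard against a constant group, where hi == lo would give a zero scale.
    scale = (hi - lo) / qmax if hi > lo else 1.0
    # Assumption of this sketch: the offset is the group minimum (a float).
    q = [min(qmax, max(0, round((v - lo) / scale))) for v in values]
    return q, scale, lo


def quantize_per_group(row, num_bits, group_size):
    """Split a row of weights into groups, each with its own scale and offset."""
    if len(row) % group_size != 0:
        raise ValueError("row length must be divisible by group_size")
    return [
        quantize_group(row[start:start + group_size], num_bits)
        for start in range(0, len(row), group_size)
    ]


def dequantize_per_group(groups):
    """Reconstruct approximate float weights from the per-group data."""
    out = []
    for q, scale, lo in groups:
        out.extend(qi * scale + lo for qi in q)
    return out


# Hypothetical weights: two regions with very different value ranges.
row = [-1.0, -0.5, 0.0, 0.5, 10.0, 20.0, 30.0, 40.0]
fine = dequantize_per_group(quantize_per_group(row, num_bits=4, group_size=4))
coarse = dequantize_per_group(quantize_per_group(row, num_bits=4, group_size=8))
print("group_size=4:", fine)
print("group_size=8:", coarse)
```

With the smaller group size, each group gets a scale fitted to its own value range, so the reconstruction error is lower at the cost of storing more scales and offsets. That is the trade-off group_size controls.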
