
Int4WeightOnlyQATQuantizer

class torchao.quantization.qat.Int4WeightOnlyQATQuantizer(groupsize: int = 256, inner_k_tiles: Optional[int] = 8, precision: dtype = torch.bfloat16, scales_precision: dtype = torch.bfloat16)[source]

Quantizer for performing quantization-aware training (QAT) on a model, where the weights of linear layers are fake quantized to int4 in groups along each channel.
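To illustrate what per-group int4 fake quantization means, the sketch below applies a symmetric quantize-then-dequantize round trip to a flat list of weights, one scale per group of groupsize elements. This is a simplified, pure-Python illustration of the concept only, not torchao's implementation; the helper name fake_quantize_int4_grouped is hypothetical.

```python
# Illustrative sketch (NOT torchao's implementation): symmetric int4
# fake quantization applied per group of `groupsize` weights. QAT
# runs this quantize -> dequantize round trip in the forward pass so
# the model learns to tolerate int4 rounding error.

def fake_quantize_int4_grouped(weights, groupsize=4):
    qmin, qmax = -8, 7  # signed int4 value range
    out = []
    for start in range(0, len(weights), groupsize):
        group = weights[start:start + groupsize]
        # One scale per group: the largest magnitude maps to the edge
        # of the int4 range (symmetric quantization, no zero point).
        scale = max(abs(w) for w in group) / qmax or 1.0
        for w in group:
            q = max(qmin, min(qmax, round(w / scale)))  # quantize + clamp
            out.append(q * scale)                       # dequantize
    return out

weights = [0.1, -0.45, 0.3, 0.7, -1.2, 0.05, 0.9, -0.6]
print(fake_quantize_int4_grouped(weights, groupsize=4))
```

A smaller groupsize gives each scale fewer weights to cover, so the round-trip error shrinks at the cost of storing more scales; the default groupsize of 256 trades some accuracy for a more compact representation.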