Int8DynActInt4WeightQATQuantizer#

class torchao.quantization.qat.Int8DynActInt4WeightQATQuantizer(groupsize: int = 256, padding_allowed: bool = False, precision: dtype = torch.float32, scales_precision: dtype = torch.float32)[source]#

Quantizer for performing QAT on a model, where linear layers have int8 dynamic per-token fake-quantized activations and int4 fake-quantized grouped per-channel weights.