Int8DynActInt4WeightQATLinear

class torchao.quantization.qat.linear.Int8DynActInt4WeightQATLinear(in_features: int, out_features: int, bias: bool = False, device: device = None, groupsize: int = 256, precision: dtype = torch.float32, scales_precision: dtype = torch.float32)[source]

This module implements a linear layer with int8 dynamic per-token fake-quantized activations and int4 fake-quantized grouped per-channel weights.

Parameters
  • groupsize – the number of elements in each quantized group for the weights

  • precision – precision of the weights

  • scales_precision – precision of the per-group scales and zero points

Note: activation scales are hardcoded to torch.float32, while the precision of the weight scales is user-configurable (defaulting to torch.float32). To get an exact numerical match with Int8DynamicActivationInt4WeightConfig, use the same dtype for both the weights and the scales. Here, scales_precision refers only to the weight scales, not the activation scales.
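
Example (a minimal usage sketch based on the signature above; the shapes are illustrative, and in_features should be a multiple of groupsize so the weights divide evenly into quantization groups):

    import torch
    from torchao.quantization.qat.linear import Int8DynActInt4WeightQATLinear

    # Construct the QAT linear layer. Keeping scales_precision equal to
    # precision follows the note above, so results can match
    # Int8DynamicActivationInt4WeightConfig numerically.
    linear = Int8DynActInt4WeightQATLinear(
        in_features=512,
        out_features=256,
        groupsize=256,                   # 512 / 256 = 2 groups per output channel
        precision=torch.float32,         # weight dtype
        scales_precision=torch.float32,  # weight scale/zero-point dtype
    )

    # Forward pass: activations are fake quantized dynamically per token and
    # weights are fake quantized per group; the arithmetic stays in floating point.
    x = torch.randn(8, 512)              # (batch, in_features)
    y = linear(x)
    print(y.shape)                       # torch.Size([8, 256])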