Int8DynActInt4WeightQATLinear

class torchao.quantization.qat.linear.Int8DynActInt4WeightQATLinear(in_features: int, out_features: int, bias: bool = False, device: device = None, groupsize: int = 256, precision: dtype = torch.float32, scales_precision: dtype = torch.float32)[source]

This module implements a linear layer with int8 dynamically per-token fake-quantized activations and int4 grouped per-channel fake-quantized weights.

Parameters:
  • groupsize – the number of elements in each quantized group for weights

  • precision – precision of weights

  • scales_precision – precision of per group scales and zero points

Note: activation scales are hardcoded to torch.float32, while users may choose the precision of the weight scales (defaults to torch.float32). Here scales_precision refers specifically to the weight scales, not the activation scales.
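To illustrate what "grouped fake quantization" means for the weights, here is a minimal pure-Python sketch (not the torchao implementation): each contiguous group of `groupsize` values shares one scale, each value is rounded and clamped to the signed int4 range [-8, 7], then immediately dequantized back to float so the tensor keeps int4 rounding error while remaining in floating point. Symmetric quantization and the helper name `fake_quant_int4_grouped` are assumptions for illustration.

```python
def fake_quant_int4_grouped(weights, groupsize=4):
    """Per-group symmetric int4 fake quantization (illustrative sketch,
    not the torchao kernel): quantize each group of `groupsize` floats
    to [-8, 7] with a shared scale, then dequantize."""
    out = []
    for start in range(0, len(weights), groupsize):
        group = weights[start:start + groupsize]
        # Pick the scale so the largest magnitude in the group maps to 7;
        # guard against an all-zero group.
        scale = max(abs(w) for w in group) / 7 or 1e-9
        for w in group:
            q = max(-8, min(7, round(w / scale)))  # quantize + clamp to int4
            out.append(q * scale)                  # dequantize back to float
    return out

vals = [0.1, -0.5, 0.25, 0.7, 1.2, -1.1, 0.0, 0.3]
fq = fake_quant_int4_grouped(vals, groupsize=4)
```

A smaller `groupsize` gives each group its own scale and hence lower quantization error, at the cost of storing more scales; `groupsize=256` (the default above) trades the other way.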