Int4WeightOnlyQATLinear#

class torchao.quantization.qat.linear.Int4WeightOnlyQATLinear(in_features: int, out_features: int, bias: bool = False, device: device = None, groupsize: int = 256, inner_k_tiles: int = 8, precision: dtype = torch.bfloat16, scales_precision: dtype = torch.bfloat16)[source]#

This module implements a linear layer with int4 fake-quantized, grouped per-channel weights. Its forward numerics match those of WeightOnlyInt4Linear, which uses the efficient int4 tinygemm kernel.

Parameters
  • groupsize – the number of elements in each quantized group for weights

  • precision – precision of weights

  • scales_precision – precision of per group scales and zero points
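The per-group fake quantization applied to the weights can be illustrated with a minimal pure-Python sketch. This is a simplified illustration only, not the module's actual implementation: it uses asymmetric affine quantization to the unsigned int4 range [0, 15] with one scale and zero point per group, and omits the bfloat16 precision handling and tinygemm weight packing of the real kernel. The function names here are hypothetical.

```python
# Illustrative sketch of grouped per-channel int4 fake quantization.
# Each group of `groupsize` consecutive weights in a row gets its own
# scale and zero point; values are quantized to int4 and dequantized
# back to float, so training sees the quantization error ("fake quant").

def fake_quantize_group(weights, qmin=0, qmax=15):
    """Fake-quantize one group of float weights to int4 and back."""
    w_min, w_max = min(weights), max(weights)
    # Asymmetric affine quantization: one scale/zero point per group.
    scale = (w_max - w_min) / (qmax - qmin) or 1.0  # avoid zero scale
    zero_point = round(qmin - w_min / scale)
    # Quantize, clamp to the int4 range, then dequantize.
    q = [max(qmin, min(qmax, round(w / scale) + zero_point)) for w in weights]
    return [(qi - zero_point) * scale for qi in q]

def fake_quantize_row(weights, groupsize):
    """Apply fake quantization independently to each group in a weight row."""
    out = []
    for i in range(0, len(weights), groupsize):
        out.extend(fake_quantize_group(weights[i:i + groupsize]))
    return out

row = [0.5, -1.2, 0.8, 0.1, 2.0, -0.3, 0.7, -0.9]
fq = fake_quantize_row(row, groupsize=4)
# Each fake-quantized value stays within half a quantization step
# of the original, and the group min/max are reproduced closely.
```

A smaller groupsize gives each group a tighter scale (lower quantization error) at the cost of storing more scales and zero points.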