Int4WeightOnlyQATLinear¶
- class torchao.quantization.qat.linear.Int4WeightOnlyQATLinear(in_features: int, out_features: int, bias: bool = False, device: device = None, groupsize: int = 256, inner_k_tiles: int = 8, precision: dtype = torch.bfloat16, scales_precision: dtype = torch.bfloat16)[source]¶
This module implements a linear layer with int4 fake-quantized, grouped per-channel weights, with forward numerics matching WeightOnlyInt4Linear, which uses the efficient int4 tinygemm kernel.
- Parameters:
groupsize – the number of weight elements in each quantization group
precision – dtype of the weights
scales_precision – dtype of the per-group scales and zero points
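To illustrate the numerics, below is a minimal pure-Python sketch of grouped, asymmetric int4 fake quantization (quantize then immediately dequantize, so the output stays in floating point but only takes values representable in int4). This is an illustrative assumption-laden sketch, not the torchao implementation: function names, the unsigned [0, 15] range, and the rounding details are chosen for clarity and may differ from the actual kernel.

```python
# Hypothetical sketch of grouped asymmetric int4 fake quantization.
# Not the torchao implementation; names and rounding details are illustrative.
QMIN, QMAX = 0, 15  # unsigned int4 range

def fake_quantize_group(group):
    """Fake-quantize one group: compute scale/zero point, quantize, dequantize."""
    lo, hi = min(group), max(group)
    scale = (hi - lo) / (QMAX - QMIN) or 1.0  # avoid div-by-zero for flat groups
    zero_point = QMIN - round(lo / scale)
    q = [max(QMIN, min(QMAX, round(w / scale + zero_point))) for w in group]
    return [(v - zero_point) * scale for v in q]

def fake_quantize_weight(weight_rows, groupsize):
    """Apply group-wise fake quantization along each output channel (row)."""
    out = []
    for row in weight_rows:
        new_row = []
        for i in range(0, len(row), groupsize):
            new_row.extend(fake_quantize_group(row[i:i + groupsize]))
        out.append(new_row)
    return out
```

Because each group keeps its own scale and zero point, quantization error is bounded by roughly one scale step per element, which is why smaller `groupsize` values trade memory for accuracy.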