Int4WeightOnlyQATLinear

class torchao.quantization.qat.linear.Int4WeightOnlyQATLinear(in_features: int, out_features: int, bias: bool = False, device: device = None, groupsize: int = 256, inner_k_tiles: int = 8, precision: dtype = torch.bfloat16, scales_precision: dtype = torch.bfloat16)[source]

This module implements a linear layer whose weights are fake-quantized to int4 using grouped per-channel quantization. Its forward numerics match those of WeightOnlyInt4Linear, which uses the efficient int4 tinygemm kernel. A usage sketch follows the parameter list below.

Parameters:
  • groupsize – the number of weight elements in each quantization group

  • precision – dtype of the weights

  • scales_precision – dtype of the per-group scales and zero points
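A minimal sketch of constructing and running this module, assuming the constructor signature shown above; the concrete dimensions (1024, 4096) and batch size are illustrative only, chosen so that groupsize evenly divides in_features:

```python
import torch
from torchao.quantization.qat.linear import Int4WeightOnlyQATLinear

# Illustrative shapes: in_features must be divisible by groupsize.
linear = Int4WeightOnlyQATLinear(
    in_features=1024,
    out_features=4096,
    bias=False,
    groupsize=256,                 # elements per quantization group
    precision=torch.bfloat16,      # weight dtype
    scales_precision=torch.bfloat16,
)

# Forward pass applies int4 fake quantization to the weights,
# so gradients can still flow during quantization-aware training.
x = torch.randn(8, 1024, dtype=torch.bfloat16)
y = linear(x)
print(y.shape)  # torch.Size([8, 4096])
```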
