
Int8DynActInt4WeightQATLinear

class torchao.quantization.qat.linear.Int8DynActInt4WeightQATLinear(in_features: int, out_features: int, bias: bool = False, device: device = None, groupsize: int = 256, precision: dtype = torch.float32, scales_precision: dtype = torch.float32)[source]

This module implements a linear layer with int8 dynamic per-token fake-quantized activations and int4 fake-quantized grouped per-channel weights.

Parameters:
  • groupsize – the number of elements in each quantized group for weights

  • precision – precision of weights

  • scales_precision – precision of per group scales and zero points

Note: activation scales are hardcoded to torch.float32, while the weight scale dtype can be configured by the user (defaulting to torch.float32). To get an exact numerical match with Int8DynamicActivationInt4WeightConfig, use the same dtype for both the weights and the weight scales. Here scales_precision refers specifically to the weight scales only, not the activation scales.
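A minimal usage sketch, assuming torchao is installed and the class is importable from torchao.quantization.qat.linear as documented above. In practice this module is usually swapped in by a QAT flow rather than constructed by hand; the sketch only illustrates the constructor arguments and a forward pass. The feature sizes and batch size are arbitrary illustrative values.

```python
import torch
from torchao.quantization.qat.linear import Int8DynActInt4WeightQATLinear

# Construct a fake-quantized linear layer: int8 dynamic per-token activation
# fake quantization, int4 grouped per-channel weight fake quantization.
linear = Int8DynActInt4WeightQATLinear(
    in_features=1024,                 # arbitrary example size
    out_features=1024,                # arbitrary example size
    bias=False,
    groupsize=256,                    # elements per weight quantization group
    precision=torch.float32,          # dtype of the weights
    scales_precision=torch.float32,   # dtype of per-group weight scales and zero points
)

# Forward pass behaves like a regular nn.Linear; quantization is only
# simulated (fake-quantized), so gradients still flow during QAT training.
x = torch.randn(8, 1024)
y = linear(x)
print(y.shape)  # torch.Size([8, 1024])
```

Per the note above, keeping precision and scales_precision at the same dtype is what allows an exact numerical match with Int8DynamicActivationInt4WeightConfig after QAT.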
