torchao.quantization.qat
QAT Configs for quantize_
For a full example of how to use QAT with our main quantize_ API, please refer to the QAT README.
IntXQuantizationAwareTrainingConfig | Config for applying fake quantization to a torch.nn.Module.
FromIntXQuantizationAwareTrainingConfig | Config for converting a model with fake quantized modules, such as FakeQuantizedLinear and FakeQuantizedEmbedding, back to the original corresponding modules.
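As a minimal sketch, the prepare, train, convert flow with these configs might look like the following. The FakeQuantizeConfig arguments here (int8 per-token asymmetric activations, int4 per-group symmetric weights with group_size=32) are illustrative assumptions; see the QAT README for the authoritative recipe.

```python
import torch
from torchao.quantization import Int8DynamicActivationInt4WeightConfig, quantize_
from torchao.quantization.qat import (
    FakeQuantizeConfig,
    FromIntXQuantizationAwareTrainingConfig,
    IntXQuantizationAwareTrainingConfig,
)

model = torch.nn.Sequential(torch.nn.Linear(32, 32))

# Prepare: swap nn.Linear into fake quantized linears (int8 per-token
# asymmetric activations, int4 per-group symmetric weights in this sketch).
activation_config = FakeQuantizeConfig(torch.int8, "per_token", is_symmetric=False)
weight_config = FakeQuantizeConfig(torch.int4, group_size=32)
quantize_(model, IntXQuantizationAwareTrainingConfig(activation_config, weight_config))

# ... train the model with fake quantization in the loop ...

# Convert: swap the fake quantized modules back to the original modules,
# then apply real post-training quantization with matching settings.
quantize_(model, FromIntXQuantizationAwareTrainingConfig())
quantize_(model, Int8DynamicActivationInt4WeightConfig(group_size=32))
```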
Custom QAT APIs
FakeQuantizeConfig | Config for how to fake quantize weights or activations.
FakeQuantizedLinear | General linear layer with fake quantized weights and activations.
FakeQuantizedEmbedding | General embedding layer with fake quantized weights.
FakeQuantizer | Generic module for applying fake quantization to a tensor, as specified in the config.
enable_linear_fake_quant | Helper function to enable fake quantization in FakeQuantizedLinear.
disable_linear_fake_quant | Helper function to disable fake quantization in FakeQuantizedLinear.
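A minimal sketch of using these building blocks directly, assuming the constructor and helper signatures shown below; the layer sizes and quantization settings are illustrative:

```python
import torch
from torchao.quantization.qat import FakeQuantizeConfig
from torchao.quantization.qat.linear import (
    FakeQuantizedLinear,
    disable_linear_fake_quant,
    enable_linear_fake_quant,
)

# Fake quantize int8 per-token asymmetric activations and
# int4 per-group symmetric weights.
activation_config = FakeQuantizeConfig(torch.int8, "per_token", is_symmetric=False)
weight_config = FakeQuantizeConfig(torch.int4, group_size=32)

fq_linear = FakeQuantizedLinear(
    64, 64, activation_config=activation_config, weight_config=weight_config,
)
y = fq_linear(torch.randn(2, 64))

# Toggle fake quantization off and back on, e.g. to measure a float baseline.
fq_linear.apply(disable_linear_fake_quant)
y_float = fq_linear(torch.randn(2, 64))
fq_linear.apply(enable_linear_fake_quant)
```

The enable/disable helpers operate on a single module, so applying them with Module.apply toggles every FakeQuantizedLinear in a larger model at once.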
Legacy QAT Quantizers
Int4WeightOnlyQATQuantizer | Quantizer for performing QAT on a model, where linear layers have int4 fake quantized grouped per channel weights.
Int4WeightOnlyQATLinear | This module implements a linear layer with int4 fake quantized grouped per channel weights, with forward numerics matching WeightOnlyInt4Linear, which uses the efficient int4 tinygemm kernel.
Int8DynActInt4WeightQATQuantizer | Quantizer for performing QAT on a model, where linear layers have int8 dynamic per token fake quantized activations and int4 fake quantized grouped per channel weights.
Int8DynActInt4WeightQATLinear | This module implements a linear layer with int8 dynamic per token fake quantized activations and int4 fake quantized grouped per channel weights.
Int4WeightOnlyEmbeddingQATQuantizer | Quantizer for performing QAT on a model, where embedding layers have int4 fake quantized grouped per channel weights.
Int4WeightOnlyQATEmbedding | This module implements an embedding layer with int4 fake quantized grouped per channel weights.
Int4WeightOnlyEmbedding | This module implements an embedding layer with int4 quantized grouped per channel weights.
Float8ActInt4WeightQATQuantizer | QAT quantizer for applying dynamic rowwise float8 activation + int4 per group/channel symmetric weight fake quantization to linear layers in the model.
ComposableQATQuantizer | Composable quantizer that users can use to apply multiple QAT quantizers easily.
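A minimal sketch of the legacy prepare/convert flow, assuming the quantizer constructors and methods listed above; the model and groupsize value are illustrative:

```python
import torch
from torchao.quantization.qat import (
    ComposableQATQuantizer,
    Int4WeightOnlyEmbeddingQATQuantizer,
    Int8DynActInt4WeightQATQuantizer,
)

model = torch.nn.Sequential(torch.nn.Embedding(128, 32), torch.nn.Linear(32, 32))

# Compose quantizers so linear and embedding layers are handled in one pass.
quantizer = ComposableQATQuantizer([
    Int8DynActInt4WeightQATQuantizer(groupsize=32),
    Int4WeightOnlyEmbeddingQATQuantizer(),
])

# Prepare: swaps in fake quantized modules (e.g. Int8DynActInt4WeightQATLinear).
model = quantizer.prepare(model)

# ... train the model as usual ...

# Convert: swaps fake quantized modules into actually quantized ones.
model = quantizer.convert(model)
```

Unlike the quantize_-based configs above, these legacy quantizers bundle a fixed quantization scheme per class and are driven through their own prepare and convert methods.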
Prototype
initialize_fake_quantizers | (Prototype) Initialize the scales and zero points on all fake quantizers in the model based on the provided example inputs.
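A minimal sketch, assuming the prototype signature initialize_fake_quantizers(model, example_inputs) and that range learning is enabled via FakeQuantizeConfig(range_learning=True); both the config arguments and the model are illustrative:

```python
import torch
from torchao.quantization import quantize_
from torchao.quantization.qat import (
    FakeQuantizeConfig,
    IntXQuantizationAwareTrainingConfig,
    initialize_fake_quantizers,
)

model = torch.nn.Sequential(torch.nn.Linear(32, 32))

# Range learning makes the scales and zero points trainable parameters,
# so they need an initial value derived from example inputs before training.
weight_config = FakeQuantizeConfig(
    torch.int4, group_size=32, is_dynamic=False, range_learning=True,
)
quantize_(model, IntXQuantizationAwareTrainingConfig(weight_config=weight_config))

example_inputs = (torch.randn(2, 32),)
initialize_fake_quantizers(model, example_inputs)
```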