torchao.quantization.qat
Created On: Dec 17, 2025 | Last Updated On: Dec 17, 2025
Main Config for quantize_
For a full example of how to use QAT with our main quantize_ API, please refer to the QAT README.
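As a quick orientation (a hedged sketch, not a substitute for the README): the main flow wraps a base post-training config in a QAT config, prepares the model so training sees fake quantization, then converts it to an actually quantized model. The names used here (`QATConfig`, `Int8DynamicActivationInt4WeightConfig`, the `step="prepare"`/`step="convert"` arguments) follow recent torchao releases; verify them against your installed version.

```python
import torch
from torchao.quantization import quantize_, Int8DynamicActivationInt4WeightConfig
from torchao.quantization.qat import QATConfig

model = torch.nn.Sequential(torch.nn.Linear(64, 64))

# Target post-training scheme: int8 dynamic activations + int4 grouped weights.
base_config = Int8DynamicActivationInt4WeightConfig(group_size=32)

# Step 1: insert fake quantization so training observes quantization error.
quantize_(model, QATConfig(base_config, step="prepare"))

# ... fine-tune the model as usual ...

# Step 2: swap fake quantization for real quantized weights.
quantize_(model, QATConfig(base_config, step="convert"))
```

The two-step design lets the same `base_config` describe both the training-time simulation and the final deployed numerics.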
Custom QAT APIs

- Base class for representing fake quantization config.
- Config for how to fake quantize weights or activations, targeting integer dtypes up to torch.int8.
- Config for float8 fake quantization.
- General linear layer with fake quantized weights and activations.
- General embedding layer with fake quantized weights.
- Generic module for applying fake quantization to a tensor, as specified in the config.
- Generic module for applying integer fake quantization to a tensor, as specified in the config.
- Generic module for applying float8 fake quantization to a tensor, as specified in the config.
- Helper function to enable fake quantization in FakeQuantizedLinear.
- Helper function to disable fake quantization in FakeQuantizedLinear.
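The fake quantization modules listed above all implement the same basic trick: quantize a float tensor onto a low-precision grid, then immediately dequantize, so values stay in floating point but carry quantization error during training. A minimal pure-Python sketch of that round trip (illustrative only, not torchao's implementation; asymmetric per-tensor int8 for concreteness):

```python
def fake_quantize(values, scale, zero_point, qmin=-128, qmax=127):
    """Quantize-dequantize round trip: output stays float, but is
    snapped to the representable int8 grid."""
    out = []
    for v in values:
        q = round(v / scale) + zero_point      # quantize to integer grid
        q = max(qmin, min(qmax, q))            # clamp to the int8 range
        out.append((q - zero_point) * scale)   # dequantize back to float
    return out

# Values snap to multiples of the scale; out-of-range values saturate.
print(fake_quantize([0.013, -0.5, 10.0], scale=0.01, zero_point=0))
```

In real QAT the scales and zero points are derived from observed tensor statistics, and a straight-through estimator lets gradients flow through the rounding step.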
Legacy QAT APIs

- (Deprecated) Please use …
- (Deprecated) Please use …
- Quantizer for performing QAT on a model, where linear layers have int4 fake quantized grouped per channel weights.
- This module implements a linear layer with int4 fake quantized grouped per channel weights, with forward numerics matching WeightOnlyInt4Linear, which uses the efficient int4 tinygemm kernel.
- Quantizer for performing QAT on a model, where linear layers have int8 dynamic per token fake quantized activations and int4 fake quantized grouped per channel weights.
- This module implements a linear layer with int8 dynamic per token fake quantized activations and int4 fake quantized grouped per channel weights.
- Quantizer for performing QAT on a model, where embedding layers have int4 fake quantized grouped per channel weights.
- This module implements an embedding layer with int4 fake quantized grouped per channel weights.
- This module implements an embedding layer with int4 quantized grouped per channel weights.
- QAT quantizer for applying dynamic rowwise float8 activation + int4 per group/channel symmetric weight fake quantization to linear layers in the model.
- Composable quantizer that users can use to apply multiple QAT quantizers easily.
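The legacy quantizers above share a prepare/convert interface on the quantizer object itself rather than going through quantize_. A hedged sketch of that flow using Int8DynActInt4WeightQATQuantizer (names and the `groupsize` parameter follow older torchao releases; verify against your version):

```python
import torch
from torchao.quantization.qat import Int8DynActInt4WeightQATQuantizer

model = torch.nn.Sequential(torch.nn.Linear(64, 64))

qat_quantizer = Int8DynActInt4WeightQATQuantizer(groupsize=32)

# Swap eligible linear layers for fake-quantized equivalents for training.
model = qat_quantizer.prepare(model)

# ... fine-tune the model as usual ...

# Replace fake quantization with actual int8-activation / int4-weight numerics.
model = qat_quantizer.convert(model)
```

The composable quantizer follows the same prepare/convert contract, delegating each call to its child quantizers in order.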
Prototype

- (Prototype) Initialize the scales and zero points on all fake quantizers in the model.