torchao.quantization.qat
QAT Configs for quantize_
For a full example of how to use QAT with our main quantize_ API, please refer to the QAT README.
IntXQuantizationAwareTrainingConfig | Config for applying fake quantization to a torch.nn.Module.
FromIntXQuantizationAwareTrainingConfig | Config for converting a model with fake quantized modules, such as FakeQuantizedLinear and FakeQuantizedEmbedding, back to the original corresponding modules.
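As a minimal sketch, the prepare, train, convert flow with these configs might look like the following. The FakeQuantizeConfig arguments here (int8 per-token asymmetric activations, int4 per-group symmetric weights with group_size=32) are illustrative assumptions; see the QAT README for the authoritative recipe.

```python
import torch
from torchao.quantization import Int8DynamicActivationInt4WeightConfig, quantize_
from torchao.quantization.qat import (
    FakeQuantizeConfig,
    FromIntXQuantizationAwareTrainingConfig,
    IntXQuantizationAwareTrainingConfig,
)

model = torch.nn.Sequential(torch.nn.Linear(32, 32))

# Prepare: swap nn.Linear into fake quantized linears (int8 per-token
# asymmetric activations, int4 per-group symmetric weights in this sketch).
activation_config = FakeQuantizeConfig(torch.int8, "per_token", is_symmetric=False)
weight_config = FakeQuantizeConfig(torch.int4, group_size=32)
quantize_(model, IntXQuantizationAwareTrainingConfig(activation_config, weight_config))

# ... train the model with fake quantization in the loop ...

# Convert: swap the fake quantized modules back to the original modules,
# then apply real post-training quantization with matching settings.
quantize_(model, FromIntXQuantizationAwareTrainingConfig())
quantize_(model, Int8DynamicActivationInt4WeightConfig(group_size=32))
```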
Custom QAT APIs
FakeQuantizeConfig | Config for how to fake quantize weights or activations.
FakeQuantizedLinear | General linear layer with fake quantized weights and activations.
FakeQuantizedEmbedding | General embedding layer with fake quantized weights.
FakeQuantizer | Generic module for applying fake quantization to a tensor, as specified in the config.
enable_linear_fake_quant | Helper function to enable fake quantization in FakeQuantizedLinear.
disable_linear_fake_quant | Helper function to disable fake quantization in FakeQuantizedLinear.
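A minimal sketch of using these building blocks directly, assuming the constructor and helper signatures shown below; the layer sizes and quantization settings are illustrative:

```python
import torch
from torchao.quantization.qat import FakeQuantizeConfig
from torchao.quantization.qat.linear import (
    FakeQuantizedLinear,
    disable_linear_fake_quant,
    enable_linear_fake_quant,
)

# Fake quantize int8 per-token asymmetric activations and
# int4 per-group symmetric weights.
activation_config = FakeQuantizeConfig(torch.int8, "per_token", is_symmetric=False)
weight_config = FakeQuantizeConfig(torch.int4, group_size=32)

fq_linear = FakeQuantizedLinear(
    64, 64, activation_config=activation_config, weight_config=weight_config,
)
y = fq_linear(torch.randn(2, 64))

# Toggle fake quantization off and back on, e.g. to measure a float baseline.
fq_linear.apply(disable_linear_fake_quant)
y_float = fq_linear(torch.randn(2, 64))
fq_linear.apply(enable_linear_fake_quant)
```

The enable/disable helpers operate on a single module, so applying them with Module.apply toggles every FakeQuantizedLinear in a larger model at once.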
Legacy QAT Quantizers
Int4WeightOnlyQATQuantizer | Quantizer for performing QAT on a model, where linear layers have int4 fake quantized grouped per channel weights.
Int4WeightOnlyQATLinear | This module implements a linear layer with int4 fake quantized grouped per channel weights, with forward numerics matching WeightOnlyInt4Linear, which uses the efficient int4 tinygemm kernel.
Int8DynActInt4WeightQATQuantizer | Quantizer for performing QAT on a model, where linear layers have int8 dynamic per token fake quantized activations and int4 fake quantized grouped per channel weights.
Int8DynActInt4WeightQATLinear | This module implements a linear layer with int8 dynamic per token fake quantized activations and int4 fake quantized grouped per channel weights.
Int4WeightOnlyEmbeddingQATQuantizer | Quantizer for performing QAT on a model, where embedding layers have int4 fake quantized grouped per channel weights.
Int4WeightOnlyQATEmbedding | This module implements an embedding layer with int4 fake quantized grouped per channel weights.
Int4WeightOnlyEmbedding | This module implements an embedding layer with int4 quantized grouped per channel weights.
Float8ActInt4WeightQATQuantizer | QAT quantizer for applying dynamic rowwise float8 activation + int4 per group/channel symmetric weight fake quantization to linear layers in the model.
ComposableQATQuantizer | Composable quantizer that users can use to apply multiple QAT quantizers easily.
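A minimal sketch of the legacy prepare/convert flow, assuming the quantizer constructors and methods listed above; the model and groupsize value are illustrative:

```python
import torch
from torchao.quantization.qat import (
    ComposableQATQuantizer,
    Int4WeightOnlyEmbeddingQATQuantizer,
    Int8DynActInt4WeightQATQuantizer,
)

model = torch.nn.Sequential(torch.nn.Embedding(128, 32), torch.nn.Linear(32, 32))

# Compose quantizers so linear and embedding layers are handled in one pass.
quantizer = ComposableQATQuantizer([
    Int8DynActInt4WeightQATQuantizer(groupsize=32),
    Int4WeightOnlyEmbeddingQATQuantizer(),
])

# Prepare: swaps in fake quantized modules (e.g. Int8DynActInt4WeightQATLinear).
model = quantizer.prepare(model)

# ... train the model as usual ...

# Convert: swaps fake quantized modules into actually quantized ones.
model = quantizer.convert(model)
```

Unlike the quantize_-based configs above, these legacy quantizers bundle a fixed quantization scheme per class and are driven through their own prepare and convert methods.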
Prototype
initialize_fake_quantizers | (Prototype) Initialize the scales and zero points on all fake quantizers in the model based on the provided example inputs.
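A minimal sketch, assuming the prototype signature initialize_fake_quantizers(model, example_inputs) and that range learning is enabled via FakeQuantizeConfig(range_learning=True); both the config arguments and the model are illustrative:

```python
import torch
from torchao.quantization import quantize_
from torchao.quantization.qat import (
    FakeQuantizeConfig,
    IntXQuantizationAwareTrainingConfig,
    initialize_fake_quantizers,
)

model = torch.nn.Sequential(torch.nn.Linear(32, 32))

# Range learning makes the scales and zero points trainable parameters,
# so they need an initial value derived from example inputs before training.
weight_config = FakeQuantizeConfig(
    torch.int4, group_size=32, is_dynamic=False, range_learning=True,
)
quantize_(model, IntXQuantizationAwareTrainingConfig(weight_config=weight_config))

example_inputs = (torch.randn(2, 32),)
initialize_fake_quantizers(model, example_inputs)
```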