
torchao.quantization.qat

Main Config for quantize_

For a full example of how to use QAT with our main quantize_ API, please refer to the QAT README.

QATConfig

Config for applying quantization-aware training (QAT) to a torch.nn.Module, to be used with quantize_().

QATStep

Enum value for the step field in QATConfig.

Custom QAT APIs

FakeQuantizeConfigBase

Base class for representing fake quantization config.

IntxFakeQuantizeConfig

Config for how to fake quantize weights or activations, targeting integer dtypes up to torch.int8.

Float8FakeQuantizeConfig

Config for float8 fake quantization, targeting Float8Tensor.

FakeQuantizedLinear

General linear layer with fake quantized weights and activations.

FakeQuantizedEmbedding

General embedding layer with fake quantized weights.

FakeQuantizerBase

Generic module for applying fake quantization to a tensor, as specified in the config.

IntxFakeQuantizer

Generic module for applying integer fake quantization to a tensor, as specified in the config.

Float8FakeQuantizer

Generic module for applying float8 fake quantization to a tensor, as specified in the config.

linear.enable_linear_fake_quant

Helper function to enable fake quantization in FakeQuantizedLinear.

linear.disable_linear_fake_quant

Helper function to disable fake quantization in FakeQuantizedLinear.

Legacy QAT APIs

IntXQuantizationAwareTrainingConfig

(Deprecated) Please use QATConfig instead.

FromIntXQuantizationAwareTrainingConfig

(Deprecated) Please use QATConfig instead.

Int4WeightOnlyQATQuantizer

Quantizer for performing QAT on a model, where linear layers have int4 fake quantized grouped per channel weights.

linear.Int4WeightOnlyQATLinear

This module implements a linear layer with int4 fake quantized grouped per channel weights, with forward numerics matching WeightOnlyInt4Linear, which uses the efficient int4 tinygemm kernel.

Int8DynActInt4WeightQATQuantizer

Quantizer for performing QAT on a model, where linear layers have int8 dynamic per token fake quantized activations and int4 fake quantized grouped per channel weights.

linear.Int8DynActInt4WeightQATLinear

This module implements a linear layer with int8 dynamic per token fake quantized activations with int4 fake quantized grouped per channel weights.

Int4WeightOnlyEmbeddingQATQuantizer

Quantizer for performing QAT on a model, where embedding layers have int4 fake quantized grouped per channel weights.

embedding.Int4WeightOnlyQATEmbedding

This module implements an embedding layer with int4 fake quantized grouped per channel weights.

embedding.Int4WeightOnlyEmbedding

This module implements an embedding layer with int4 quantized grouped per channel weights.

Float8ActInt4WeightQATQuantizer

QAT quantizer for applying dynamic rowwise float8 activation + int4 per group/channel symmetric weight fake quantization to linear layers in the model.

ComposableQATQuantizer

Composable quantizer for applying multiple QAT quantizers to a model in a single prepare/convert pass.

Prototype

initialize_fake_quantizers

(Prototype) Initialize the scales and zero points on all IntxFakeQuantizerBase in the model based on the provided example inputs.
