
torchao.quantization.qat

QAT Configs for quantize_

For a full example of how to use QAT with our main quantize_ API, please refer to the QAT README.

IntXQuantizationAwareTrainingConfig

Config for applying fake quantization to a torch.nn.Module.

FromIntXQuantizationAwareTrainingConfig

Config for converting a model with fake quantized modules, such as FakeQuantizedLinear() and FakeQuantizedEmbedding(), back to a model with the original, corresponding modules without fake quantization.

Custom QAT APIs

FakeQuantizeConfig

Config for how to fake quantize weights or activations.

FakeQuantizedLinear

General linear layer with fake quantized weights and activations.

FakeQuantizedEmbedding

General embedding layer with fake quantized weights.

FakeQuantizer

Generic module for applying fake quantization to a tensor, as specified in the config.

linear.enable_linear_fake_quant

Helper function to enable fake quantization in FakeQuantizedLinear.

linear.disable_linear_fake_quant

Helper function to disable fake quantization in FakeQuantizedLinear.

Legacy QAT Quantizers

Int4WeightOnlyQATQuantizer

Quantizer for performing QAT on a model, where linear layers have int4 fake quantized grouped per channel weights.

linear.Int4WeightOnlyQATLinear

This module implements a linear layer with int4 fake quantized grouped per channel weights, with forward numerics matching WeightOnlyInt4Linear, which uses the efficient int4 tinygemm kernel.

Int8DynActInt4WeightQATQuantizer

Quantizer for performing QAT on a model, where linear layers have int8 dynamic per token fake quantized activations and int4 fake quantized grouped per channel weights. This is the "8da4w" recipe commonly used for executorch-style lowering.

linear.Int8DynActInt4WeightQATLinear

This module implements a linear layer with int8 dynamic per token fake quantized activations and int4 fake quantized grouped per channel weights.

Int4WeightOnlyEmbeddingQATQuantizer

Quantizer for performing QAT on a model, where embedding layers have int4 fake quantized grouped per channel weights.

embedding.Int4WeightOnlyQATEmbedding

This module implements an embedding layer with int4 fake quantized grouped per channel weights.

embedding.Int4WeightOnlyEmbedding

This module implements an embedding layer with int4 quantized grouped per channel weights.

Float8ActInt4WeightQATQuantizer

QAT quantizer for applying dynamic rowwise float8 activation + int4 per group/channel symmetric weight fake quantization to linear layers in the model.

ComposableQATQuantizer

Composable quantizer that applies multiple QAT quantizers to a model in a single prepare/convert flow.

Prototype

initialize_fake_quantizers

(Prototype) Initialize the scales and zero points on all FakeQuantizer modules in the model based on the provided example inputs.
