# Workflows
This page provides an overview of the quantization and training workflows available in torchao.
## Stable Workflows

🟢 = stable, 🟡 = prototype, 🔵 = planned, ⚪ = not supported

| recommended hardware | weight | activation | quantized training | QAT | PTQ data algorithms | quantized inference |
|---|---|---|---|---|---|---|
| H100, B200 GPUs | float8 rowwise | float8 rowwise | 🟢 (link) | 🟢 (link) | ⚪ | 🟢 (link) |
| Intel® BMG GPUs | float8 tensor/rowwise | float8 tensor/rowwise | 🔵 | 🟢 (link) | ⚪ | 🟢 (link) |
| H100 GPUs | int4 | float8 rowwise | ⚪ | 🟢 (link) | 🔵 | 🟢 (link) |
| A100 GPUs | int4 | bfloat16 | ⚪ | 🟢 (link) | 🟢 (link) | |
| Intel® BMG GPUs | int4 | float16/bfloat16 | ⚪ | 🟢 (link) | 🟡: AWQ | 🟢 (link) |
| A100 GPUs | int8 | bfloat16 | ⚪ | 🟢 (link) | ⚪ | 🟢 (link) |
| A100 GPUs | int8 | int8 | 🟡 (link) | 🟢 (link) | ⚪ | 🟢 (link) |
| Intel® BMG GPUs | int8 | int8 | 🔵 | 🟢 (link) | ⚪ | 🟢 (link) |
| edge | intx (1..7) | bfloat16 | ⚪ | 🟢 (link) | ⚪ | 🟢 (link) |
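As a concrete example of the "quantized inference" column, the sketch below applies post-training weight-only quantization to a toy model. It assumes a recent torchao release that exposes the config-based `quantize_` API (for example `Int8WeightOnlyConfig`); the model, shapes, and chosen config are illustrative only.

```python
import torch
import torch.nn as nn
from torchao.quantization import quantize_, Int8WeightOnlyConfig

# Toy model standing in for a real network (illustrative only).
model = nn.Sequential(nn.Linear(1024, 1024), nn.ReLU(), nn.Linear(1024, 10))
model = model.to(torch.bfloat16).cuda()

# Swap the weights of every nn.Linear for int8 weight-only quantized tensors
# (the "int8 weight / bfloat16 activation" row in the table above).
quantize_(model, Int8WeightOnlyConfig())

# Inference proceeds as usual; dequantization happens inside the quantized kernels.
x = torch.randn(16, 1024, dtype=torch.bfloat16, device="cuda")
with torch.no_grad():
    out = model(x)
```

Other rows in the table map to other configs in the same API family, such as `Int4WeightOnlyConfig` or `Float8DynamicActivationFloat8WeightConfig`; the linked workflow pages are the authoritative reference for which config matches which row.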
## Prototype Workflows

🟢 = stable, 🟡 = prototype, 🔵 = planned, ⚪ = not supported

| recommended hardware | weight | activation | quantized training | QAT | PTQ data algorithms | quantized inference |
|---|---|---|---|---|---|---|
| B200, MI350x GPUs | mxfp8 | mxfp8 | ⚪ | ⚪ | 🟡 (link) | |
| B200 GPUs | nvfp4 | nvfp4 | 🔵 | 🟡 (link) | ⚪ | 🟡 (link) |
| B200, MI350x GPUs | mxfp4 | mxfp4 | ⚪ | 🔵 | 🔵 | 🟡 (link) |
| H100 | float8 128x128 (blockwise) | float8 1x128 | 🔵 | ⚪ | ⚪ | 🟡 |
## Quantization-Aware Training (QAT)
See the QAT documentation for details on how to use quantization-aware training to improve model accuracy after quantization.
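For orientation, here is a minimal QAT sketch. It assumes the `Int8DynActInt4WeightQATQuantizer` exposed by `torchao.quantization.qat` in recent releases; the model and training loop are placeholders, and the QAT documentation remains the source of truth for the currently recommended flow.

```python
import torch
import torch.nn as nn
from torchao.quantization.qat import Int8DynActInt4WeightQATQuantizer

model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 512))

# Insert fake-quantization ops (int8 dynamic activations, int4 weights)
# so training "sees" quantization error.
qat_quantizer = Int8DynActInt4WeightQATQuantizer()
model = qat_quantizer.prepare(model)

# Placeholder for the usual fine-tuning loop on `model`.
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
for _ in range(10):
    x = torch.randn(8, 512)
    loss = model(x).pow(2).mean()  # placeholder loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()

# Replace the fake-quantized modules with actually quantized ones for inference.
model = qat_quantizer.convert(model)
```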
## Other

- Sparsity README.md, which covers techniques such as 2:4 (semi-structured) sparsity and block sparsity; a minimal 2:4 sketch follows after this list
- the prototype folder for other prototype features
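A possible starting point for the 2:4 sparsity flow mentioned above is sketched here. It assumes the `sparsify_` and `semi_sparse_weight` entry points from `torchao.sparsity`; the magnitude-based pruning loop is a stand-in for a proper pruning recipe, and the sparsity README should be consulted for the supported workflow.

```python
import torch
import torch.nn as nn
from torchao.sparsity import sparsify_, semi_sparse_weight

# Toy model standing in for a real, already-trained network.
model = nn.Sequential(nn.Linear(2048, 2048), nn.ReLU(), nn.Linear(2048, 2048))
model = model.half().cuda()

# 2:4 kernels require weights that already follow the 2-out-of-4 zero pattern.
# Here we keep the 2 largest magnitudes in every group of 4 as a crude stand-in
# for a real pruning step.
for m in model.modules():
    if isinstance(m, nn.Linear):
        w = m.weight.detach()
        groups = w.reshape(-1, 4)
        idx = groups.abs().topk(2, dim=-1).indices
        mask = torch.zeros_like(groups).scatter_(1, idx, 1.0)
        m.weight.data = (groups * mask).reshape_as(w)

# Swap nn.Linear weights for semi-structured (2:4) sparse tensors so inference
# can use sparse tensor-core kernels.
sparsify_(model, semi_sparse_weight())

x = torch.randn(32, 2048, dtype=torch.float16, device="cuda")
with torch.no_grad():
    out = model(x)
```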