Workflows#
This page provides an overview of the various workflows available in torchao.
Workflow overview by training/QAT/inference#
- Training: our main training workflow is float8 quantized training. We also have three prototype quantized training workflows: mxfp8 dense, mxfp8 MoE, and int8 dense.
- QAT: see the QAT documentation for details on how to use quantization-aware training to improve model accuracy after quantization.
- Inference: see the inference quantization documentation for an overview of post-training quantization for inference (entry points for all three workflows are sketched below).
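The sketch below illustrates what these three entry points look like in code. It is a minimal sketch assuming a recent torchao release; the exact config and API names (for example `QATConfig` and `Int8DynamicActivationInt4WeightConfig`) may differ between versions, so treat the linked documentation as authoritative. Float8 training additionally requires recent NVIDIA GPUs (H100 or newer) at runtime.

```python
import copy

import torch.nn as nn

from torchao.float8 import convert_to_float8_training
from torchao.quantization import (
    Int8DynamicActivationInt4WeightConfig,
    Int8WeightOnlyConfig,
    quantize_,
)
from torchao.quantization.qat import QATConfig

model = nn.Sequential(nn.Linear(1024, 1024), nn.ReLU(), nn.Linear(1024, 1024))

# 1. Quantized training: swap nn.Linear modules for float8 training variants.
float8_model = convert_to_float8_training(copy.deepcopy(model))

# 2. QAT: prepare with fake quantization, fine-tune, then convert to the
#    corresponding post-training quantization.
qat_model = copy.deepcopy(model)
base_config = Int8DynamicActivationInt4WeightConfig(group_size=32)
quantize_(qat_model, QATConfig(base_config, step="prepare"))
# ... fine-tune qat_model here ...
quantize_(qat_model, QATConfig(base_config, step="convert"))

# 3. Inference (post-training quantization): quantize weights in place.
ptq_model = copy.deepcopy(model)
quantize_(ptq_model, Int8WeightOnlyConfig())
```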
Workflows status by dtype + hardware#
🟢 = stable, 🟡 = prototype, 🟠 = planned, ⚪ = not supported
NVIDIA CUDA#
| recommended hardware | weight | activation | quantized training | QAT | PTQ data algorithms | quantized inference |
|---|---|---|---|---|---|---|
| H100, B200 | float8 rowwise | float8 rowwise | 🟢 (link) | 🟢 (link) | ⚪ | 🟢 (link) |
| H100 | int4 | float8 rowwise | ⚪ | 🟢 (link) | 🟠 | 🟢 (link) |
| A100 | int4 | bfloat16 | ⚪ | 🟢 (link) | 🟢 (link) | 🟢 |
| A100 | int8 | bfloat16 | ⚪ | 🟢 (link) | ⚪ | 🟢 (link) |
| A100 | int8 | int8 | 🟡 (link) | 🟢 (link) | ⚪ | 🟢 (link) |
| B200 | nvfp4 | nvfp4 | 🟠 | 🟡 (link) | ⚪ | 🟡 (link) |
| B200 | mxfp8 | mxfp8 | 🟡 | ⚪ | ⚪ | 🟡 (link) |
| B200 | mxfp4 | mxfp4 | ⚪ | 🟠 | 🟠 | 🟡 (link) |
| H100 | float8 128x128 (blockwise) | float8 1x128 | 🟠 | ⚪ | ⚪ | 🟡 |
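As a concrete illustration of the quantized inference column, the sketch below applies a few of the configurations above with `quantize_`. This is a minimal sketch assuming a recent torchao release and a CUDA GPU; config names and defaults may vary between versions, and kernel availability depends on the hardware listed in the table.

```python
import torch
import torch.nn as nn

from torchao.quantization import (
    Float8DynamicActivationFloat8WeightConfig,
    Int4WeightOnlyConfig,
    Int8DynamicActivationInt8WeightConfig,
    PerRow,
    quantize_,
)

model = nn.Sequential(nn.Linear(4096, 4096)).to(torch.bfloat16).cuda()

# H100 / B200: float8 rowwise activations + weights (first table row).
quantize_(model, Float8DynamicActivationFloat8WeightConfig(granularity=PerRow()))

# A100: int4 weight-only on top of bfloat16 activations.
# quantize_(model, Int4WeightOnlyConfig(group_size=128))

# A100: int8 dynamic activations + int8 weights.
# quantize_(model, Int8DynamicActivationInt8WeightConfig())
```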
Edge#
ROCm#
Intel#
Other#
- See the sparsity README.md, which covers different techniques such as 2:4 sparsity and block sparsity (a minimal 2:4 example is sketched below).
- See the prototype folder for other prototype features.
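For the 2:4 sparsity case mentioned above, here is a minimal sketch assuming the `sparsify_` / `semi_sparse_weight` APIs referenced in the sparsity README; names may differ between releases, and accelerated 2:4 kernels require fp16/bf16 weights on supported NVIDIA GPUs.

```python
import torch
import torch.nn as nn

from torchao.sparsity import semi_sparse_weight, sparsify_

model = nn.Sequential(nn.Linear(4096, 4096)).half().cuda()

# Swap dense weights for 2:4 (semi-structured) sparse weights. For accuracy,
# the weights should already be pruned to a 2:4 pattern before this step.
sparsify_(model, semi_sparse_weight())
```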