
Workflows#

Created On: Jan 29, 2026 | Last Updated On: Jan 29, 2026

This page provides an overview of the quantization and training workflows available in torchao.

Stable Workflows#

🟢 = stable, 🟡 = prototype, 🟠 = planned, ⚪ = not supported

| recommended hardware | weight | activation | quantized training | QAT | PTQ data algorithms | quantized inference |
| --- | --- | --- | --- | --- | --- | --- |
| H100, B200 GPUs | float8 rowwise | float8 rowwise | 🟢 (link) | 🟢 (link) | ⚪ | 🟢 (link) |
| Intel® BMG GPUs | float8 tensor/rowwise | float8 tensor/rowwise | 🟠 | 🟢 (link) | ⚪ | 🟢 (link) |
| H100 GPUs | int4 | float8 rowwise | ⚪ | 🟢 (link) | 🟠 | 🟢 (link) |
| A100 GPUs | int4 | bfloat16 | ⚪ | 🟢 (link) | 🟡: HQQ, AWQ | 🟢 (link) |
| Intel® BMG GPUs | int4 | float16/bfloat16 | ⚪ | 🟢 (link) | 🟡: AWQ | 🟢 (link) |
| A100 GPUs | int8 | bfloat16 | ⚪ | 🟢 (link) | ⚪ | 🟢 (link) |
| A100 GPUs | int8 | int8 | 🟡 (link) | 🟢 (link) | ⚪ | 🟢 (link) |
| Intel® BMG GPUs | int8 | int8 | 🟠 | 🟢 (link) | ⚪ | 🟢 (link) |
| edge | intx (1..7) | bfloat16 | ⚪ | 🟢 (link) | ⚪ | 🟢 (link) |

Prototype Workflows#

🟢 = stable, 🟡 = prototype, 🟠 = planned, ⚪ = not supported

| recommended hardware | weight | activation | quantized training | QAT | PTQ data algorithms | quantized inference |
| --- | --- | --- | --- | --- | --- | --- |
| B200, MI350x GPUs | mxfp8 | mxfp8 | 🟡 (dense), (moe) | ⚪ | ⚪ | 🟡 (link) |
| B200 GPUs | nvfp4 | nvfp4 | 🟠 | 🟡 (link) | ⚪ | 🟡 (link) |
| B200, MI350x GPUs | mxfp4 | mxfp4 | ⚪ | 🟠 | 🟠 | 🟡 (link) |
| H100 | float8 128x128 (blockwise) | float8 1x128 | 🟠 | ⚪ | ⚪ | 🟡 |

Quantization-Aware Training (QAT)#

See the QAT documentation for details on using quantization-aware training to recover accuracy that is otherwise lost when a model is quantized.

Other#