
Workflows#

Created On: Jan 29, 2026 | Last Updated On: Jan 29, 2026

This page provides an overview of the quantization and training workflows available in torchao.

Stable Workflows#

🟢 = stable, 🟡 = prototype, 🟠 = planned, ⚪ = not supported

| recommended hardware | weight | activation | quantized training | QAT | PTQ data algorithms | quantized inference |
| --- | --- | --- | --- | --- | --- | --- |
| H100, B200 GPUs | float8 rowwise | float8 rowwise | 🟢 (link) | 🟢 (link) | ⚪ | 🟢 (link) |
| Intel® BMG GPUs | float8 tensor/rowwise | float8 tensor/rowwise | 🟠 | 🟢 (link) | ⚪ | 🟢 (link) |
| H100 GPUs | int4 | float8 rowwise | ⚪ | 🟢 (link) | 🟠 | 🟢 (link) |
| A100 GPUs | int4 | bfloat16 | ⚪ | 🟢 (link) | 🟡: HQQ, AWQ | 🟢 (link) |
| Intel® BMG GPUs | int4 | float16/bfloat16 | ⚪ | 🟢 (link) | 🟡: AWQ | 🟢 (link) |
| A100 GPUs | int8 | bfloat16 | ⚪ | 🟢 (link) | ⚪ | 🟢 (link) |
| A100 GPUs | int8 | int8 | 🟡 (link) | 🟢 (link) | ⚪ | 🟢 (link) |
| Intel® BMG GPUs | int8 | int8 | 🟠 | 🟢 (link) | ⚪ | 🟢 (link) |
| edge | intx (1..7) | bfloat16 | ⚪ | 🟢 (link) | ⚪ | 🟢 (link) |

Prototype Workflows#

🟢 = stable, 🟡 = prototype, 🟠 = planned, ⚪ = not supported

| recommended hardware | weight | activation | quantized training | QAT | PTQ data algorithms | quantized inference |
| --- | --- | --- | --- | --- | --- | --- |
| B200, MI350x GPUs | mxfp8 | mxfp8 | 🟡 (dense), (moe) | ⚪ | ⚪ | 🟡 (link) |
| B200 GPUs | nvfp4 | nvfp4 | 🟠 | 🟡 (link) | ⚪ | 🟡 (link) |
| B200, MI350x GPUs | mxfp4 | mxfp4 | ⚪ | 🟠 | 🟠 | 🟡 (link) |
| H100 | float8 128x128 (blockwise) | float8 1x128 | 🟠 | ⚪ | ⚪ | 🟡 |

Quantization-Aware Training (QAT)#

See the QAT documentation for details on using quantization-aware training to recover accuracy that is otherwise lost when a model is quantized.

Other#