# Workflows

Created On: Feb 06, 2026 | Last Updated On: Feb 06, 2026

This page provides an overview of the various workflows available in torchao.

## Workflow overview by training/QAT/inference

## Workflow status by dtype + hardware

🟢 = stable, 🟡 = prototype, 🟠 = planned, ⚪ = not supported

### NVIDIA CUDA

| recommended hardware | weight | activation | quantized training | QAT | PTQ data algorithms | quantized inference |
|---|---|---|---|---|---|---|
| H100, B200 | float8 rowwise | float8 rowwise | 🟢 (link) | 🟢 (link) | | 🟢 (link) |
| H100 | int4 | float8 rowwise | | 🟢 (link) | 🟠 | 🟢 (link) |
| A100 | int4 | bfloat16 | | 🟢 (link) | 🟡: HQQ, AWQ | 🟢 (link) |
| A100 | int8 | bfloat16 | | 🟢 (link) | | 🟢 (link) |
| A100 | int8 | int8 | 🟡 (link) | 🟢 (link) | | 🟢 (link) |
| B200 | nvfp4 | nvfp4 | 🟠 | 🟡 (link) | | 🟡 (link) |
| B200 | mxfp8 | mxfp8 | 🟡 (dense), (moe) | | | 🟡 (link) |
| B200 | mxfp4 | mxfp4 | ⚪ | 🟠 | 🟠 | 🟡 (link) |
| H100 | float8 128x128 (blockwise) | float8 1x128 | 🟠 | | | 🟡 |

### Edge

| recommended hardware | weight | activation | quantized training | QAT | PTQ data algorithms | quantized inference |
|---|---|---|---|---|---|---|
| edge | intx (1..7) | bfloat16 | | 🟢 (link) | | 🟢 (link) |

### ROCm

| recommended hardware | weight | activation | quantized training | QAT | PTQ data algorithms | quantized inference |
|---|---|---|---|---|---|---|
| MI350x | mxfp8 | mxfp8 | 🟡 (dense), (moe) | | | 🟡 (link) |
| MI350x | mxfp4 | mxfp4 | ⚪ | 🟠 | 🟠 | 🟡 (link) |

### Intel

| recommended hardware | weight | activation | quantized training | QAT | PTQ data algorithms | quantized inference |
|---|---|---|---|---|---|---|
| Intel® BMG | float8 tensor/rowwise | float8 tensor/rowwise | 🟠 | 🟢 (link) | | 🟢 (link) |
| Intel® BMG | int4 | float16/bfloat16 | | 🟢 (link) | 🟡: AWQ | 🟢 (link) |
| Intel® BMG | int8 | int8 | 🟠 | 🟢 (link) | | 🟢 (link) |

### Other