# Workflows

This page provides an overview of the various workflows available in torchao.

## Workflow overview by training/QAT/inference

* Training: our main training workflow is [float8 quantized training](training.md). We also have three prototype quantized training workflows: [mxfp8 dense](https://github.com/pytorch/ao/tree/main/torchao/prototype/mx_formats#mx-training), [mxfp8 MoE](https://github.com/pytorch/ao/tree/main/torchao/prototype/moe_training#mxfp8-moe-training), and [int8 dense](https://github.com/pytorch/ao/tree/main/torchao/prototype/quantized_training).
* QAT: see the [QAT documentation](qat.md) for details on how to use quantization-aware training to improve model accuracy after quantization.
* Inference: see the [inference quantization documentation](inference.md) for an overview of quantization for inference workflows.

## Workflow status by dtype + hardware

🟢 = stable, 🟡 = prototype, 🟠 = planned, ⚪ = not supported

### NVIDIA CUDA

| recommended hardware | weight | activation | quantized training | QAT | PTQ data algorithms | quantized inference |
| -------- | ------ | ---------- | ------------------ | --- | ------------------- | ------------------- |
| H100, B200 | float8 rowwise | float8 rowwise | 🟢 [(link)](training.md) | 🟢 [(link)](qat.md) | ⚪ | 🟢 [(link)](inference.md) |
| H100 | int4 | float8 rowwise | ⚪ | 🟢 [(link)](qat.md) | 🟠 | 🟢 [(link)](https://github.com/pytorch/ao/blob/257d18ae1b41e8bd8d85849dd2bd43ad3885678e/torchao/quantization/quant_api.py#L1296) |
| A100 | int4 | bfloat16 | ⚪ | 🟢 [(link)](qat.md) | 🟡: [HQQ](https://github.com/pytorch/ao/tree/main/torchao/prototype/hqq/README.md), [AWQ](https://github.com/pytorch/ao/tree/main/torchao/prototype/awq) | 🟢 [(link)](inference.md) |
| A100 | int8 | bfloat16 | ⚪ | 🟢 [(link)](qat.md) | ⚪ | 🟢 [(link)](inference.md) |
| A100 | int8 | int8 | 🟡 [(link)](https://github.com/pytorch/ao/tree/main/torchao/prototype/quantized_training) | 🟢 [(link)](qat.md) | ⚪ | 🟢 [(link)](inference.md) |
| B200 | nvfp4 | nvfp4 | 🟠 | 🟡 [(link)](https://github.com/pytorch/ao/blob/main/torchao/prototype/qat/nvfp4.py) | ⚪ | 🟡 [(link)](https://github.com/pytorch/ao/tree/main/torchao/prototype/mx_formats#mx-inference) |
| B200 | mxfp8 | mxfp8 | 🟡 [(dense)](https://github.com/pytorch/ao/tree/main/torchao/prototype/mx_formats#mx-training), [(moe)](https://github.com/pytorch/ao/tree/main/torchao/prototype/moe_training) | ⚪ | ⚪ | 🟡 [(link)](https://github.com/pytorch/ao/tree/main/torchao/prototype/mx_formats#mx-inference) |
| B200 | mxfp4 | mxfp4 | ⚪ | 🟠 | 🟠 | 🟡 [(link)](https://github.com/pytorch/ao/tree/main/torchao/prototype/mx_formats#mx-inference) |
| H100 | float8 128x128 (blockwise) | float8 1x128 | 🟠 | ⚪ | ⚪ | 🟡 |

### Edge

| recommended hardware | weight | activation | quantized training | QAT | PTQ data algorithms | quantized inference |
| -------- | ------ | ---------- | ------------------ | --- | ------------------- | ------------------- |
| edge | intx (1..7) | bfloat16 | ⚪ | 🟢 [(link)](qat.md) | ⚪ | 🟢 [(link)](https://github.com/pytorch/ao/blob/257d18ae1b41e8bd8d85849dd2bd43ad3885678e/torchao/quantization/quant_api.py#L2267) |

### ROCm

| recommended hardware | weight | activation | quantized training | QAT | PTQ data algorithms | quantized inference |
| -------- | ------ | ---------- | ------------------ | --- | ------------------- | ------------------- |
| MI350x | mxfp8 | mxfp8 | 🟡 [(dense)](https://github.com/pytorch/ao/tree/main/torchao/prototype/mx_formats#mx-training), [(moe)](https://github.com/pytorch/ao/tree/main/torchao/prototype/moe_training) | ⚪ | ⚪ | 🟡 [(link)](https://github.com/pytorch/ao/tree/main/torchao/prototype/mx_formats#mx-inference) |
| MI350x | mxfp4 | mxfp4 | ⚪ | 🟠 | 🟠 | 🟡 [(link)](https://github.com/pytorch/ao/tree/main/torchao/prototype/mx_formats#mx-inference) |

### Intel

| recommended hardware | weight | activation | quantized training | QAT | PTQ data algorithms | quantized inference |
| -------- | ------ | ---------- | ------------------ | --- | ------------------- | ------------------- |
| Intel® BMG | float8 tensor/rowwise | float8 tensor/rowwise | 🟠 | 🟢 [(link)](qat.md) | ⚪ | 🟢 [(link)](inference.md) |
| Intel® BMG | int4 | float16/bfloat16 | ⚪ | 🟢 [(link)](qat.md) | 🟡: [AWQ](https://github.com/pytorch/ao/tree/main/torchao/prototype/awq) | 🟢 [(link)](inference.md) |
| Intel® BMG | int8 | int8 | 🟠 | 🟢 [(link)](qat.md) | ⚪ | 🟢 [(link)](inference.md) |

### Other

* See the [sparsity README](https://github.com/pytorch/ao/tree/main/torchao/sparsity/README.md), which covers different techniques such as 2:4 sparsity and block sparsity.
* See [the prototype folder](https://github.com/pytorch/ao/tree/main/torchao/prototype) for other prototype features.

```{toctree}
:hidden:
:maxdepth: 1

training
qat
inference
```
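To give a feel for what the int8 weight-quantization entries above mean numerically, here is a minimal, torchao-independent sketch of symmetric per-tensor int8 quantization (scale the largest-magnitude weight to 127, round, and clamp). The function names are illustrative only, not torchao APIs; the real workflows linked above handle layouts, kernels, and per-channel/per-group scales.

```python
def quantize_int8_symmetric(weights):
    """Symmetric per-tensor int8 quantization.

    The scale maps the largest-magnitude weight to 127; each weight is
    divided by the scale, rounded, and clamped to [-127, 127].
    """
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from int8 values and the scale."""
    return [x * scale for x in q]

weights = [0.12, -0.5, 0.33, 1.27]
q, scale = quantize_int8_symmetric(weights)
deq = dequantize(q, scale)

# Round-trip error is bounded by half a quantization step (scale / 2).
assert all(abs(w - d) <= scale / 2 + 1e-9 for w, d in zip(weights, deq))
```

Weight-only schemes (e.g. the A100 "int8 weight / bfloat16 activation" row) apply this kind of mapping to weights and dequantize on the fly, while "int8 / int8" rows also quantize activations so that matmuls run in integer arithmetic.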