Welcome to the torchao Documentation
torchao is a library for custom data types and optimizations. Quantize and sparsify weights, gradients, optimizers, and activations for inference and training using native PyTorch. Please check out the torchao README for an overall introduction to the library and recent highlights and updates.
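As a quick taste of the workflow the tutorials below cover in depth, here is a minimal sketch of weight-only quantization with torchao's `quantize_` API. The `int8_weight_only` config is an assumption for illustration; config names vary across torchao versions, so consult the README or API Reference for the current ones.

```python
# Minimal sketch: in-place int8 weight-only quantization of a small model.
# Assumes the int8_weight_only config; exact config names may differ by torchao version.
import torch
from torchao.quantization import quantize_, int8_weight_only

model = torch.nn.Sequential(torch.nn.Linear(64, 64)).eval()

# Replaces the Linear weights with int8 quantized tensors in place.
quantize_(model, int8_weight_only())

with torch.no_grad():
    out = model(torch.randn(1, 64))
```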
Getting Started
Developer Notes
API Reference
Eager Quantization Tutorials
- (Part 1) Pre-training with float8
- (Part 2) Fine-tuning with QAT, QLoRA, and float8
- (Part 3) Serving on vLLM, SGLang, ExecuTorch
- Integration with vLLM: Architecture and Usage Guide
- Hugging Face Integration
- Serialization
- Static Quantization
- Writing Your Own Quantized Tensor
- Writing Your Own Quantized Tensor (advanced)
PT2E Quantization Tutorials
- PyTorch 2 Export Post Training Quantization
- PyTorch 2 Export Quantization-Aware Training (QAT)
- PyTorch 2 Export Quantization with X86 Backend through Inductor
- PyTorch 2 Export Quantization with Intel GPU Backend through Inductor
- PyTorch 2 Export Quantization for OpenVINO torch.compile Backend
- How to Write a Quantizer for PyTorch 2 Export Quantization