
Tutorials

Created On: Apr 03, 2026 | Last Updated On: Apr 03, 2026

Tutorials for quantization using eager mode execution.

  • First Quantization Example
  • (Part 1) Pre-training with float8
  • (Part 2) Fine-tuning with QAT, QLoRA, and float8
  • (Part 3) Serving on vLLM, SGLang, ExecuTorch
  • Integration with VLLM: Architecture and Usage Guide
  • Hugging Face Integration
  • Serialization
  • Static Quantization
  • Writing Your Own Quantized Tensor
  • Writing Your Own Quantized Tensor (advanced)
  • MXFP8 Expert Parallel Training
  • Debugging Weights and Activations with quant_logger
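To give a flavor of what these eager-mode tutorials cover, here is a minimal, hand-rolled sketch of weight-only int8 quantization in plain PyTorch. This is illustrative only and is not the torchao API: the tutorials above use `torchao.quantization.quantize_` with a quantization config rather than the hypothetical helper functions shown here.

```python
import torch

torch.manual_seed(0)

def quantize_weight_int8(w: torch.Tensor):
    """Per-output-channel symmetric int8 quantization (one scale per row).

    Illustrative helper, not part of torchao.
    """
    scale = w.abs().amax(dim=1, keepdim=True) / 127.0
    q = torch.clamp(torch.round(w / scale), -128, 127).to(torch.int8)
    return q, scale

def int8_linear(x: torch.Tensor, q: torch.Tensor, scale: torch.Tensor):
    # Dequantize on the fly, then run an ordinary eager-mode float matmul.
    return x @ (q.to(x.dtype) * scale).t()

w = torch.randn(4, 8)   # a toy weight matrix
x = torch.randn(2, 8)   # a toy activation batch
q, scale = quantize_weight_int8(w)

y_quant = int8_linear(x, q, scale)
y_float = x @ w.t()
# With 8-bit weights the quantized output closely tracks the float output.
print((y_quant - y_float).abs().max())
```

Weight-only quantization like this shrinks model storage by ~4x (int8 vs. float32 weights) while keeping activations in floating point, which is why it is a common starting point before moving to the full torchao workflows linked above.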





© Copyright 2024-present, torchao Contributors.

Created using Sphinx 7.2.6.

Built with the PyData Sphinx Theme 0.15.4.