torchao.dtypes

Layouts and Tensor Subclasses

NF4Tensor

NF4Tensor class for converting a weight to the QLoRA NF4 format.

AffineQuantizedTensor

Affine quantized tensor subclass.

Layout

The Layout class serves as a base class for defining different data layouts for tensors.

PlainLayout

PlainLayout is the most basic layout class, inheriting from the Layout base class.

SemiSparseLayout

SemiSparseLayout is a layout class for handling semi-structured sparse matrices in affine quantized tensors.
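
To make "semi-structured" concrete, the following is a minimal pure-Python sketch of 2:4 sparsity: in every group of four consecutive values, only the two with the largest magnitude are kept, along with their positions inside the group. The function names `compress_2_4` and `decompress_2_4` are hypothetical and are not part of torchao; the metadata format used by real sparse kernels is different and hardware specific.

```python
def compress_2_4(row):
    """Keep the 2 largest-magnitude values in each group of 4 (illustrative only)."""
    if len(row) % 4 != 0:
        raise ValueError("row length must be a multiple of 4")
    values, positions = [], []
    for start in range(0, len(row), 4):
        group = row[start:start + 4]
        # Indices of the two largest magnitudes within this group.
        top2 = sorted(range(4), key=lambda i: abs(group[i]), reverse=True)[:2]
        for i in sorted(top2):
            values.append(group[i])
            positions.append(i)  # each position fits in 2 bits
    return values, positions


def decompress_2_4(values, positions):
    """Rebuild the dense (pruned) row from kept values and in-group positions."""
    row = [0.0] * (len(values) * 2)
    for k, (v, p) in enumerate(zip(values, positions)):
        row[(k // 2) * 4 + p] = v
    return row
```

The compressed form stores half as many values as the dense row, plus two bits of position metadata per kept value.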

TensorCoreTiledLayout

TensorCoreTiledLayout is a layout class for handling tensor core tiled layouts in affine quantized tensors.

Float8Layout

Represents the layout configuration for Float8 affine quantized tensors.

MarlinSparseLayout

MarlinSparseLayout is a layout class for handling sparse tensor formats specifically designed for the Marlin sparse kernel.

Int4CPULayout

Layout class for the int4 CPU layout of affine quantized tensors, used by the tinygemm kernel _weight_int4pack_mm_for_cpu.

CutlassSemiSparseLayout

Layout class for the float8 2:4 sparsity layout of affine quantized tensors, for use with the cutlass kernel.

Quantization techniques

to_affine_quantized_intx

Convert a high precision tensor to an integer affine quantized tensor.
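
As a rough illustration of what "integer affine quantization" means, the sketch below maps floats to int8 with a scale and a zero point, then maps them back. It is plain Python over a flat list and does not call torchao; the helper names `choose_qparams`, `quantize`, and `dequantize` are hypothetical, and the per-tensor asymmetric scheme shown here is only one of several possible choices.

```python
def choose_qparams(values, quant_min=-128, quant_max=127):
    """Pick an asymmetric scale and zero point that cover the value range."""
    lo = min(min(values), 0.0)  # include 0.0 so zero is exactly representable
    hi = max(max(values), 0.0)
    scale = (hi - lo) / (quant_max - quant_min)
    if scale == 0.0:
        scale = 1.0  # all-zero input: any positive scale works
    zero_point = quant_min - round(lo / scale)
    zero_point = max(quant_min, min(quant_max, zero_point))
    return scale, zero_point


def quantize(values, scale, zero_point, quant_min=-128, quant_max=127):
    """q = clamp(round(x / scale) + zero_point, quant_min, quant_max)"""
    return [
        max(quant_min, min(quant_max, round(v / scale) + zero_point))
        for v in values
    ]


def dequantize(q, scale, zero_point):
    """x_hat = (q - zero_point) * scale"""
    return [(x - zero_point) * scale for x in q]
```

For values inside the covered range, the round trip error is bounded by about half of `scale`. In a "static" variant, the scale and zero point would be supplied by the caller instead of being computed from the input.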

to_affine_quantized_intx_static

Create an integer AffineQuantizedTensor from a high precision tensor using static parameters.

to_affine_quantized_fpx

Create a floatx AffineQuantizedTensor from a high precision tensor.

to_affine_quantized_floatx

Convert a high precision tensor to a float8 quantized tensor.

to_affine_quantized_floatx_static

Create a float8 AffineQuantizedTensor from a high precision tensor using static parameters.

to_marlinqqq_quantized_intx

Converts a floating point tensor to a Marlin QQQ quantized tensor.

to_nf4

Convert a given tensor to a normalized float 4-bit (NF4) tensor.
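
The idea behind NF4 can be sketched in a few lines: each block of weights is divided by its absolute maximum, and every normalized value is replaced by the index of the nearest entry in a fixed 16-value codebook. The code below is a simplified pure-Python illustration, not the torchao implementation. The codebook holds the NF4 levels described for QLoRA, rounded here to four decimals, and the sketch leaves out QLoRA's second quantization step for the per-block scales. The function names are hypothetical.

```python
# NF4 levels as described for QLoRA, rounded to 4 decimals for readability.
NF4_CODEBOOK = [
    -1.0, -0.6962, -0.5251, -0.3949, -0.2844, -0.1848, -0.0911, 0.0,
    0.0796, 0.1609, 0.2461, 0.3379, 0.4407, 0.5626, 0.7230, 1.0,
]


def nf4_quantize_block(block):
    """Return 4-bit codebook indices and the block's absmax scale."""
    absmax = max(abs(v) for v in block) or 1.0  # avoid dividing by zero
    codes = []
    for v in block:
        x = v / absmax  # now in [-1, 1]
        codes.append(min(range(16), key=lambda i: abs(NF4_CODEBOOK[i] - x)))
    return codes, absmax


def nf4_dequantize_block(codes, absmax):
    """Look each code up in the codebook and rescale by absmax."""
    return [NF4_CODEBOOK[c] * absmax for c in codes]
```

Each code fits in four bits, so two codes can share one byte; only the codes and one scale per block need to be stored.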

Prototype

BlockSparseLayout

BlockSparseLayout is a data class that represents the layout of a block sparse matrix.

CutlassInt4PackedLayout

Layout class for the int4 packed layout of affine quantized tensors, for use with the cutlass kernel.

Int8DynamicActInt4WeightCPULayout

Layout class for the da8w4 (int8 dynamic activation, int4 weight) CPU layout of affine quantized tensors.

MarlinQQQTensor

MarlinQQQ quantized tensor subclass which inherits AffineQuantizedTensor class.

MarlinQQQLayout

MarlinQQQLayout is a layout class for Marlin QQQ quantization.

FloatxTensorCoreLayout

FloatxTensorCoreLayout is a data class that defines the layout for a tensor with a specific number of exponent bits (ebits) and mantissa bits (mbits).
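
To show how ebits and mbits determine the value of a small float, here is a sketch that decodes a bit pattern with one sign bit, `ebits` exponent bits, and `mbits` mantissa bits. It assumes an IEEE-style exponent bias of 2^(ebits-1) - 1, subnormal numbers when the exponent field is zero, and no special encodings for infinity or NaN. These are assumptions made for illustration; `decode_floatx` is a hypothetical helper, not a torchao function.

```python
def decode_floatx(bits, ebits, mbits):
    """Decode a (1 + ebits + mbits)-bit pattern into a Python float.

    Assumes IEEE-style bias and subnormals, with no inf/NaN encodings.
    """
    sign = -1.0 if (bits >> (ebits + mbits)) & 1 else 1.0
    exp = (bits >> mbits) & ((1 << ebits) - 1)
    man = bits & ((1 << mbits) - 1)
    bias = (1 << (ebits - 1)) - 1
    if exp == 0:
        # Subnormal: no implicit leading 1, fixed exponent of (1 - bias).
        return sign * man * 2.0 ** (1 - bias - mbits)
    # Normal: implicit leading 1 in front of the mantissa bits.
    return sign * (1 + man / (1 << mbits)) * 2.0 ** (exp - bias)
```

With ebits=3 and mbits=2 (a 6-bit format), the pattern 0b001100 decodes to 1.0 and the largest pattern 0b011111 decodes to 28.0 under these assumptions.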

UintxLayout

A layout class for Uintx tensors, which are tensors with elements packed into smaller bit-widths than the standard 8-bit byte.
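
Packing elements into fewer than 8 bits each can be sketched as a simple bitstream. The example below writes each value into the next `bits` bits of a little-endian byte stream and reads them back. The bit order is an arbitrary choice for illustration and is not necessarily the scheme UintxLayout uses; `pack_uintx` and `unpack_uintx` are hypothetical names.

```python
def pack_uintx(values, bits):
    """Pack unsigned integers of width `bits` (1 to 7) into bytes."""
    if not 1 <= bits <= 7:
        raise ValueError("bits must be between 1 and 7")
    acc, nbits, out = 0, 0, bytearray()
    for v in values:
        if not 0 <= v < (1 << bits):
            raise ValueError(f"value {v} does not fit in {bits} bits")
        acc |= v << nbits  # append the value above the bits already buffered
        nbits += bits
        while nbits >= 8:  # flush every complete byte
            out.append(acc & 0xFF)
            acc >>= 8
            nbits -= 8
    if nbits:
        out.append(acc & 0xFF)  # final partial byte, zero padded
    return bytes(out)


def unpack_uintx(data, bits, count):
    """Recover `count` values of width `bits` from packed bytes."""
    acc = int.from_bytes(data, "little")
    mask = (1 << bits) - 1
    return [(acc >> (i * bits)) & mask for i in range(count)]
```

Eight 3-bit values occupy three bytes here instead of eight, which is the storage saving that sub-byte layouts aim for.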
