MarlinSparseLayout

class torchao.dtypes.MarlinSparseLayout

MarlinSparseLayout is a layout class for handling sparse tensor formats specifically designed for the Marlin sparse kernel. This layout is used to optimize the storage and computation of affine quantized tensors with 2:4 sparsity patterns.

The layout ensures that the tensor data is pre-processed and stored in a format that is compatible with the Marlin sparse kernel operations. It provides methods for preprocessing input tensors and managing the layout of quantized tensors.

pre_process(input: Tensor) → Tensor

Preprocess the input tensor to be in the correct format for the Marlin sparse kernel.

1. The input tensor is transposed, because the linear layer stores its weights in transposed form.
2. 2:4 sparsity is injected into the transposed tensor.
3. The tensor is transposed back, because the quantization process computes the scales along dim=-1.

Parameters: input (torch.Tensor) – the input tensor to preprocess
Returns: the preprocessed tensor
Return type: torch.Tensor
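
The three steps above can be illustrated with a minimal pure-Python sketch on nested lists. This is not the torchao implementation: the helper names (prune_2_of_4, transpose, pre_process_sketch) are hypothetical, and both the pruning criterion (keep the two largest-magnitude values in each group of four) and the axis along which groups are formed are assumptions made for illustration.

```python
def prune_2_of_4(row):
    """Keep the two largest-magnitude values in each group of four; zero the rest.

    Assumption: magnitude-based selection, chosen here only for illustration.
    """
    assert len(row) % 4 == 0, "row length must be a multiple of 4"
    out = list(row)
    for start in range(0, len(out), 4):
        group = range(start, start + 4)
        # Indices of the two entries with the largest absolute value.
        keep = sorted(group, key=lambda i: abs(out[i]), reverse=True)[:2]
        for i in group:
            if i not in keep:
                out[i] = 0.0
    return out


def transpose(matrix):
    """Transpose a rectangular matrix stored as a list of rows."""
    return [list(col) for col in zip(*matrix)]


def pre_process_sketch(weight):
    """Hypothetical stand-in for the transpose, sparsify, transpose sequence."""
    w_t = transpose(weight)                    # step 1: transpose
    w_24 = [prune_2_of_4(r) for r in w_t]      # step 2: inject 2:4 sparsity
    return transpose(w_24)                     # step 3: transpose back


weight = [
    [1.0, -5.0],
    [3.0, 2.0],
    [-4.0, 0.5],
    [0.2, 6.0],
]
sparse = pre_process_sketch(weight)
```

The output has the same shape as the input, and in this sketch each group of four values along the first dimension of the original tensor retains exactly two non-zero entries.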
