MarlinSparseLayout
- class torchao.dtypes.MarlinSparseLayout
MarlinSparseLayout is a layout class for handling sparse tensor formats specifically designed for the Marlin sparse kernel. This layout is used to optimize the storage and computation of affine quantized tensors with 2:4 sparsity patterns.
The layout ensures that the tensor data is pre-processed and stored in a format that is compatible with the Marlin sparse kernel operations. It provides methods for preprocessing input tensors and managing the layout of quantized tensors.
- pre_process(input: Tensor) → Tensor
- Preprocess the input tensor into the format expected by the Marlin sparse kernel:
1. The input tensor is transposed, since the linear layer stores its weights in transposed form.
2. 2:4 sparsity is injected into the transposed tensor.
3. The tensor is transposed back, because the quantization process computes the scales along dim=-1.
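The three steps above can be illustrated with a minimal, CPU-only sketch in plain PyTorch. It is not torchao's implementation: `prune_2_4` and `sketch_pre_process` are hypothetical helpers written for this example, the pruning rule (keep the two largest-magnitude values in each group of four) is an assumed stand-in for whatever the library uses to inject sparsity, and the axis along which the groups of four are formed is likewise an assumption.

```python
import torch


def prune_2_4(t: torch.Tensor) -> torch.Tensor:
    """Hypothetical helper: zero the two smallest-magnitude values in every
    group of four consecutive elements along the last dimension of a 2-D tensor."""
    rows, cols = t.shape
    assert cols % 4 == 0, "last dimension must be a multiple of 4"
    groups = t.reshape(-1, 4)  # one row per group of four
    # Indices of the two smallest |values| in each group.
    drop = groups.abs().topk(2, dim=1, largest=False).indices
    mask = torch.ones_like(groups)
    mask.scatter_(1, drop, 0.0)  # 0 where a value is dropped, 1 elsewhere
    return (groups * mask).reshape(rows, cols)


def sketch_pre_process(weight: torch.Tensor) -> torch.Tensor:
    """Illustrative stand-in for the documented steps, not the library code."""
    w_t = weight.t()       # step 1: transpose the linear-layer weight
    w_24 = prune_2_4(w_t)  # step 2: inject 2:4 sparsity (assumed pruning rule)
    return w_24.t()        # step 3: transpose back so scales use dim=-1


torch.manual_seed(0)
weight = torch.randn(8, 16)  # (out_features, in_features), both multiples of 4
sparse_weight = sketch_pre_process(weight)
print(sparse_weight.shape)
```

The result has the same shape as the input, so a later quantization step that computes scales along `dim=-1` sees the weight in its original orientation, with half of the entries zeroed.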
- Parameters:
input (torch.Tensor) – the input tensor to preprocess
- Returns:
the preprocessed tensor
- Return type: