Float8DynamicActivationInt4WeightConfig#
- class torchao.quantization.Float8DynamicActivationInt4WeightConfig(int4_packing_format: Int4PackingFormat = 'preshuffled')[source]#
Configuration for applying float8 dynamic per-row activation quantization and int4 per-group weight quantization to linear layers. Only group_size 128 is supported at the moment: the underlying kernel supports group sizes of 128 and above, and larger group sizes offer no benefit.
- Parameters
int4_packing_format – how the weight is packed; only preshuffled is supported
Example:
import torch.nn as nn
from torchao.quantization import Float8DynamicActivationInt4WeightConfig, quantize_

model = nn.Sequential(nn.Linear(2048, 2048, device="cuda"))
quantize_(model, Float8DynamicActivationInt4WeightConfig())