Float8DynamicActivationInt4WeightConfig#
- class torchao.quantization.Float8DynamicActivationInt4WeightConfig(int4_packing_format: Int4PackingFormat = 'preshuffled')[source]#
Configuration for applying float8 dynamic per-row activation quantization and int4 per-group weight quantization to linear layers. Only group_size 128 is supported at the moment: the underlying kernel supports group sizes of 128 and above, and larger group sizes offer no benefit.
- Parameters
int4_packing_format – how the weight is packed; only preshuffled is supported
Example:
import torch.nn as nn
from torchao.quantization import Float8DynamicActivationInt4WeightConfig, quantize_

model = nn.Sequential(nn.Linear(2048, 2048, device="cuda"))
quantize_(model, Float8DynamicActivationInt4WeightConfig())