Int8DynamicActivationInt8WeightConfig
- class torchao.quantization.Int8DynamicActivationInt8WeightConfig(act_mapping_type: MappingType | None = MappingType.SYMMETRIC, weight_only_decode: bool = False, granularity: Granularity | Tuple[Granularity, Granularity] | list[Granularity] | None = PerRow(dim=-1), set_inductor_config: bool = True, version: int = 2)[source]
Configuration for applying int8 dynamic per-token activation and int8 per-channel weight quantization to linear layers.
- Parameters:
granularity – Optional[Union[Granularity, Tuple[Granularity, Granularity], List[Granularity]]] = PerRow(). The granularity for quantization: either a single granularity (applied to both activations and weights) or a tuple/list of two granularities (the first for activations, the second for weights). If None, defaults to PerRow for both. Only PerTensor and PerRow are supported.
act_mapping_type – Optional[MappingType] = MappingType.SYMMETRIC. The mapping type for activation quantization; SYMMETRIC and ASYMMETRIC are supported.
set_inductor_config – bool = True. If True, adjusts torchinductor settings to the recommended values for better performance with this quantization scheme.
Example:
```python
import torch.nn as nn

from torchao.quantization import Int8DynamicActivationInt8WeightConfig, quantize_

model = nn.Sequential(nn.Linear(2048, 2048, device="cuda"))
quantize_(model, Int8DynamicActivationInt8WeightConfig())
```