Int8DynamicActivationInt8WeightConfig#

class torchao.quantization.Int8DynamicActivationInt8WeightConfig(act_mapping_type: MappingType | None = MappingType.SYMMETRIC, weight_only_decode: bool = False, granularity: Granularity | Tuple[Granularity, Granularity] | list[Granularity] | None = PerRow(dim=-1), set_inductor_config: bool = True, version: int = 2)[source]#

Configuration for applying int8 dynamic per-token activation and int8 per-channel weight quantization to linear layers.

Parameters:
  • granularity – Optional[Union[Granularity, Tuple[Granularity, Granularity], List[Granularity]]] = PerRow() - The granularity for quantization. Can be either a single granularity (applied to both activations and weights) or a tuple / list of two granularities (first for activations, second for weights). If None, defaults to PerRow for both. Only PerTensor and PerRow are supported.

  • act_mapping_type – Optional[MappingType] = MappingType.SYMMETRIC - Mapping type for activation quantization. SYMMETRIC and ASYMMETRIC are supported.

  • set_inductor_config – bool = True - If True, adjusts torchinductor settings to recommended values for better performance with this quantization scheme.

Example:

import torch.nn as nn

from torchao.quantization import Int8DynamicActivationInt8WeightConfig, quantize_

model = nn.Sequential(nn.Linear(2048, 2048, device="cuda"))
quantize_(model, Int8DynamicActivationInt8WeightConfig())
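
The granularity parameter can also be passed explicitly as a two-element tuple (activations first, weights second). A minimal sketch, assuming PerRow and PerTensor are importable from torchao.quantization as documented above; the tuple shown is equivalent to the PerRow default:

import torch
import torch.nn as nn

from torchao.quantization import (
    Int8DynamicActivationInt8WeightConfig,
    PerRow,
    quantize_,
)

model = nn.Sequential(nn.Linear(2048, 2048))
# First granularity applies to activations, second to weights.
config = Int8DynamicActivationInt8WeightConfig(granularity=(PerRow(), PerRow()))
quantize_(model, config)

# The quantized model is used like any other nn.Module.
out = model(torch.randn(4, 2048))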