Int8DynamicActivationInt8WeightConfig

class torchao.quantization.Int8DynamicActivationInt8WeightConfig(layout: Optional[Layout] = PlainLayout(), act_mapping_type: Optional[MappingType] = MappingType.SYMMETRIC, weight_only_decode: bool = False, granularity: Granularity = PerRow(dim=-1), set_inductor_config: bool = True, version: int = 1)[source]

Configuration for applying int8 dynamic symmetric per-token activation and int8 per-channel weight quantization to linear layers.

Parameters
  • layout – Optional[Layout] = PlainLayout() - Tensor layout for the quantized weights. Controls how the quantized data is stored and accessed.

  • act_mapping_type – Optional[MappingType] = MappingType.SYMMETRIC - Mapping type for activation quantization. SYMMETRIC uses symmetric quantization around zero.

  • weight_only_decode – bool = False - If True, only quantizes weights during forward pass and keeps activations in original precision during decode operations.

  • granularity – Granularity = PerRow(dim=-1) - Granularity used for weight quantization. The default PerRow(dim=-1) assigns one scale per output channel (row) of the weight, matching the per-channel scheme described above.

  • set_inductor_config – bool = True - If True, adjusts torchinductor settings to recommended values for better performance with this quantization scheme.

  • version – int = 1 - Version of the config. Version 1 uses AffineQuantizedTensor, which is planned for deprecation/splitting; version 2 uses Int8Tensor.
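
To make the scheme concrete, here is an illustrative sketch (not the torchao implementation) of what the config's quantization does numerically: int8 symmetric quantization with one scale per activation row ("per-token", computed dynamically at runtime) and one scale per weight output channel ("per-channel", computed ahead of time).

```python
# Illustrative sketch (not the torchao implementation) of int8 dynamic
# symmetric per-token activation and per-channel weight quantization.

def symmetric_int8_quantize(row):
    """Quantize a list of floats to int8 values with one symmetric scale."""
    amax = max(abs(v) for v in row)
    scale = amax / 127 if amax > 0 else 1.0  # symmetric: zero-point is 0
    q = [max(-128, min(127, round(v / scale))) for v in row]
    return q, scale

# Dynamic per-token activation quantization: scales are derived from the
# activation values at runtime, one scale per token (row).
activations = [[0.5, -1.0, 0.25], [2.0, 0.1, -2.0]]
q_acts = [symmetric_int8_quantize(token) for token in activations]

# Per-channel weight quantization: one scale per output channel (weight row).
weight = [[0.02, -0.04, 0.01], [0.5, 0.25, -0.5]]
q_weight = [symmetric_int8_quantize(channel) for channel in weight]

# Dequantized values stay within half a quantization step of the originals.
q, s = q_acts[1]
assert all(abs(qi * s - vi) <= s / 2 + 1e-9
           for qi, vi in zip(q, activations[1]))
```

In practice the config is applied to a model with torchao's quantize_ API, e.g. quantize_(model, Int8DynamicActivationInt8WeightConfig()).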