int8_dynamic_activation_int4_weight
- torchao.quantization.int8_dynamic_activation_int4_weight(group_size=32, layout=PlainLayout(), mapping_type=MappingType.SYMMETRIC, act_mapping_type=MappingType.ASYMMETRIC)
Applies int8 dynamic per-token asymmetric activation quantization and int4 per-group symmetric weight quantization to linear layers. This flow is intended to produce a model for the ExecuTorch backend, but ExecuTorch does not yet support lowering the quantized model it produces.
- Parameters:
group_size – number of weight elements that share one set of quantization parameters; controls the granularity of quantization, and a smaller size is more fine-grained
layout – layout type for the quantized weight tensor; besides the default PlainLayout(), only MarlinQQQLayout() and CutlassInt4PackedLayout() are supported for now
mapping_type – quantization type for the weight; controls whether weight quantization is symmetric or asymmetric
act_mapping_type – quantization type for the activation; controls whether activation quantization is symmetric or asymmetric
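To make the default parameters concrete, here is a minimal plain-Python sketch of the arithmetic that "int4 per-group symmetric" and "int8 per-token asymmetric" describe. It is an illustration only, not torchao's implementation: the helper names are hypothetical, and the scale formulas (max-abs divided by 7 for the weights, min/max range divided by 255 for the activations) are assumptions chosen for simplicity that may differ from the library's exact rounding and range conventions.

```python
# Illustrative sketch only: plain-Python quantization math, not torchao's kernels.
# All function names here are hypothetical helpers, not torchao APIs.

def quantize_weight_int4_symmetric(row, group_size=32):
    """Quantize one weight row to int4, with one scale per group of elements."""
    qmin, qmax = -8, 7
    q, scales = [], []
    for start in range(0, len(row), group_size):
        group = row[start:start + group_size]
        max_abs = max(abs(w) for w in group)
        # Symmetric mapping: the zero point is fixed at 0, so only a scale is stored.
        # Assumption: scale = max_abs / qmax (the library may use another convention).
        scale = max_abs / qmax if max_abs > 0 else 1.0
        scales.append(scale)
        q.extend(max(qmin, min(qmax, round(w / scale))) for w in group)
    return q, scales


def quantize_activation_int8_asymmetric(token):
    """Quantize one token (one activation row) to int8 with a scale and zero point."""
    qmin, qmax = -128, 127
    # Extend the range to include 0.0 so that zero is exactly representable.
    lo, hi = min(min(token), 0.0), max(max(token), 0.0)
    scale = (hi - lo) / (qmax - qmin) if hi > lo else 1.0
    # Asymmetric mapping: the zero point shifts the range so that lo maps near qmin.
    zero_point = max(qmin, min(qmax, round(qmin - lo / scale)))
    q = [max(qmin, min(qmax, round(x / scale) + zero_point)) for x in token]
    return q, scale, zero_point


def dequantize(q, scale, zero_point=0):
    """Map quantized integers back to approximate floating-point values."""
    return [(v - zero_point) * scale for v in q]


if __name__ == "__main__":
    weights = [0.5, -1.0, 0.25, 0.0, 2.0, -0.5, 1.0, 4.0]
    # group_size=4 gives two groups, each with its own scale.
    q_w, w_scales = quantize_weight_int4_symmetric(weights, group_size=4)
    print("int4 weights:", q_w, "scales:", w_scales)

    token = [-1.0, 0.0, 0.5, 3.0]
    q_a, a_scale, a_zp = quantize_activation_int8_asymmetric(token)
    print("int8 activations:", q_a, "scale:", a_scale, "zero point:", a_zp)
```

The sketch shows why a smaller group_size is more fine-grained: each group gets its own scale, so a large outlier only degrades the precision of its own group. In actual use, the object returned by int8_dynamic_activation_int4_weight(...) is normally passed to torchao.quantization.quantize_ together with the model, rather than computed by hand as above.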