fake_quantize_affine_cachemask
- torchao.quantization.fake_quantize_affine_cachemask(input: Tensor, block_size: Tuple[int, ...], scale: Tensor, zero_point: Optional[Tensor], quant_dtype: dtype, quant_min: Optional[Union[int, float]] = None, quant_max: Optional[Union[int, float]] = None, zero_point_domain: ZeroPointDomain = ZeroPointDomain.INT) → Tuple[Tensor, Tensor]
General fake quantize op for quantization-aware training (QAT). This is equivalent to calling quantize_affine + dequantize_affine but without the dtype casts.
Note: Compared to fake_quantize_affine(), this consumes more memory and returns an additional outlier mask for intermediate quantized values.
- Parameters: Same as fake_quantize_affine().
- Returns: A 2-tuple of (final fake quantized values, outlier mask for intermediate quantized values).
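To illustrate the semantics, here is a minimal per-tensor sketch in plain PyTorch of what "quantize + dequantize without the dtype casts, plus a mask over intermediate quantized values" means. The helper name `fake_quantize_with_mask` and the mask polarity (True where the intermediate quantized value lies inside [quant_min, quant_max]) are illustrative assumptions, not the library's implementation; the real op also handles block-wise scales, zero-point domains, and QAT integration.

```python
import torch

def fake_quantize_with_mask(x, scale, zero_point, quant_min, quant_max):
    # Quantize without casting to an integer dtype: values stay floating point,
    # which keeps the op differentiable-friendly for QAT.
    q = torch.round(x / scale) + zero_point
    # Illustrative mask over the intermediate quantized values; the polarity
    # (True = inside the representable range) is an assumption for this sketch.
    mask = (q >= quant_min) & (q <= quant_max)
    q = torch.clamp(q, quant_min, quant_max)
    # Dequantize back to the original floating-point domain.
    dq = (q - zero_point) * scale
    return dq, mask

x = torch.tensor([0.1, 2.0, -3.0])
dq, mask = fake_quantize_with_mask(x, scale=0.1, zero_point=0,
                                   quant_min=-128, quant_max=127)
```

The extra mask is what distinguishes this op from fake_quantize_affine(): a QAT training loop can use it to zero gradients for values that were clamped, at the cost of materializing one more tensor.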