fake_quantize_affine_cachemask¶
- torchao.quantization.fake_quantize_affine_cachemask(input: Tensor, block_size: Tuple[int, ...], scale: Tensor, zero_point: Optional[Tensor], quant_dtype: dtype, quant_min: Optional[Union[int, float]] = None, quant_max: Optional[Union[int, float]] = None, zero_point_domain: ZeroPointDomain = ZeroPointDomain.INT) Tuple[Tensor, Tensor][source]¶
General fake quantize op for quantization-aware training (QAT). This is equivalent to calling quantize_affine + dequantize_affine but without the dtype casts.
Note: Compared to fake_quantize_affine(), this consumes more memory and returns an additional outlier mask for intermediate quantized values.
- Parameters: Same as fake_quantize_affine().
- Returns: A 2-tuple of (final fake quantized values, outlier mask for intermediate quantized values)
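The semantics can be illustrated with a minimal NumPy sketch (not torchao's implementation): fake quantization is quantize_affine followed by dequantize_affine with intermediate values kept in floating point, and the mask records which intermediate quantized values fell inside [quant_min, quant_max]. The interpretation of the mask as marking un-clamped values, typically used to zero gradients for clamped outliers in QAT backward passes, is an assumption for illustration.

```python
import numpy as np

def fake_quantize_cachemask_sketch(x, scale, zero_point, quant_min, quant_max):
    """Illustrative sketch only; hypothetical helper, not torchao's code.

    Computes quantize + dequantize without dtype casts, plus a mask of
    intermediate quantized values that were within [quant_min, quant_max]
    (assumed: values outside the range count as "outliers").
    """
    q = np.round(x / scale) + zero_point          # intermediate quantized values (float)
    mask = (q >= quant_min) & (q <= quant_max)    # True where no clamping occurred
    q_clamped = np.clip(q, quant_min, quant_max)
    fq = (q_clamped - zero_point) * scale         # final fake-quantized values
    return fq, mask

x = np.array([-1.5, -0.4, 0.0, 0.7, 3.2])
fq, mask = fake_quantize_cachemask_sketch(x, scale=0.1, zero_point=0,
                                          quant_min=-8, quant_max=7)
# fq   -> [-0.8, -0.4, 0.0, 0.7, 0.7]
# mask -> [False, True, True, True, False]
```

The two clamped values (-1.5 and 3.2) are flagged False in the mask, which is what makes the cachemask variant useful for masking gradients during QAT.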