PerChannelMinMaxObserver¶
-
class
torch.quantization.observer.PerChannelMinMaxObserver(ch_axis=0, dtype=torch.quint8, qscheme=torch.per_channel_affine, reduce_range=False, quant_min=None, quant_max=None, factory_kwargs=None, memoryless=False)[source]¶ Observer module for computing the quantization parameters based on the running per channel min and max values.
This observer uses the tensor min/max statistics to compute the per channel quantization parameters. The module records the running minimum and maximum of incoming tensors, and uses this statistic to compute the quantization parameters.
- Parameters
ch_axis – Channel axis
dtype – Quantized data type
qscheme – Quantization scheme to be used
reduce_range – Reduces the range of the quantized data type by 1 bit
quant_min – Minimum quantization value. If unspecified, it will follow the 8-bit setup.
quant_max – Maximum quantization value. If unspecified, it will follow the 8-bit setup.
memoryless – Boolean that controls whether observer removes old data when a new input is seen. This is most useful for simulating dynamic quantization, especially during QAT.
The quantization parameters are computed the same way as in
MinMaxObserver, with the difference that the running min/max values are stored per channel. Scales and zero points are thus computed per channel as well.Note
If the running minimum equals to the running maximum, the scales and zero_points are set to 1.0 and 0.