quantize_affine_floatx
- torchao.quantization.quantize_affine_floatx(tensor: Tensor, scale: Tensor, ebits: int, mbits: int) → Tensor
Quantizes a high-precision float32 tensor to a low-precision floating-point format and returns the result in unpacked floating-point format. For fp6_e3m2 the layout is 00SEEEMM, where S is the sign bit, E is an exponent bit, and M is a mantissa bit.
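To make the 00SEEEMM layout concrete, the following is a minimal pure-Python sketch of the bit format, not torchao's implementation. The helpers `decode_floatx` and `encode_floatx` are hypothetical names introduced for illustration. The sketch assumes an exponent bias of 2^(ebits-1) - 1, no inf/NaN encodings, one value per byte, and that the input is divided by `scale` before encoding. It picks the nearest representable value by brute force and does not reproduce the library's rounding of exact ties.

```python
def decode_floatx(code: int, ebits: int = 3, mbits: int = 2) -> float:
    """Decode an unpacked floatx code (e.g. 00SEEEMM for fp6_e3m2) to a float.

    Hypothetical helper; assumes bias = 2**(ebits - 1) - 1 and no inf/NaN codes.
    """
    bias = (1 << (ebits - 1)) - 1
    sign = (code >> (ebits + mbits)) & 1
    exp = (code >> mbits) & ((1 << ebits) - 1)
    man = code & ((1 << mbits) - 1)
    if exp == 0:
        # Subnormal: no implicit leading 1, exponent fixed at 1 - bias.
        mag = (man / (1 << mbits)) * 2.0 ** (1 - bias)
    else:
        # Normal: implicit leading 1.
        mag = (1 + man / (1 << mbits)) * 2.0 ** (exp - bias)
    return -mag if sign else mag


def encode_floatx(x: float, scale: float = 1.0, ebits: int = 3, mbits: int = 2) -> int:
    """Return the code whose decoded value is closest to x / scale.

    Hypothetical helper; brute-force search over all 2**(1 + ebits + mbits) codes.
    Out-of-range inputs saturate to the largest-magnitude code.
    """
    target = x / scale  # assumption: the scale is applied by division
    n_codes = 1 << (1 + ebits + mbits)
    return min(range(n_codes), key=lambda c: abs(decode_floatx(c, ebits, mbits) - target))


if __name__ == "__main__":
    for value in (1.0, -1.5, 28.0):
        code = encode_floatx(value)
        # Show the byte as 00SEEEMM alongside the value it decodes back to.
        print(value, format(code, "08b"), decode_floatx(code))
```

Under these assumptions, 1.0 encodes to 00001100 (sign 0, exponent 011, mantissa 00) and -1.5 encodes to 00101110. The largest representable fp6_e3m2 magnitude is 28.0, so larger inputs saturate to 00011111 in this sketch.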