quantize_affine_floatx

torchao.quantization.quantize_affine_floatx(tensor: Tensor, scale: Tensor, ebits: int, mbits: int) → Tensor

Quantizes a float32 high-precision floating-point tensor to a low-precision floating-point format and returns the result in unpacked form. For fp6_e3m2 the unpacked layout is 00SEEEMM, where S is the sign bit, E is an exponent bit, and M is a mantissa bit.
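To make the 00SEEEMM layout concrete, the following pure-Python sketch decodes a single unpacked byte back into a float. It is not part of torchao: decode_floatx_unpacked is a hypothetical helper written for illustration, and it assumes an IEEE-style exponent bias of 2^(ebits-1) - 1, subnormals when the exponent field is zero, and no bit patterns reserved for infinity or NaN.

```python
def decode_floatx_unpacked(byte: int, ebits: int = 3, mbits: int = 2) -> float:
    """Decode one unpacked floatx byte (00SEEEMM for fp6_e3m2) into a float.

    Hypothetical illustration helper, not a torchao function.
    Assumes bias = 2**(ebits - 1) - 1 and no inf/NaN encodings.
    """
    bias = (1 << (ebits - 1)) - 1

    # Split the low (1 + ebits + mbits) bits into sign, exponent, mantissa.
    sign = (byte >> (ebits + mbits)) & 0x1
    exponent = (byte >> mbits) & ((1 << ebits) - 1)
    mantissa = byte & ((1 << mbits) - 1)
    fraction = mantissa / (1 << mbits)

    if exponent == 0:
        # Subnormal: no implicit leading 1, fixed exponent of 1 - bias.
        magnitude = fraction * 2.0 ** (1 - bias)
    else:
        # Normal: implicit leading 1.
        magnitude = (1.0 + fraction) * 2.0 ** (exponent - bias)

    return -magnitude if sign else magnitude


# A few fp6_e3m2 bit patterns, written as 00SEEEMM.
one = decode_floatx_unpacked(0b00001100)            # S=0, EEE=011, MM=00
minus_one = decode_floatx_unpacked(0b00101100)      # S=1, EEE=011, MM=00
largest = decode_floatx_unpacked(0b00011111)        # S=0, EEE=111, MM=11
smallest_sub = decode_floatx_unpacked(0b00000001)   # S=0, EEE=000, MM=01
```

Under these assumptions the four patterns decode to 1.0, -1.0, 28.0, and 0.0625. The two leading zero bits are padding, which is why the format is called unpacked: each 6-bit value occupies a full byte.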
