Precision & Quantization
Control numerical precision with FP16, BF16, and mixed-precision autocast, and reduce model size with INT8/FP8/FP4 quantization via ModelOpt.
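
To make the size-versus-accuracy trade-off behind INT8 quantization concrete, here is a minimal sketch of symmetric per-tensor quantization in plain Python. It is an illustration only and does not use or reproduce ModelOpt's API or calibration logic; the helper names `quantize_int8` and `dequantize_int8` and the sample `weights` are hypothetical.

```python
def quantize_int8(values):
    """Symmetric per-tensor INT8 quantization (illustrative, not ModelOpt).

    Returns the integer codes and the scale needed to map them back to floats.
    """
    max_abs = max(abs(v) for v in values)
    # Map the largest magnitude to 127; fall back to 1.0 for an all-zero input.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # Round to the nearest integer code and clamp to the symmetric INT8 range.
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return codes, scale


def dequantize_int8(codes, scale):
    """Recover approximate float values from INT8 codes."""
    return [c * scale for c in codes]


# Hypothetical weights, chosen only for illustration.
weights = [0.02, -1.27, 0.64, 0.0]
codes, scale = quantize_int8(weights)
restored = dequantize_int8(codes, scale)

# Worst-case round-trip error; for values inside the range it is at most scale / 2.
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(codes, scale, max_error)
```

Each value is stored in one byte instead of two (FP16/BF16) or four (FP32), at the cost of a rounding error bounded by half the scale. Real quantization flows typically choose scales per channel or from calibration data rather than from a single tensor-wide maximum.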