smooth_fq_linear_to_inference
- torchao.quantization.smooth_fq_linear_to_inference(model, debug_skip_calibration=False) → None
Prepares the model for inference by calculating the smoothquant scale for each SmoothFakeDynamicallyQuantizedLinear layer.
- Parameters:
model (torch.nn.Module) – The model containing SmoothFakeDynamicallyQuantizedLinear layers.
debug_skip_calibration (bool, optional) – If True, skips calibration and sets the running maximum of activations to a fixed debug value, which is useful for performance benchmarking. Defaults to False.
- Returns:
None
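
To make the scale calculation concrete, here is a minimal, illustrative sketch of the per-channel smoothing scale that SmoothQuant-style calibration derives from the running activation maxima. This is not torchao's actual implementation; the function name `smoothquant_scales` and the `alpha=0.5` default are assumptions for illustration, following the standard SmoothQuant formula scale_j = max|X_j|^alpha / max|W_j|^(1-alpha).

```python
def smoothquant_scales(act_max, weight_max, alpha=0.5):
    """Illustrative per-channel smoothing scales from calibrated maxima.

    act_max: per-channel running max of absolute activation values
    weight_max: per-channel max of absolute weight values
    alpha: migration strength balancing activation vs. weight difficulty
    """
    return [(a ** alpha) / (w ** (1.0 - alpha))
            for a, w in zip(act_max, weight_max)]

# A channel with an activation outlier (8.0) gets a large scale,
# shifting quantization difficulty from activations to weights.
scales = smoothquant_scales([8.0, 2.0], [2.0, 8.0], alpha=0.5)
print(scales)  # -> [2.0, 0.5]
```

In a typical torchao workflow, this calibration happens implicitly: linear layers are first swapped for SmoothFakeDynamicallyQuantizedLinear, the model is run on representative data to record activation maxima, and then smooth_fq_linear_to_inference(model) freezes the computed scales for inference.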