smooth_fq_linear_to_inference

torchao.quantization.smooth_fq_linear_to_inference(model, debug_skip_calibration=False) → None

Prepares the model for inference by computing the SmoothQuant scale for each SmoothFakeDynamicallyQuantizedLinear layer.

Parameters:
  • model (torch.nn.Module) – The model containing SmoothFakeDynamicallyQuantizedLinear layers.

  • debug_skip_calibration (bool, optional) – If True, sets the running maximum of activations to a debug value for performance benchmarking. Defaults to False.

Returns:

None
