smooth_fq_linear_to_inference
- torchao.quantization.smooth_fq_linear_to_inference(model, debug_skip_calibration=False) → None
Prepares the model for inference by calculating the smoothquant scale for each SmoothFakeDynamicallyQuantizedLinear layer.
- Parameters:
model (torch.nn.Module) – The model containing SmoothFakeDynamicallyQuantizedLinear layers.
debug_skip_calibration (bool, optional) – If True, skips calibration and sets the running maximum of activations to a fixed debug value, which is useful for performance benchmarking. Defaults to False.
- Returns:
None
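
To make the scale calculation concrete, here is a minimal, illustrative sketch of the per-channel smoothing scale that SmoothQuant-style calibration derives from the running activation maxima. This is not torchao's actual implementation; the function name `smoothquant_scales` and the `alpha=0.5` default are assumptions for illustration, following the standard SmoothQuant formula scale_j = max|X_j|^alpha / max|W_j|^(1-alpha).

```python
def smoothquant_scales(act_max, weight_max, alpha=0.5):
    """Illustrative per-channel smoothing scales from calibrated maxima.

    act_max: per-channel running max of absolute activation values
    weight_max: per-channel max of absolute weight values
    alpha: migration strength balancing activation vs. weight difficulty
    """
    return [(a ** alpha) / (w ** (1.0 - alpha))
            for a, w in zip(act_max, weight_max)]

# A channel with an activation outlier (8.0) gets a large scale,
# shifting quantization difficulty from activations to weights.
scales = smoothquant_scales([8.0, 2.0], [2.0, 8.0], alpha=0.5)
print(scales)  # -> [2.0, 0.5]
```

In a typical torchao workflow, this calibration happens implicitly: linear layers are first swapped for SmoothFakeDynamicallyQuantizedLinear, the model is run on representative data to record activation maxima, and then smooth_fq_linear_to_inference(model) freezes the computed scales for inference.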