quantize¶
- torchao.quantization.quantize_(model: Module, config: AOBaseConfig, filter_fn: Optional[Callable[[Module, str], bool]] = None, device: Optional[Union[device, str, int]] = None)[source]¶
Converts the weights of linear modules in the model according to config; the model is modified in place.
- Parameters:
model (torch.nn.Module) – input model
config (AOBaseConfig) – a workflow configuration object.
filter_fn (Optional[Callable[[torch.nn.Module, str], bool]]) – function that takes an nn.Module instance and the fully qualified name of the module, and returns True if we want to run config on the weight of the module
device (device, optional) – Device to move the module to before applying filter_fn. This can be set to "cuda" to speed up quantization. The final model will be on the specified device. Defaults to None (do not change device).
Example:
import torch
import torch.nn as nn
from torchao import quantize_

# quantize with some predefined `config` method that corresponds to
# optimized execution paths or kernels (e.g. int4 tinygemm kernel)
# also customizable with arguments
# currently options are
# int8_dynamic_activation_int4_weight (for executorch)
# int8_dynamic_activation_int8_weight (optimized with int8 mm op and torch.compile)
# int4_weight_only (optimized with int4 tinygemm kernel and torch.compile)
# int8_weight_only (optimized with int8 mm op and torch.compile)
from torchao.quantization.quant_api import int4_weight_only

m = nn.Sequential(nn.Linear(32, 1024), nn.Linear(1024, 32))
quantize_(m, int4_weight_only(group_size=32))
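The filter_fn and device parameters can be combined to quantize only selected modules and to move them to an accelerator first. A minimal sketch under assumptions not in the original example: the fully qualified name "0" refers to the first layer of the nn.Sequential above, and device="cuda" assumes a CUDA-capable machine:

# Hedged sketch: quantize only the first Linear layer ("0" is its fully
# qualified name inside nn.Sequential, an assumption based on the model above).
m2 = nn.Sequential(nn.Linear(32, 1024), nn.Linear(1024, 32))
quantize_(
    m2,
    int4_weight_only(group_size=32),
    # filter_fn receives each module and its fully qualified name
    filter_fn=lambda module, fqn: isinstance(module, nn.Linear) and fqn == "0",
    device="cuda",  # optional; assumes CUDA is available, final model stays on CUDA
)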