Float8WeightOnlyConfig#

class torchao.quantization.Float8WeightOnlyConfig(weight_dtype: dtype = torch.float8_e4m3fn, set_inductor_config: bool = True, version: int = 2)[source]#

Configuration for applying float8 weight-only symmetric per-channel quantization to linear layers.

Parameters
  • weight_dtype (torch.dtype) – The target data type for weight quantization. Default is torch.float8_e4m3fn.

  • set_inductor_config (bool) – If True, adjusts torchinductor settings to recommended values.

  • version (int) – The config version. Version 1 is deprecated; version 2 (the default) uses Float8Tensor.

Note

The actual matmul is computed in the original precision of the weight tensor.

Example:

# for torch 2.5+
from torchao.quantization import quantize_, Float8WeightOnlyConfig
quantize_(model, Float8WeightOnlyConfig())