float8_dynamic_activation_float8_weight

torchao.quantization.float8_dynamic_activation_float8_weight(activation_dtype: dtype = torch.float8_e4m3fn, weight_dtype: dtype = torch.float8_e4m3fn, granularity: Optional[Union[PerTensor, PerRow, Tuple[Union[PerTensor, PerRow], Union[PerTensor, PerRow]]]] = None, mm_config: Optional[Float8MMConfig] = None)

Applies float8 dynamic symmetric quantization to both activations and weights of linear layers.

Parameters:
  • activation_dtype (torch.dtype) – The target data type for activation quantization. Default is torch.float8_e4m3fn.

  • weight_dtype (torch.dtype) – The target data type for weight quantization. Default is torch.float8_e4m3fn.

  • granularity – The quantization granularity. Either a single granularity, applied to both activations and weights, or a tuple of two granularities (one for activations, one for weights). If None, defaults to PerTensor for both. Currently both granularities must be of the same type, and only PerTensor and PerRow are supported; see the usage sketch after this list.

  • mm_config (Float8MMConfig) – Configuration for the matrix multiplication. Default uses fast accumulation.
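
Example

A minimal usage sketch, assuming torchao's quantize_ entry point and that PerRow is importable from torchao.quantization (import locations can vary across torchao versions). Running the float8 matmul kernels also assumes a GPU with hardware float8 support (e.g. CUDA compute capability 8.9 or newer):

    import torch
    import torch.nn as nn
    from torchao.quantization import (
        PerRow,
        float8_dynamic_activation_float8_weight,
        quantize_,
    )

    # Toy model whose nn.Linear layers will be converted in place to
    # float8 dynamic-activation / float8-weight quantized linears.
    model = nn.Sequential(
        nn.Linear(1024, 4096),
        nn.ReLU(),
        nn.Linear(4096, 1024),
    ).to(device="cuda", dtype=torch.bfloat16)

    # Default configuration: e4m3 dtype and PerTensor granularity for
    # both activations and weights.
    quantize_(model, float8_dynamic_activation_float8_weight())

    # Alternatively, per-row scales for both activations and weights
    # (both granularities must be the same type):
    #   quantize_(model, float8_dynamic_activation_float8_weight(granularity=PerRow()))

    x = torch.randn(16, 1024, device="cuda", dtype=torch.bfloat16)
    y = model(x)  # activations are quantized dynamically at runtime

PerRow keeps one scale per output channel of the weight (and per token of the activation), which typically recovers more accuracy than a single per-tensor scale at a modest additional cost.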
