NVFP4WeightOnlyConfig
class torchao.prototype.mx_formats.NVFP4WeightOnlyConfig(use_dynamic_per_tensor_scale: bool = True)
NVIDIA FP4 (NVFP4) Weight-Only Quantization Configuration
This configuration applies NVFP4 quantization to weights only, keeping activations in their original precision.
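The sketch below shows what "weights only" means in practice. It is a minimal pure-Python illustration of NVFP4-style fake quantization applied to a linear layer's weight, not torchao's implementation. It assumes the NVFP4 layout: FP4 E2M1 values grouped into blocks of 16 along the input dimension, with one shared scale per block.

It also makes three simplifications:
- The block scales are kept in full precision rather than encoded as FP8 E4M3.
- Rounding ties go to the smaller magnitude rather than to nearest-even.
- The names `E2M1_GRID`, `BLOCK_SIZE`, `round_to_e2m1`, `fake_quantize_row`, `quantize_weight` and `linear` are hypothetical helpers, not torchao APIs.

The activation `x` is never rounded.

```python
import math

# Non-negative magnitudes representable in FP4 E2M1; the sign bit is stored separately.
E2M1_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
BLOCK_SIZE = 16  # assumed NVFP4 block size: 16 elements share one scale


def round_to_e2m1(v: float) -> float:
    """Round to the nearest E2M1 value (ties go to the smaller magnitude, a simplification)."""
    mag = min(E2M1_GRID, key=lambda g: abs(abs(v) - g))
    return math.copysign(mag, v)


def fake_quantize_row(row):
    """Quantize then dequantize one weight row, block by block along the input dimension."""
    out = []
    for i in range(0, len(row), BLOCK_SIZE):
        block = row[i:i + BLOCK_SIZE]
        scale = max(abs(v) for v in block) / 6.0  # maps the block's amax onto E2M1's max, 6.0
        if scale == 0.0:
            out.extend(0.0 for _ in block)
            continue
        out.extend(round_to_e2m1(v / scale) * scale for v in block)
    return out


def quantize_weight(weight):
    """Weight is [out_features][in_features]; quantized once, ahead of inference."""
    return [fake_quantize_row(row) for row in weight]


def linear(x, w_q):
    """y = x @ W.T for a single input vector; the activation x is used as-is."""
    return [sum(xi * wi for xi, wi in zip(x, row)) for row in w_q]


weight = [[0.9, -0.31, 0.05, 0.6] * 4, [1.2, 0.4, -0.8, 0.1] * 4]
x = [0.25] * 16  # activations stay in their original precision
print(linear(x, weight))                   # full-precision reference
print(linear(x, quantize_weight(weight)))  # weight-only, NVFP4-style result
```

The weight is quantized once, up front, and every call to `linear` reuses it. Only the weight contributes rounding error.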
Example:
```python
import torch
import torch.nn as nn
from torchao.prototype.mx_formats.inference_workflow import NVFP4WeightOnlyConfig
from torchao.quantization import quantize_

model = nn.Linear(32, 128, bias=False, dtype=torch.bfloat16, device="cuda")
config = NVFP4WeightOnlyConfig(
    use_dynamic_per_tensor_scale=True,
)
# quantize_ modifies the model in place
quantize_(model, config=config)
model = torch.compile(model, fullgraph=True)
```
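The `use_dynamic_per_tensor_scale` flag relates to NVFP4's two-level scaling. Block scales are stored in FP8 E4M3, and an additional FP32 per-tensor scale can shift them into E4M3's representable range.

The sketch below assumes the flag corresponds to computing that per-tensor scale from the weight's absolute maximum as `amax / (6 * 448)`. Here 6 is the E2M1 maximum and 448 the E4M3 maximum. Treat this as an illustration of the idea, not torchao's exact code; `round_to_e4m3` and `block_scales` are hypothetical helpers.

The sketch shows that, without a per-tensor scale, small-magnitude weights produce block scales that underflow to zero in E4M3. With one, the block scales survive.

```python
import math

F4_E2M1_MAX = 6.0    # largest magnitude in FP4 E2M1
F8_E4M3_MAX = 448.0  # largest finite magnitude in FP8 E4M3 (the "fn" variant)


def round_to_e4m3(x: float) -> float:
    """Round a non-negative value to the nearest FP8 E4M3 value, saturating at 448."""
    if x <= 0.0:
        return 0.0
    if x >= F8_E4M3_MAX:
        return F8_E4M3_MAX
    # Normal exponents go down to -6; below that, subnormals keep the 2**-9 spacing.
    exp = max(math.floor(math.log2(x)), -6)
    step = 2.0 ** (exp - 3)  # 3 mantissa bits give 8 steps per power of two
    return min(round(x / step) * step, F8_E4M3_MAX)


def block_scales(weights, block_size=16, use_per_tensor_scale=True):
    """Return (per_tensor_scale, effective block scales) for a flat list of weights."""
    amax = max(abs(w) for w in weights)
    per_tensor = 1.0
    if use_per_tensor_scale and amax > 0.0:
        # Assumed choice: puts the largest block scale at the top of the E4M3 range.
        per_tensor = amax / (F4_E2M1_MAX * F8_E4M3_MAX)
    effective = []
    for i in range(0, len(weights), block_size):
        block_amax = max(abs(w) for w in weights[i:i + block_size])
        ideal = block_amax / F4_E2M1_MAX            # scale that maps block_amax onto 6.0
        stored = round_to_e4m3(ideal / per_tensor)  # value that would be kept in FP8
        effective.append(stored * per_tensor)       # scale seen at dequantization
    return per_tensor, effective


# Small-magnitude weights: two blocks of 16 with absolute max 0.004.
w = [0.004 if i % 2 == 0 else -0.001 for i in range(32)]
print(block_scales(w, use_per_tensor_scale=False))  # block scales underflow to 0.0
print(block_scales(w, use_per_tensor_scale=True))   # block scales close to 0.004 / 6
```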