NVFP4WeightOnlyConfig

class torchao.prototype.mx_formats.NVFP4WeightOnlyConfig(use_dynamic_per_tensor_scale: bool = True)

NVIDIA FP4 (NVFP4) Weight-Only Quantization Configuration

This configuration applies NVFP4 quantization to weights only, keeping activations in their original precision.
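To make "weights only" concrete, the following is a minimal pure-Python sketch of the idea, not torchao's implementation. It assumes that NVFP4 stores 4-bit E2M1 values (magnitudes 0, 0.5, 1, 1.5, 2, 3, 4, 6) with one shared scale per block of 16 elements. These details are not stated on this page. The helper names (`fake_quantize_block`, `fake_quantize_weights`, `linear`) are hypothetical, the block scale is kept as a Python float rather than a low-precision type, and the per-tensor scale suggested by `use_dynamic_per_tensor_scale` is not modeled.

```python
import math

# Assumption: magnitudes representable in a 4-bit E2M1 float.
E2M1_VALUES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
# Assumption: number of weights that share one scale.
BLOCK_SIZE = 16


def fake_quantize_block(block):
    """Round one block to the E2M1 grid with a shared scale, then dequantize."""
    amax = max(abs(w) for w in block)
    if amax == 0.0:
        return list(block)
    # Map the largest magnitude in the block onto the largest grid value.
    scale = amax / E2M1_VALUES[-1]
    out = []
    for w in block:
        nearest = min(E2M1_VALUES, key=lambda v: abs(abs(w) / scale - v))
        out.append(math.copysign(nearest * scale, w))
    return out


def fake_quantize_weights(weights):
    """Apply block-wise fake quantization to a flat list of weights."""
    out = []
    for start in range(0, len(weights), BLOCK_SIZE):
        out.extend(fake_quantize_block(weights[start:start + BLOCK_SIZE]))
    return out


def linear(weight_row, activations):
    """Dot product; the activations are used as-is, without quantization."""
    return sum(w * a for w, a in zip(weight_row, activations))


weights = [0.1 * i - 1.5 for i in range(32)]
activations = [0.25] * 32

reference = linear(weights, activations)
weight_only = linear(fake_quantize_weights(weights), activations)
print(reference, weight_only)
```

Only the weights pass through the rounding step; the activations reach `linear` unchanged, which is the property the description above refers to. For instance, `fake_quantize_block([6.0, 3.1, -0.4, 0.0])` returns `[6.0, 3.0, -0.5, 0.0]`.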

Example:

import torch
import torch.nn as nn

from torchao.prototype.mx_formats.inference_workflow import NVFP4WeightOnlyConfig
from torchao.quantization import quantize_

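# A single bfloat16 linear layer on the GPU
model = nn.Linear(32, 128, bias=False, dtype=torch.bfloat16, device="cuda")
config = NVFP4WeightOnlyConfig(
    use_dynamic_per_tensor_scale=True,
)
# quantize_ modifies the model in place
quantize_(model, config=config)
# Compile the quantized model
model = torch.compile(model, fullgraph=True)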