
NF4Tensor

class torchao.dtypes.NF4Tensor(tensor_meta: SubclassTensorArgs, block_size: int, n_blocks: int, scaler_block_size: int, quantized_scalers: Tensor, quantization_factor: Tensor, scaler_mean: Tensor, quantized_data: Tensor, nf4: Tensor)[source]

NF4Tensor class for converting a weight to the QLoRA NF4 format

static convert_to_norm_float_weight(input_tensor: Tensor, n_blocks: int, block_size: int, nf4: Tensor) → Tensor[source]

Convert a tensor to the normalized float weight format

static dequantize(value: Tensor, nf4: Tensor) → Tensor[source]

Dequantize an nf4 value to bfloat16 format
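Conceptually, nf4 dequantization is a table lookup: each 4-bit code indexes into the 16 nf4 code values, which are then rescaled by the block's absmax scaler. A minimal pure-Python sketch (the code values below are the normal-float quantiles from the QLoRA paper; the actual method operates on torch tensors):

```python
# The 16 nf4 code values (normal-distribution quantiles from the
# QLoRA paper), normalized to span [-1.0, 1.0].
NF4_CODES = [
    -1.0, -0.6961928009986877, -0.5250730514526367, -0.39491748809814453,
    -0.28444138169288635, -0.18477343022823334, -0.09105003625154495, 0.0,
    0.07958029955625534, 0.16093020141124725, 0.24611230194568634,
    0.33791524171829224, 0.44070982933044434, 0.5626170039176941,
    0.7229568362236023, 1.0,
]

def dequantize_nf4(codes, scaler):
    """Map 4-bit codes back to floats: look up each code's nf4 value
    and rescale by the block's absmax scaler."""
    return [NF4_CODES[c] * scaler for c in codes]

# Codes 0, 7, and 15 map to -1.0, 0.0, and 1.0 before rescaling:
dequantize_nf4([0, 7, 15], 2.0)  # -> [-2.0, 0.0, 2.0]
```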

dequantize_scalers(input_tensor: Tensor, quantization_factor: Tensor, scaler_block_size: int) → Tensor[source]

Used to unpack the double-quantized scalers

Parameters:
  • input_tensor – Input tensor to convert to QLoRA format; these are the quantized scalers in int8 format

  • quantization_factor – Tensor of per_scaler_block quantization factors stored in the input weight's dtype

  • scaler_block_size – Scaler block size to use for double quantization.
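Unpacking inverts the second quantization step: each int8 scaler is divided by its scaler block's quantization factor, then the previously subtracted mean is added back. A pure-Python sketch of that arithmetic (`dequantize_scalers` below is an illustrative helper, not the torchao method, which works on torch tensors):

```python
def dequantize_scalers(q_scalers, q_factors, scaler_mean, scaler_block_size):
    """Undo double quantization: divide each int8 scaler by its
    scaler block's quantization factor, then add back the mean that
    was subtracted before the scalers were quantized."""
    out = []
    for i, q in enumerate(q_scalers):
        factor = q_factors[i // scaler_block_size]
        out.append(q / factor + scaler_mean)
    return out

# Two int8 scalers in one scaler block with factor 127.0 and mean 1.0:
dequantize_scalers([127, -127], [127.0], 1.0, 2)  # -> [2.0, 0.0]
```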

static double_quantize_scalers(input_tensor: Tensor, block_size: int, scaler_block_size: int) → Tuple[Tensor, Tensor, Tensor][source]

Used to achieve double quantization of the scalers. We first calculate the absmax quantization factors for each block of the input tensor. We then find the mean of the positive absmax scalers and subtract it from the scalers. Finally, we calculate the absmax quantization factors for each scaler block and quantize the scalers to int8.

Parameters:
  • input_tensor – Input tensor to convert to QLoRA format, typically a weight tensor

Returns:
  • torch.Tensor – Tensor of per_block quantization factors stored in int8 format, size: (n_blocks)

  • torch.Tensor – Tensor of per_scaler_block quantization factors stored in int16 format, size: (n_scaler_blocks)

Return type:

Tuple[torch.Tensor, torch.Tensor, torch.Tensor]
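The three steps described above can be sketched in pure Python (`double_quantize_scalers` below is an illustrative helper over flat lists, not the torchao implementation; the mean subtraction here uses all scalers, which are non-negative absmax values):

```python
def double_quantize_scalers(x, block_size, scaler_block_size):
    """Sketch of double quantization:
    1. compute the absmax scaler for each block of x
    2. subtract the mean of the scalers to center them
    3. absmax-quantize the shifted scalers to int8 per scaler block
    """
    # Step 1: per-block absmax scalers (always non-negative).
    scalers = [max(abs(v) for v in x[i:i + block_size])
               for i in range(0, len(x), block_size)]
    # Step 2: center the scalers around zero.
    mean = sum(scalers) / len(scalers)
    shifted = [s - mean for s in scalers]
    # Step 3: per-scaler-block absmax quantization to int8.
    q_scalers, q_factors = [], []
    for i in range(0, len(shifted), scaler_block_size):
        blk = shifted[i:i + scaler_block_size]
        absmax = max(abs(v) for v in blk) or 1.0
        q_factors.append(127.0 / absmax)
        q_scalers.extend(round(v * 127.0 / absmax) for v in blk)
    return q_scalers, q_factors, mean
```

Storing only the int8 scalers, the small per-scaler-block factors, and a single mean is what makes the "double" quantization cheaper than keeping full-precision per-block scalers.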

get_original_weight() → Tensor[source]

Get the original weight from the normalized float weight format

static quantize_tensor_nearest(value: Tensor, nf4: Tensor) → Tensor[source]

Quantize a float16 tensor to nf4 format, rounding to the nearest nf4 value rather than rounding up
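Round-to-nearest quantization scales each value into [-1, 1] and picks the nf4 code whose value is closest, rather than the next code above. A pure-Python sketch of the idea (the code values are the normal-float quantiles from the QLoRA paper; the actual method operates on torch tensors):

```python
# The 16 nf4 code values (normal-distribution quantiles from the
# QLoRA paper), normalized to span [-1.0, 1.0].
NF4_CODES = [
    -1.0, -0.6961928009986877, -0.5250730514526367, -0.39491748809814453,
    -0.28444138169288635, -0.18477343022823334, -0.09105003625154495, 0.0,
    0.07958029955625534, 0.16093020141124725, 0.24611230194568634,
    0.33791524171829224, 0.44070982933044434, 0.5626170039176941,
    0.7229568362236023, 1.0,
]

def quantize_nearest(values, scaler):
    """Quantize floats to 4-bit nf4 codes: normalize by the block's
    absmax scaler, then pick the code minimizing |value - code|."""
    codes = []
    for v in values:
        norm = v / scaler
        codes.append(min(range(16), key=lambda i: abs(norm - NF4_CODES[i])))
    return codes

# Normalized values -1.0, 0.0, and 1.0 hit codes 0, 7, and 15 exactly:
quantize_nearest([-2.0, 0.0, 2.0], 2.0)  # -> [0, 7, 15]
```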
