
NF4Tensor

class torchao.dtypes.NF4Tensor(tensor_meta: SubclassTensorArgs, block_size: int, n_blocks: int, scaler_block_size: int, quantized_scalers: Tensor, quantization_factor: Tensor, scaler_mean: Tensor, quantized_data: Tensor, nf4: Tensor)[source]

NF4Tensor class for converting a weight to the QLoRA NF4 format

static convert_to_norm_float_weight(input_tensor: Tensor, n_blocks: int, block_size: int, nf4: Tensor) → Tensor[source]

Convert a tensor to the normalized float weight format
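The conversion can be illustrated with a minimal pure-Python sketch (the helper name is hypothetical, and a 4-entry code table stands in for the real 16-entry NF4 table): the tensor is split into blocks, each block is normalized by its absmax scaler, and each normalized value is mapped to the nearest code.

```python
# Minimal pure-Python sketch of block-wise NF4-style quantization (hypothetical
# names; a 4-entry code table stands in for the real 16-entry NF4 table).
CODES = [-1.0, -0.33, 0.33, 1.0]

def quantize_blockwise(values, block_size):
    """Return (code indices, per-block absmax scalers)."""
    indices, scalers = [], []
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        scaler = max(abs(v) for v in block) or 1.0  # absmax scaler for this block
        scalers.append(scaler)
        for v in block:
            normed = v / scaler  # normalize into [-1, 1]
            # pick the nearest code, not the next one up
            indices.append(min(range(len(CODES)), key=lambda i: abs(CODES[i] - normed)))
    return indices, scalers

quantize_blockwise([0.5, -1.0, 0.25, 0.1], block_size=2)
```

Storing one absmax scaler per block is what lets a 4-bit code table cover weights of very different magnitudes.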

static dequantize(value: Tensor, nf4: Tensor) → Tensor[source]

Dequantize an NF4 value to bfloat16 format
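Dequantization inverts the quantization mapping; a sketch under the same stand-in assumptions (hypothetical names, 4-entry table instead of the 16-entry NF4 table): look up each stored code and rescale by its block's absmax scaler.

```python
# Sketch of block-wise dequantization (hypothetical names; 4-entry stand-in
# table): look up each code index, then rescale by its block's absmax scaler.
CODES = [-1.0, -0.33, 0.33, 1.0]

def dequantize_blockwise(indices, scalers, block_size):
    return [CODES[i] * scalers[pos // block_size] for pos, i in enumerate(indices)]

dequantize_blockwise([2, 0, 3, 2], [1.0, 0.25], block_size=2)
```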

dequantize_scalers(input_tensor: Tensor, quantization_factor: Tensor, scaler_block_size: int) → Tensor[source]

Used to unpack the double quantized scalers

Parameters
  • input_tensor – Input tensor to dequantize; these are the quantized scalers in int8 format

  • quantization_factor – Tensor of per_scaler_block quantization factors stored in the input weight's dtype

  • scaler_block_size – Scaler block size to use for double quantization.
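The unpacking described by these parameters can be sketched in pure Python (hypothetical names, not torchao's implementation): each int8 scaler code is multiplied by its scaler block's quantization factor, and the scaler mean is added back (an assumption based on the class's scaler_mean field, since double quantization subtracts that mean).

```python
# Sketch (pure Python, hypothetical names -- not torchao's implementation) of
# unpacking double-quantized scalers: multiply each int8 code by its scaler
# block's quantization factor, then add back the subtracted scaler mean
# (assumption based on the class's scaler_mean field).
def dequantize_scalers(quantized, factors, scaler_mean, scaler_block_size):
    return [
        quantized[i] * factors[i // scaler_block_size] + scaler_mean
        for i in range(len(quantized))
    ]

dequantize_scalers([-127, 127, 0, 0], [1.0 / 127], 2.0, scaler_block_size=4)
```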

static double_quantize_scalers(input_tensor: Tensor, block_size: int, scaler_block_size: int) → Tuple[Tensor, Tensor, Tensor][source]

Used to achieve double quantization of the scalers. We take the input tensor and first calculate the absmax quantization factors for each block. We then find the mean of the positive absmax scalers and subtract it from the scalers. Finally, we calculate the absmax quantization factors for each block again and quantize the scalers to int8.

Parameters

input_tensor – Input tensor to convert to QLoRA format, typically a weight tensor

Returns

  • torch.Tensor – Tensor of per_block quantization factors stored in int8 format, size: (n_blocks)

  • torch.Tensor – Tensor of per_scaler_block quantization factors stored in int16 format, size: (n_scaler_blocks)

Return type

Tuple[Tensor, Tensor, Tensor]
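The steps described above can be sketched in pure Python (hypothetical helper, not the torchao implementation): center the first-level absmax scalers by their mean, then quantize them to int8 with a second level of per-scaler-block absmax factors.

```python
# Pure-Python sketch of double quantization (hypothetical helper, not the
# torchao implementation). First-level absmax scalers are centered by their
# mean, then quantized to int8 using a second level of per-scaler-block
# absmax quantization factors.
def double_quantize_scalers(scalers, scaler_block_size):
    scaler_mean = sum(scalers) / len(scalers)      # mean of the positive absmax scalers
    centered = [s - scaler_mean for s in scalers]  # subtract the mean
    quantized, factors = [], []
    for start in range(0, len(centered), scaler_block_size):
        block = centered[start:start + scaler_block_size]
        factor = max(abs(c) for c in block) / 127 or 1.0  # second-level absmax factor
        factors.append(factor)
        quantized.extend(round(c / factor) for c in block)  # int8 codes in [-127, 127]
    return quantized, factors, scaler_mean

double_quantize_scalers([1.0, 3.0, 2.0, 2.0], scaler_block_size=4)
```

Quantizing the (much smaller) tensor of scalers a second time is what reduces the per-weight memory overhead of storing one scaler per block.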

get_original_weight() → Tensor[source]

Get the original weight from the normalized float weight format

static quantize_tensor_nearest(value: Tensor, nf4: Tensor) → Tensor[source]

Quantize a float16 tensor to NF4 format, rounding to the nearest NF4 value rather than always rounding up
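"Nearest, not rounded up" means each value maps to the code with the smallest absolute difference, even when a larger code sits just above it. A minimal pure-Python sketch (hypothetical name; the 4-entry table stands in for the 16-entry NF4 table):

```python
# "To nearest and not rounded up": pick the code with the smallest absolute
# difference (pure-Python sketch; 4-entry stand-in table, hypothetical name).
CODES = [-1.0, -0.33, 0.33, 1.0]

def quantize_nearest(value):
    return min(range(len(CODES)), key=lambda i: abs(CODES[i] - value))

quantize_nearest(0.4)  # nearest code is 0.33 at index 2, not 1.0 above it
```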