NF4Tensor
- class torchao.dtypes.NF4Tensor(tensor_meta: SubclassTensorArgs, block_size: int, n_blocks: int, scaler_block_size: int, quantized_scalers: Tensor, quantization_factor: Tensor, scaler_mean: Tensor, quantized_data: Tensor, nf4: Tensor)
NF4Tensor class for converting a weight to the QLoRA NF4 format
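A minimal usage sketch follows. It assumes the torchao.dtypes.to_nf4 helper and the get_original_weight method; the block sizes shown are illustrative, not required defaults.

```python
import torch
from torchao.dtypes import to_nf4

# A bfloat16 weight; its number of elements should divide evenly into blocks.
weight = torch.randn(1024, 1024, dtype=torch.bfloat16)

# Quantize to NF4. block_size and scaler_block_size here are illustrative.
nf4_weight = to_nf4(weight, block_size=64, scaler_block_size=256)

# Recover a bfloat16 approximation of the original weight.
restored = nf4_weight.get_original_weight()
```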
- static convert_to_norm_float_weight(input_tensor: Tensor, n_blocks: int, block_size: int, nf4: Tensor) → Tensor
Convert a tensor to the normalized float weight format
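Conceptually, conversion maps each block-normalized value to the index of the nearest of the 16 NF4 codebook values. The sketch below illustrates only that nearest-code step; it is not the library routine (which also packs two 4-bit codes per byte), and the function name is hypothetical.

```python
import torch

def nearest_nf4_codes(normalized: torch.Tensor, nf4: torch.Tensor) -> torch.Tensor:
    # normalized: values already scaled into [-1, 1] by their per-block scalers.
    # nf4: the 16 NF4 codebook values.
    # Each element is assigned the index of the closest codebook entry.
    distances = (normalized.flatten().unsqueeze(-1) - nf4.view(1, -1)).abs()
    return distances.argmin(dim=-1).to(torch.uint8)
```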
- static dequantize(value: Tensor, nf4: Tensor) → Tensor
Dequantize an NF4 value to bfloat16 format
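Dequantizing a 4-bit code amounts to indexing into the 16-entry codebook. A minimal sketch, using the nf4 buffer stored on the tensor (see the constructor signature above):

```python
import torch
from torchao.dtypes import to_nf4

w = torch.randn(64, 64, dtype=torch.bfloat16)
nf4_w = to_nf4(w, block_size=64, scaler_block_size=256)  # sizes are illustrative

# A 4-bit code dequantizes via a table lookup into the 16-entry NF4 codebook.
codes = torch.tensor([0, 7, 15])
values = nf4_w.nf4[codes]  # bfloat16 codebook values for those codes
```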
- dequantize_scalers(input_tensor: Tensor, quantization_factor: Tensor, scaler_block_size: int) → Tensor
Used to unpack the double-quantized scalers
- Parameters:
input_tensor – The quantized scalers to unpack, stored in int8 format
quantization_factor – Tensor of per_scaler_block quantization factors, stored in the input weight's dtype
scaler_block_size – Scaler block size to use for double quantization.
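A standalone sketch of the unpacking step. It assumes the quantization factor was defined so that division restores the scalers, and it takes the stored scaler mean as an explicit argument (the method itself reads it from the tensor); the function name is hypothetical.

```python
import torch

def dequantize_scalers_sketch(quantized_scalers: torch.Tensor,
                              quantization_factor: torch.Tensor,
                              scaler_mean: torch.Tensor,
                              scaler_block_size: int,
                              dtype: torch.dtype = torch.bfloat16) -> torch.Tensor:
    # Group the int8 scalers into scaler blocks, undo the per-block scaling,
    # then add back the mean that was subtracted before double quantization.
    blocks = quantized_scalers.view(-1, scaler_block_size).to(dtype)
    return (blocks / quantization_factor.unsqueeze(-1)).flatten() + scaler_mean
```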
- static double_quantize_scalers(input_tensor: Tensor, block_size: int, scaler_block_size: int) → Tuple[Tensor, Tensor, Tensor]
Used to achieve double quantization of the scalers. We first compute an absmax quantization factor for each block of the input tensor, then take the mean of these positive absmax scalers and subtract it from them. We then compute absmax quantization factors over the scaler blocks and quantize the centered scalers to int8 (a sketch follows the Returns section below).
- Parameters:
input_tensor – Input tensor to convert to QLoRA format, typically a weight tensor
- Returns:
- torch.Tensor: Tensor of per_block quantization factors stored in int8 format
size: (n_blocks)
- torch.Tensor: Tensor of per_scaler_block quantization factors stored in the input tensor's dtype
size: (n_scaler_blocks)
- torch.Tensor: Scalar tensor holding the scaler mean that was subtracted before quantization
- Return type:
Tuple[Tensor, Tensor, Tensor]
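A conceptual sketch of the procedure described above. The int8 mapping (127 / absmax) and the rounding/clamping are assumptions; the library's exact scaling may differ.

```python
import torch

def double_quantize_scalers_sketch(input_tensor: torch.Tensor,
                                   block_size: int,
                                   scaler_block_size: int):
    # First level: per-block absmax scalers over the weight blocks.
    scalers = input_tensor.view(-1, block_size).abs().max(dim=-1).values
    # Center the positive scalers around zero so the int8 range is better used.
    scaler_mean = scalers.mean()
    centered = scalers - scaler_mean
    # Second level: absmax over scaler blocks, then quantize to int8.
    # n_blocks must be divisible by scaler_block_size.
    blocks = centered.view(-1, scaler_block_size)
    absmax = blocks.abs().max(dim=-1, keepdim=True).values
    quantization_factor = 127.0 / absmax  # assumed int8 mapping
    quantized = (blocks * quantization_factor).round().clamp(-128, 127).to(torch.int8)
    return quantized.flatten(), quantization_factor.flatten(), scaler_mean
```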