FPXWeightOnlyConfig

class torchao.quantization.FPXWeightOnlyConfig(ebits: int, mbits: int, set_inductor_config: bool = True)[source]

Sub-byte floating point dtypes defined by ebits (exponent bits) and mbits (mantissa bits), e.g. fp6_e3_m2, fp6_e2_m3, etc. The packing format and kernels come from the fp6-llm paper (https://arxiv.org/abs/2401.14112); the accompanying github repo (https://github.com/usyd-fsalab/fp6_llm) has since been renamed to quant-llm. For more details on packing, see FpxTensorCoreAQTTensorImpl.
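To make the ebits/mbits parameterization concrete, here is a small sketch that decodes a sub-byte float bit pattern into a Python float. The helper `decode_fpx` is hypothetical (not part of torchao) and assumes an IEEE-like layout: 1 sign bit, a bias of 2^(ebits-1)-1, subnormals at exponent 0, and, as in fp6, no inf/nan encodings; the exact fp6-llm value mapping may differ in edge cases.

```python
def decode_fpx(bits: int, ebits: int, mbits: int) -> float:
    """Hypothetical decoder for a (1 + ebits + mbits)-bit float,
    assuming an IEEE-like layout with bias 2^(ebits-1)-1,
    subnormals at exponent 0, and no inf/nan codes."""
    sign = (bits >> (ebits + mbits)) & 1
    exp = (bits >> mbits) & ((1 << ebits) - 1)
    man = bits & ((1 << mbits) - 1)
    bias = (1 << (ebits - 1)) - 1
    if exp == 0:  # subnormal: implicit leading 0, fixed exponent 1 - bias
        val = (man / (1 << mbits)) * 2.0 ** (1 - bias)
    else:         # normal: implicit leading 1
        val = (1 + man / (1 << mbits)) * 2.0 ** (exp - bias)
    return -val if sign else val

# fp6_e3_m2 examples: 0b0_011_00 encodes 1.0; 0b0_111_11 is the
# largest representable magnitude, 1.75 * 2**4 = 28.0
print(decode_fpx(0b001100, ebits=3, mbits=2))  # 1.0
print(decode_fpx(0b011111, ebits=3, mbits=2))  # 28.0
```

Under these assumptions, fp6_e3_m2 and fp6_e2_m3 cover the same 6 bits but trade dynamic range (more exponent bits) for precision (more mantissa bits).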

This config is experimental and will be merged with to_affine_quantized_floatx in the future.
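A typical usage sketch, assuming a recent torchao build with the fp6 kernels and a CUDA GPU (the `quantize_` entry point accepts a config object; parameter choices here are illustrative):

```python
import torch
from torchao.quantization import quantize_, FPXWeightOnlyConfig

# Illustrative model; fp6-llm kernels target Linear weights on CUDA.
model = torch.nn.Sequential(torch.nn.Linear(1024, 1024)).half().cuda()

# ebits=3, mbits=2 selects the fp6_e3_m2 weight dtype.
quantize_(model, FPXWeightOnlyConfig(ebits=3, mbits=2))
```

Since this API is experimental, check the torchao release notes for your version before relying on it.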
