{BACKEND_NAME} Quantization#

Document quantization schemes and flows for the backend. This should include a description of each scheme and a code example to perform quantization. Example sections for PT2E and quantize_ are included below, to be replaced with details for the target backend.

For each supported quantization scheme, include the following:

What is the quantization scheme?
- How are weights quantized?
- How are activations quantized? Static or dynamic?
- How many bits?
- What is the granularity? Per-tensor, per-channel, group/block-wise?
What are the steps to quantize a model with this scheme?
Include a code sample.
If the quantization flow only supports a small set of operators - for example, linear only - note this.

Supported Quantization Schemes#

The {BACKEND_NAME} delegate supports the following quantization schemes:

{QUANTIZATION_SCHEME_1}
{QUANTIZATION_SCHEME_2}

{QUANTIZATION_METHOD_1} using the PT2E Flow#

[Description]

[Code Sample]

LLM Quantization with quantize_#

[Description]

[Code Sample]

{BACKEND_NAME} Quantization#

Supported Quantization Schemes#

{QUANTIZATION_METHOD_1} using the PT2E Flow#

LLM Quantization with quantize_#

Docs

Tutorials

Resources