{BACKEND_NAME} Quantization#

Document the quantization schemes and flows that the backend supports. Include a description of each scheme and a code example that performs the quantization. The example sections below for PT2E and quantize_ are placeholders, to be replaced with details for the target backend.

For each supported quantization scheme, include the following:

  • What is the quantization scheme?

    • How are weights quantized?

    • How are activations quantized? Static or dynamic?

    • How many bits?

    • What is the granularity? Per-tensor, per-channel, group/block-wise?

  • What are the steps to quantize a model with this scheme?

  • Include a code sample.

  • If the quantization flow supports only a small set of operators (for example, linear only), note this.
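To make the checklist concrete, the following is a minimal, backend-independent sketch of what the answers can mean numerically. It assumes symmetric signed integer quantization and uses plain Python lists; every function name here (`qrange`, `symmetric_scale`, `weight_scales`, and so on) is hypothetical and does not correspond to any backend or library API. It illustrates how bit width, granularity, and static versus dynamic activation handling change the scales that are computed.

```python
from typing import List


def qrange(bits: int) -> int:
    """Largest magnitude in a symmetric signed range, e.g. 127 for 8 bits."""
    return 2 ** (bits - 1) - 1


def symmetric_scale(values: List[float], bits: int) -> float:
    """Scale that maps the largest absolute value to the top of the range."""
    max_abs = max(abs(v) for v in values)
    return max_abs / qrange(bits) if max_abs > 0 else 1.0


def quantize(values: List[float], scale: float, bits: int) -> List[int]:
    """Round to the nearest integer step and clamp to the symmetric range."""
    q = qrange(bits)
    return [max(-q, min(q, round(v / scale))) for v in values]


def weight_scales(weight: List[List[float]], bits: int,
                  granularity: str, group_size: int = 2):
    """Compute scales for a 2-D weight whose rows are output channels.

    per_tensor  -> a single scale for the whole matrix
    per_channel -> one scale per row
    group       -> one scale per contiguous group of `group_size` values per row
    """
    if granularity == "per_tensor":
        flat = [v for row in weight for v in row]
        return [symmetric_scale(flat, bits)]
    if granularity == "per_channel":
        return [symmetric_scale(row, bits) for row in weight]
    if granularity == "group":
        return [
            [symmetric_scale(row[i:i + group_size], bits)
             for i in range(0, len(row), group_size)]
            for row in weight
        ]
    raise ValueError(f"unknown granularity: {granularity}")


def static_activation_scale(calibration_batches: List[List[float]],
                            bits: int) -> float:
    """Static: the scale is fixed ahead of time from calibration data."""
    flat = [v for batch in calibration_batches for v in batch]
    return symmetric_scale(flat, bits)


def dynamic_activation_scale(activation: List[float], bits: int) -> float:
    """Dynamic: the scale is recomputed from each input at runtime."""
    return symmetric_scale(activation, bits)


# Illustrative values only; not taken from any real model.
weight = [[0.5, -1.0, 0.02, 0.01],
          [0.1, 0.05, -0.2, 0.04]]

for granularity in ("per_tensor", "per_channel", "group"):
    print(granularity, weight_scales(weight, bits=8, granularity=granularity))
```

Finer granularity gives small-magnitude weights their own scale, which generally reduces rounding error at the cost of storing more scales. A real backend section should state which of these choices the delegate actually supports.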

Supported Quantization Schemes#

The {BACKEND_NAME} delegate supports the following quantization schemes:

  • {QUANTIZATION_SCHEME_1}

  • {QUANTIZATION_SCHEME_2}

{QUANTIZATION_METHOD_1} using the PT2E Flow#

[Description]

[Code Sample]

LLM Quantization with quantize_#

[Description]

[Code Sample]