Arm VGF Backend#
The Arm® VGF backend is the ExecuTorch solution for lowering PyTorch models to VGF-compatible hardware. It leverages the TOSA operator set and the ML SDK for Vulkan® to produce a .pte file. The VGF backend also supports executing a .pte file and provides functionality to extract the corresponding VGF file for integration into various applications.
Features#
Wide operator support for delegating large parts of models to the VGF target.
A quantizer that optimizes quantization for the VGF target.
Target Requirements#
The target system must include ML SDK for Vulkan and a Vulkan driver with Vulkan API >= 1.3.
Development Requirements#
Tip
All requirements can be downloaded using examples/arm/setup.sh --enable-mlsdk-deps --disable-ethos-u-deps and added to the path using
source examples/arm/arm-scratch/setup_path.sh
For the AOT flow, i.e. compiling a model to the .pte format using the VGF backend, the requirements are:
TOSA Serialization Library for serializing the Exir IR graph into TOSA IR.
ML SDK Model Converter for converting TOSA flatbuffers to VGF files.
And for building and running your application using the generic executor_runner:
Vulkan API set up locally for GPU execution support.
ML Emulation Layer for Vulkan for testing on the Vulkan API.
Using the Arm VGF Backend#
The VGF Minimal Example demonstrates how to lower a module using the VGF backend.
The main configuration point for the lowering is the VgfCompileSpec consumed by the partitioner and quantizer.
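As a quick orientation, the sketch below shows a typical lowering flow. It assumes VgfCompileSpec and VgfPartitioner are importable from executorch.backends.arm.vgf; the exact module paths may differ between ExecuTorch versions, so treat this as a sketch and see the VGF Minimal Example for the canonical flow.

```python
import torch

from executorch.backends.arm.vgf import VgfCompileSpec, VgfPartitioner  # import paths may vary between versions
from executorch.exir import to_edge_transform_and_lower

model = torch.nn.Sequential(torch.nn.Linear(16, 8), torch.nn.ReLU()).eval()
example_inputs = (torch.randn(1, 16),)

# Export the model to an ExportedProgram.
exported = torch.export.export(model, example_inputs)

# The compile spec configures the lowering and is consumed by the partitioner.
compile_spec = VgfCompileSpec("TOSA-1.0+FP")
partitioner = VgfPartitioner(compile_spec)

# Partition supported subgraphs and lower them to the VGF delegate.
edge = to_edge_transform_and_lower(exported, partitioner=[partitioner])
executorch_program = edge.to_executorch()

# Serialize to a .pte file for the runtime.
with open("model.pte", "wb") as f:
    f.write(executorch_program.buffer)
```

The resulting .pte file carries the VGF delegate payload and can be consumed by the generic executor_runner described under Runtime Integration below.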
The full user-facing API is documented below.
class VgfCompileSpec(tosa_spec: executorch.backends.arm.tosa.specification.TosaSpecification | str | None = None, compiler_flags: list[str] | None = None)
Normalise inputs and populate the underlying Arm compile spec.
Args:
tosa_spec (TosaSpecification | str | None): TOSA specification to target. Strings are parsed via TosaSpecification.create_from_string. Defaults to "TOSA-1.0+FP+INT".
compiler_flags (list[str] | None): Optional converter-backend flags.
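For example (a minimal sketch, assuming the import paths used in the example above), the constructor accepts either a parsed TosaSpecification or a plain string, with the string form parsed via TosaSpecification.create_from_string:

```python
from executorch.backends.arm.tosa.specification import TosaSpecification
from executorch.backends.arm.vgf import VgfCompileSpec  # import path may vary between versions

# Defaults to the "TOSA-1.0+FP+INT" specification with no extra converter flags.
default_spec = VgfCompileSpec()

# Equivalent explicit forms: a plain string, or an already constructed TosaSpecification.
fp_only_spec = VgfCompileSpec("TOSA-1.0+FP")
parsed_spec = VgfCompileSpec(TosaSpecification.create_from_string("TOSA-1.0+INT"))
```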
def VgfCompileSpec.dump_debug_info(self, debug_mode: executorch.backends.arm.common.arm_compile_spec.ArmCompileSpec.DebugMode | None):
Dump debugging information into the intermediates path.
Args:
debug_mode: The debug mode to use for dumping debug information.
def VgfCompileSpec.dump_intermediate_artifacts_to(self, output_path: str | None):
Sets a path for dumping intermediate results during compilation, such as tosa and pte.
Args:
output_path: Path to dump intermediate results to.
def VgfCompileSpec.get_intermediate_path(self) -> str | None:
Gets the path used for dumping intermediate results such as tosa and pte.
Returns: Path where intermediate results are saved.
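A short sketch of the intermediates-related calls, again assuming the import path used above; the available DebugMode values depend on your ExecuTorch version:

```python
from executorch.backends.arm.vgf import VgfCompileSpec  # import path may vary between versions

compile_spec = VgfCompileSpec()

# Write intermediate artifacts (e.g. the TOSA flatbuffer and the .pte) to a local directory.
compile_spec.dump_intermediate_artifacts_to("./vgf_intermediates")
assert compile_spec.get_intermediate_path() == "./vgf_intermediates"

# Optionally dump extra debug information into the same intermediates path.
# The DebugMode member name below is an assumption; check ArmCompileSpec.DebugMode in your version.
# from executorch.backends.arm.common.arm_compile_spec import ArmCompileSpec
# compile_spec.dump_debug_info(ArmCompileSpec.DebugMode.JSON)
```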
def VgfCompileSpec.get_output_format() -> str:
Return the artifact format emitted by this compile spec.
def VgfCompileSpec.get_output_order_workaround(self) -> bool:
Gets whether the output order workaround is being applied.
def VgfCompileSpec.get_pass_pipeline_config(self) -> executorch.backends.arm.common.pipeline_config.ArmPassPipelineConfig:
Returns configuration that controls how the Arm pass pipeline should behave. Subclasses may override to tweak defaults for specific targets.
def VgfCompileSpec.set_output_order_workaround(self, output_order_workaround: bool):
Sets whether to apply the output order workaround.
Args:
output_order_workaround: Boolean indicating whether to apply the workaround.
def VgfCompileSpec.set_pass_pipeline_config(self, config: executorch.backends.arm.common.pipeline_config.ArmPassPipelineConfig) -> None:
Sets the configuration that controls how the Arm pass pipeline should behave. Subclasses may override to tweak defaults for specific targets.
Args:
config: The custom ArmPassPipelineConfig to set.
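The remaining accessors can be exercised as below (a sketch using only the methods documented above; the exact output format string is version-dependent):

```python
from executorch.backends.arm.vgf import VgfCompileSpec  # import path may vary between versions

compile_spec = VgfCompileSpec()

# Artifact format emitted by this compile spec.
print(VgfCompileSpec.get_output_format())

# Toggle the output order workaround and read the setting back.
compile_spec.set_output_order_workaround(True)
assert compile_spec.get_output_order_workaround() is True

# Inspect and re-apply the Arm pass pipeline configuration.
pipeline_config = compile_spec.get_pass_pipeline_config()
compile_spec.set_pass_pipeline_config(pipeline_config)
```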
Partitioner API#
See Partitioner API for more information on the available partitioner options.
Quantization#
The VGF quantizer supports post-training quantization (PTQ) and quantization-aware training (QAT) through the PyTorch 2 Export (PT2E) flow.
Partial quantization is supported, allowing users to quantize only specific parts of the model while leaving others in floating-point.
For more information on quantization, see Quantization.
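As a rough sketch of a PTQ flow: the class name VgfQuantizer, the get_symmetric_quantization_config helper, and the import paths below are assumptions modelled on the other Arm quantizers, so verify them against the Quantization page for your ExecuTorch version.

```python
import torch
from torch.ao.quantization.quantize_pt2e import convert_pt2e, prepare_pt2e

# Assumed import paths and class names, modelled on the other Arm quantizers.
from executorch.backends.arm.quantizer import get_symmetric_quantization_config
from executorch.backends.arm.vgf import VgfCompileSpec, VgfQuantizer

model = torch.nn.Sequential(torch.nn.Linear(16, 8), torch.nn.ReLU()).eval()
example_inputs = (torch.randn(1, 16),)

# Trace the model for quantization.
graph_module = torch.export.export_for_training(model, example_inputs).module()

# Attach the VGF quantizer with a symmetric INT8 configuration.
quantizer = VgfQuantizer(VgfCompileSpec("TOSA-1.0+INT"))
quantizer.set_global(get_symmetric_quantization_config())

# Post-training quantization: calibrate on representative inputs, then convert.
prepared = prepare_pt2e(graph_module, quantizer)
prepared(*example_inputs)
quantized_module = convert_pt2e(prepared)
```

The quantized graph module is then lowered with the VGF partitioner exactly as in the floating-point flow above; for QAT, prepare_qat_pt2e is used in place of prepare_pt2e.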
Runtime Integration#
The VGF backend can use the default ExecuTorch runner. The steps required for building and running it are explained in the VGF Backend Tutorial. The example application is recommended for testing the basic functionality of your lowered models and as a starting point for developing runtime integrations for your own targets.
Reference#
→Partitioner API — Partitioner options.
→Quantization — Supported quantization schemes.
→Arm VGF Troubleshooting — Debug common issues.
→Arm VGF Backend Tutorials — Tutorials.