# Operator Support
This page lists the operators currently supported by the Vulkan backend. The source of truth for this information is op_registry.py, which the Vulkan Partitioner uses to determine which operators should be lowered to the Vulkan backend; it also describes the capabilities of each operator implementation.
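For reference, the sketch below shows the standard ExecuTorch lowering flow in which the Vulkan Partitioner is invoked. It is a minimal, illustrative example: the model, tensor shapes, and variable names are arbitrary, and the exact import paths may differ between ExecuTorch versions.

```python
import torch

from executorch.backends.vulkan.partitioner.vulkan_partitioner import VulkanPartitioner
from executorch.exir import to_edge_transform_and_lower


class SmallModel(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(64, 32)

    def forward(self, x):
        return torch.relu(self.linear(x))


model = SmallModel().eval()
example_inputs = (torch.randn(1, 64),)

# Export the model, then lower it. The Vulkan Partitioner consults
# op_registry.py to decide which nodes are delegated to the Vulkan backend.
exported_program = torch.export.export(model, example_inputs)
edge_program = to_edge_transform_and_lower(
    exported_program,
    partitioner=[VulkanPartitioner()],
)
executorch_program = edge_program.to_executorch()
```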
If an operator used in your model is not in this list, feel free to create a feature request on GitHub and we will do our best to add an implementation for the operator.
The namespace of an operator describes where it originates from:
- aten - operators in this namespace correspond 1:1 to operators in PyTorch’s ATen library. They all support fp16 and fp32 dtypes at a minimum.
- dim_order_ops - these operators are inserted when lowering to ExecuTorch to manage optimal tensor memory layouts. They are typically removed, since the Vulkan backend manages optimal tensor representations internally.
- llama - custom ops targeted for LLM inference. These are typically inserted by model source transformations applied to an nn.Module and are not invoked directly by a PyTorch model.
- operator - these operators work with symbolic integers, which are also supported by the Vulkan backend.
- quantized_decomposed / torchao - these ops are introduced by quantization workflows (either torchao’s quantize_ API or the PT2E quantization flow; see the sketch after this list). They typically represent quantizing/dequantizing a tensor, or choosing the quantization parameters for a tensor. In practice, most instances of these operators will be fused into a custom op in the et_vk namespace.
- et_vk - these are custom operators implemented only in the Vulkan backend. They typically represent quantized variants of aten operators, or fusions of common operator patterns. They are inserted by operator fusion graph passes when lowering to the Vulkan backend.
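As a hedged illustration of how quantization ops enter the graph, the sketch below quantizes a module with torchao’s quantize_ API before export. The int8_weight_only config and the toy model are only examples; the available configs vary by torchao version, and some versions may require an additional tensor-subclass unwrapping step before export.

```python
import torch

from torchao.quantization import int8_weight_only, quantize_

# Quantize the linear weights in-place. int8_weight_only is only one example
# config; the set of available configs depends on the installed torchao version.
model = torch.nn.Sequential(torch.nn.Linear(128, 64)).eval()
quantize_(model, int8_weight_only())

# Exporting the quantized module introduces torchao quantize/dequantize ops
# into the graph. When lowering to Vulkan, fusion passes typically replace
# them with et_vk custom ops (e.g. quantized linear variants).
exported_program = torch.export.export(model, (torch.randn(1, 128),))
```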
All operators support dynamic input shapes unless otherwise noted (i.e. “no resize support”). The expectation is that over time, all operators will be able to support dynamic shapes.
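Dynamic input shapes are declared at export time using the standard torch.export API. The sketch below is illustrative (the AddConstant module and dimension bounds are arbitrary); operators marked “no resize support” will not handle changes to the dynamic dimension at runtime.

```python
import torch
from torch.export import Dim


class AddConstant(torch.nn.Module):
    def forward(self, x):
        return x + 1.0


# Mark the batch dimension of "x" as dynamic; for operators with resize
# support, the Vulkan backend can then handle varying batch sizes at runtime.
batch = Dim("batch", min=1, max=8)
exported_program = torch.export.export(
    AddConstant(),
    (torch.randn(2, 16),),
    dynamic_shapes={"x": {0: batch}},
)
```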
| Namespace | Operator | Notes |
|---|---|---|
| aten | _log_softmax | |
| aten | _native_batch_norm_legit_no_training | |
| aten | _softmax | |
| aten | _to_copy | dtype conversion between float types only |
| aten | _weight_int8pack_mm | |
| aten | abs | |
| aten | add | |
| aten | addmm | |
| aten | amax | keepdim=True required; max 2D reductions |
| aten | amin | keepdim=True required; max 2D reductions |
| aten | arange | |
| aten | avg_pool2d | |
| aten | bmm | |
| aten | cat | |
| aten | clamp | |
| aten | clone | |
| aten | constant_pad_nd | |
| aten | convolution | batch=1 for 2D conv; no transposed 1D conv; no 3D conv |
| aten | cos | |
| aten | div | |
| aten | div.Tensor_mode | |
| aten | embedding | |
| aten | eq | |
| aten | exp | |
| aten | expand_copy | no resize support |
| aten | flip | |
| aten | full | |
| aten | full_like | |
| aten | ge | |
| aten | gelu | |
| aten | gt | |
| aten | hardshrink | |
| aten | hardtanh | |
| aten | index_select | |
| aten | le | |
| aten | leaky_relu | |
| aten | linear | |
| aten | lt | |
| aten | max_pool2d | |
| aten | max_pool2d_with_indices | |
| aten | mean | keepdim=True required; max 2D reductions |
| aten | minimum | |
| aten | mm | |
| aten | native_group_norm | |
| aten | native_layer_norm | resize supported |
| aten | neg | |
| aten | ones | |
| aten | ones_like | |
| aten | permute | |
| aten | permute_copy | |
| aten | pow | |
| aten | relu | |
| aten | repeat | |
| aten | round | |
| aten | rsqrt | |
| aten | scalar_tensor | |
| aten | select_copy | |
| aten | sigmoid | |
| aten | sin | |
| aten | slice_copy | |
| aten | split | |
| aten | split_with_sizes_copy | |
| aten | sqrt | |
| aten | squeeze_copy | |
| aten | sub | |
| aten | sum | keepdim=True required; max 2D reductions |
| aten | t_copy | |
| aten | tanh | |
| aten | unsqueeze_copy | |
| aten | upsample_bilinear2d | |
| aten | upsample_nearest2d | |
| aten | view_copy | |
| aten | zeros | |
| aten | zeros_like | |
| aten | _assert_scalar | removed via graph pass |
| aten | sym_constrain_range_for_size | removed via graph pass |
| aten | sym_size | |
| dim_order_ops | _clone_dim_order | no dtype conversion; removable if no dtype change |
| dim_order_ops | _to_dim_order_copy | no dtype conversion; removable if no dtype change |
| llama | custom_sdpa | |
| llama | sdpa_with_kv_cache | |
| llama | update_cache | |
| operator | add | |
| operator | eq | |
| operator | ge | |
| operator | getitem | |
| operator | gt | |
| operator | le | |
| operator | lt | |
| quantized_decomposed | choose_qparams | |
| quantized_decomposed | choose_qparams_per_token_asymmetric | |
| quantized_decomposed | dequantize_per_channel | |
| quantized_decomposed | dequantize_per_tensor | |
| quantized_decomposed | dequantize_per_token | |
| quantized_decomposed | quantize_per_channel | |
| quantized_decomposed | quantize_per_tensor | |
| quantized_decomposed | quantize_per_token | |
| torchao | choose_qparams_affine | |
| torchao | dequantize_affine | |
| torchao | quantize_affine | |
| et_vk | add_q8ta_q8ta_q8to | no resize support |
| et_vk | apply_rotary_emb | |
| et_vk | conv2d_q8ta_q8csw_q8to | no resize support |
| et_vk | conv2d_q8ta_q8csw_q8to_dw | no resize support |
| et_vk | conv_with_clamp | batch=1 for 2D conv; no transposed 1D conv |
| et_vk | dequantize_q8to_from_conv2d | no resize support |
| et_vk | grid_priors | |
| et_vk | linear_dq8ca_q4gsw | |
| et_vk | linear_q4gsw | |
| et_vk | linear_q8ta_q8csw | |
| et_vk | linear_qcs4w | |
| et_vk | quantize_q8ta_for_conv2d | no resize support |