torchao.prototype.attention (prototype)
Created On: Mar 24, 2026 | Last Updated On: Mar 24, 2026
High-Level API

- Apply low-precision attention to a model.
- Backend kernel for computing attention.
Direct Usage (FA3)

- FP8 SDPA shared by all backends.
- Fused RoPE + FP8 SDPA shared by all backends.
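The summaries above do not spell out what "FP8 SDPA" computes. As an illustrative sketch only (this is not the torchao kernel, and `fp8_quantize`/`fp8_sdpa` are hypothetical names), FP8 attention can be understood as: dynamically scale Q, K, and V into the float8 e4m3 range, run scaled dot-product attention on the quantized values, and fold the per-tensor scales back into the scores and output. The snippet below simulates the scaling and range-clamping in NumPy without modeling e4m3 mantissa rounding:

```python
import numpy as np

E4M3_MAX = 448.0  # largest finite magnitude representable in float8 e4m3


def fp8_quantize(x):
    """Simulated per-tensor FP8 quantization: scale the tensor's max
    magnitude onto the e4m3 range and clamp. (Mantissa rounding is
    not modeled, so this only illustrates the scaling scheme.)"""
    scale = max(np.max(np.abs(x)) / E4M3_MAX, 1e-12)
    q = np.clip(x / scale, -E4M3_MAX, E4M3_MAX)
    return q, scale


def fp8_sdpa(q, k, v):
    """Hypothetical sketch of FP8 scaled dot-product attention."""
    qq, sq = fp8_quantize(q)
    kq, sk = fp8_quantize(k)
    vq, sv = fp8_quantize(v)
    d = q.shape[-1]
    # Fold the Q and K scales back into the attention scores.
    scores = (qq @ kq.swapaxes(-1, -2)) * (sq * sk) / np.sqrt(d)
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    # Fold the V scale back into the output.
    return (w @ vq) * sv


rng = np.random.default_rng(0)
q = rng.standard_normal((2, 4, 8))
k = rng.standard_normal((2, 4, 8))
v = rng.standard_normal((2, 4, 8))
out = fp8_sdpa(q, k, v)
print(out.shape)  # (2, 4, 8)
```

Real FP8 kernels additionally incur rounding error from the 3-bit mantissa and keep Q/K/V stored in 8 bits end to end; the fused RoPE variant listed above would apply the rotary embedding to Q and K before (or fused with) the quantization step.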