torchao.prototype.attention (prototype)
Created On: Mar 24, 2026 | Last Updated On: Mar 24, 2026
High-Level API

- Apply low-precision attention to a model.
- Backend kernel for computing attention.
Direct Usage (FA3)

- FP8 SDPA shared by all backends.
- Fused RoPE + FP8 SDPA shared by all backends.
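The summaries above do not spell out what "FP8 SDPA" computes. As an illustrative sketch only (this is not the torchao kernel, and `fp8_quantize`/`fp8_sdpa` are hypothetical names), FP8 attention can be understood as: dynamically scale Q, K, and V into the float8 e4m3 range, run scaled dot-product attention on the quantized values, and fold the per-tensor scales back into the scores and output. The snippet below simulates the scaling and range-clamping in NumPy without modeling e4m3 mantissa rounding:

```python
import numpy as np

E4M3_MAX = 448.0  # largest finite magnitude representable in float8 e4m3


def fp8_quantize(x):
    """Simulated per-tensor FP8 quantization: scale the tensor's max
    magnitude onto the e4m3 range and clamp. (Mantissa rounding is
    not modeled, so this only illustrates the scaling scheme.)"""
    scale = max(np.max(np.abs(x)) / E4M3_MAX, 1e-12)
    q = np.clip(x / scale, -E4M3_MAX, E4M3_MAX)
    return q, scale


def fp8_sdpa(q, k, v):
    """Hypothetical sketch of FP8 scaled dot-product attention."""
    qq, sq = fp8_quantize(q)
    kq, sk = fp8_quantize(k)
    vq, sv = fp8_quantize(v)
    d = q.shape[-1]
    # Fold the Q and K scales back into the attention scores.
    scores = (qq @ kq.swapaxes(-1, -2)) * (sq * sk) / np.sqrt(d)
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    # Fold the V scale back into the output.
    return (w @ vq) * sv


rng = np.random.default_rng(0)
q = rng.standard_normal((2, 4, 8))
k = rng.standard_normal((2, 4, 8))
v = rng.standard_normal((2, 4, 8))
out = fp8_sdpa(q, k, v)
print(out.shape)  # (2, 4, 8)
```

Real FP8 kernels additionally incur rounding error from the 3-bit mantissa and keep Q/K/V stored in 8 bits end to end; the fused RoPE variant listed above would apply the rotary embedding to Q and K before (or fused with) the quantization step.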