fp8_fa3_rope_sdpa
torchao.prototype.attention.fp8_fa3.attention.fp8_fa3_rope_sdpa(query: Tensor, key: Tensor, value: Tensor, cos: Tensor, sin: Tensor, attn_mask: Tensor | None = None, dropout_p: float = 0.0, is_causal: bool = False, scale: float | None = None, enable_gqa: bool = False, rope_interleaved: bool = False, *, backend_name: str = 'FA3') → Tensor
Fused RoPE + FP8 scaled dot-product attention (SDPA), shared by all backends.
Input layout: [B, S, H, D] (pre-transpose). The fused quantization kernel handles the transpose to [B, H, S, D] internally. Output layout: [B, H, S, D].
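The layout change and the rotary step can be illustrated with a small pure-Python reference. This is only a sketch, not the fused kernel: `expected_output_shape` and `rope_rotate_half` are hypothetical helper names, and the pairing of element `i` with element `i + D/2` is the common non-interleaved ("rotate half") RoPE convention, assumed here rather than confirmed to match `rope_interleaved=False`.

```python
import math


def expected_output_shape(query_shape):
    """Map the documented input layout [B, S, H, D] to the output layout [B, H, S, D]."""
    b, s, h, d = query_shape
    return (b, h, s, d)


def rope_rotate_half(x, cos, sin):
    """Apply RoPE to one head vector of even length D.

    Assumption: element i is paired with element i + D/2, and `cos` / `sin`
    each hold D values (the D/2 per-pair values repeated twice).
    """
    d = len(x)
    half = d // 2
    # "Rotate half": [-x[half:], x[:half]]
    rotated = [-v for v in x[half:]] + list(x[:half])
    return [x[i] * cos[i] + rotated[i] * sin[i] for i in range(d)]


# A [B=2, S=128, H=8, D=64] input yields a [2, 8, 128, 64] output.
out_shape = expected_output_shape((2, 128, 8, 64))

# Rotate the pair (x[0], x[2]) by 90 degrees; the vector norm is preserved.
angle = math.pi / 2
x = [1.0, 0.0, 0.0, 0.0]
y = rope_rotate_half(x, [math.cos(angle)] * 4, [math.sin(angle)] * 4)
```

Because RoPE is a rotation within each pair, the norm of `y` equals the norm of `x`, which makes a convenient sanity check when comparing against a fused implementation.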