FlashAttention-3: Hardware-Aware Attention on Hopper GPUs

Type: KNOWLEDGE

Verification: unverified - Evidence: ungraded

Quality: public

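FlashAttention-3 (2024) targets NVIDIA Hopper GPUs such as the H100 and builds on three techniques: asynchronous pipelining, warp specialization, and FP8 low-precision computation. The key idea is overlap: producer warps issue asynchronous loads while consumer warp groups compute, and the softmax of one block is overlapped with the GEMMs of another, so the Tensor Cores stay busy. In FP16 it achieves a 1.5–2× speedup over FlashAttention-2 on H100, reaching up to 740 TFLOPS (about 75% utilization), whereas FlashAttention-2 reaches only about 35% utilization on the same hardware.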
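The producer-consumer structure can be illustrated with a small CPU-side analogy. The sketch below is not the FlashAttention-3 kernel: it assumes a single query vector, uses a Python thread in place of a producer warp and a bounded queue in place of a multi-stage shared-memory buffer, and omits the usual 1/sqrt(d) scaling. The names `pipelined_attention`, `reference_attention`, `block_size`, and `stages` are hypothetical and chosen only for this example. It shows how a consumer can process K/V blocks as they arrive using an online softmax and still match the untiled result.

```python
import math
import queue
import threading


def reference_attention(q, keys, values):
    """Untiled softmax attention for one query vector (no 1/sqrt(d) scaling)."""
    scores = [sum(a * b for a, b in zip(q, k)) for k in keys]
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) / total
            for d in range(dim)]


def pipelined_attention(q, keys, values, block_size=2, stages=2):
    """Producer thread streams K/V blocks; the consumer runs an online softmax."""
    # Bounded queue: stands in for a fixed number of in-flight buffer stages.
    buffer = queue.Queue(maxsize=stages)

    def producer():
        # Analogy for the load side: hand over one K/V block at a time.
        for start in range(0, len(keys), block_size):
            stop = start + block_size
            buffer.put((keys[start:stop], values[start:stop]))
        buffer.put(None)  # sentinel: no more blocks

    loader = threading.Thread(target=producer)
    loader.start()

    dim = len(values[0])
    running_max = -math.inf   # largest score seen so far
    running_sum = 0.0         # softmax denominator, relative to running_max
    acc = [0.0] * dim         # unnormalized output, relative to running_max

    while True:
        block = buffer.get()
        if block is None:
            break
        k_block, v_block = block
        scores = [sum(a * b for a, b in zip(q, k)) for k in k_block]
        new_max = max(running_max, max(scores))
        # Rescale earlier partial results to the new maximum (0.0 on the first block).
        rescale = math.exp(running_max - new_max)
        probs = [math.exp(s - new_max) for s in scores]
        running_sum = running_sum * rescale + sum(probs)
        acc = [acc[d] * rescale + sum(p * v[d] for p, v in zip(probs, v_block))
               for d in range(dim)]
        running_max = new_max

    loader.join()
    return [a / running_sum for a in acc]


q = [0.5, -1.0, 2.0]
keys = [[1.0, 0.0, 0.5], [0.2, 0.3, -0.1], [-1.0, 2.0, 0.0],
        [0.4, 0.4, 0.4], [0.0, -0.5, 1.5]]
values = [[1.0, 0.0], [0.0, 1.0], [2.0, -1.0], [0.5, 0.5], [-1.0, 3.0]]
print(pipelined_attention(q, keys, values))
print(reference_attention(q, keys, values))
```

Python threads do not provide the hardware-level concurrency of separate warp groups, so this only mirrors the control structure: a bounded hand-off between a loader and a compute loop, with the softmax statistics updated block by block.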