Race 4

Type: KNOWLEDGE

Verification: unverified - Evidence: ungraded

Quality: public

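FlashAttention-2 (Dao, 2023) reworks the FlashAttention kernel with better parallelism and work partitioning. Standard attention is memory-bandwidth bound because it writes the full N×N score matrix to HBM and reads it back. FlashAttention (FA1) already avoids this by tiling Q, K, and V into blocks that fit in on-chip SRAM, so the full N×N matrix is never materialized. FA2 keeps that tiling and adds three changes: fewer non-matmul FLOPs, parallelism across the sequence-length dimension, and better work partitioning between warps. Result: around 2× faster than FA1, reaching 50–73% of theoretical peak FLOPs/s on A100.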
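The tiling idea can be illustrated with a small CPU-only sketch. This is not the CUDA kernel from the paper: it is plain Python, it handles one query row at a time instead of tiling Q, and the names `naive_attention`, `tiled_attention`, and `block_size` are made up for illustration. It only shows how an online softmax lets K and V be consumed block by block, with a single normalization at the end, while producing the same output as attention that builds every score first.

```python
import math


def naive_attention(Q, K, V):
    """Reference version: computes all N scores for a query before the softmax."""
    scale = 1.0 / math.sqrt(len(Q[0]))
    out = []
    for q in Q:
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in K]
        m = max(scores)
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        out.append([sum(w * v[j] for w, v in zip(weights, V)) / total
                    for j in range(len(V[0]))])
    return out


def tiled_attention(Q, K, V, block_size=2):
    """Block-wise version: only `block_size` scores exist at a time per query."""
    scale = 1.0 / math.sqrt(len(Q[0]))
    dv = len(V[0])
    out = []
    for q in Q:
        m = -math.inf      # running maximum of the scores seen so far
        l = 0.0            # running sum of exp(score - m)
        acc = [0.0] * dv   # unnormalized output accumulator
        for start in range(0, len(K), block_size):
            Kb = K[start:start + block_size]
            Vb = V[start:start + block_size]
            scores = [scale * sum(a * b for a, b in zip(q, k)) for k in Kb]
            m_new = max(m, max(scores))
            # Rescale earlier partial results to the new maximum.
            # On the first block m is -inf, so the correction is exp(-inf) = 0.
            correction = math.exp(m - m_new)
            p = [math.exp(s - m_new) for s in scores]
            l = l * correction + sum(p)
            acc = [a * correction + sum(pi * v[j] for pi, v in zip(p, Vb))
                   for j, a in enumerate(acc)]
            m = m_new
        # Normalize once at the end instead of after every block.
        out.append([a / l for a in acc])
    return out
```

Calling `tiled_attention(Q, K, V, block_size=b)` on small nested lists should match `naive_attention(Q, K, V)` up to floating-point rounding for any positive `b`. On a GPU the same recurrence runs over blocks held in SRAM, which is what removes the N×N round trip to HBM.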

Source: https://arxiv.org/abs/2307.08691