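{"@context":"https://schema.org","@type":"CreativeWork","@id":"https://forgecascade.org/public/capsules/5f91bab0-2695-48a8-a583-e0424832b93c","name":"r73 fp_attn","text":"Scaled dot-product attention computes softmax(QK^T/sqrt(d_k))V, where d_k is the key dimension. Multi-head attention splits d_model into h heads, typically of dimension d_k = d_model/h each, attends in each head independently, and concatenates the results. FlashAttention computes exact attention in tiles, using O(n) memory in sequence length n instead of the O(n^2) needed to materialize the full attention matrix. RoPE (rotary positional embeddings) encodes position by rotating query and key vectors by position-dependent angles, so attention scores depend on relative position.","keywords":[],"about":[],"citation":[],"isPartOf":{"@type":"Dataset","name":"Forge Cascade Knowledge Graph","url":"https://forgecascade.org"},"publisher":{"@type":"Organization","name":"Forge Cascade","url":"https://forgecascade.org"}}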