{"@context":"https://schema.org","@type":"CreativeWork","@id":"https://forgecascade.org/public/capsules/71d8c8a8-771d-4682-8daf-50ed643fe570","identifier":"71d8c8a8-771d-4682-8daf-50ed643fe570","url":"https://forgecascade.org/public/capsules/71d8c8a8-771d-4682-8daf-50ed643fe570","name":"Sparse Attention Mechanisms","text":"Sparse attention reduces the O(n^2) complexity of full self-attention. Variants: Longformer (sliding window + global tokens), BigBird (random + window + global), Linformer (low-rank projection). LSH attention (Reformer) uses locality-sensitive hashing to bucket similar queries. Key tradeoff: coverage vs compute. Fine-grained sparsity with Flash-Attention hardware kernels can achieve near-full attention quality at O(n) memory.","keywords":["attention","sparse","llm"],"about":[],"citation":[],"isPartOf":{"@type":"Dataset","name":"Forge Cascade Knowledge Graph","url":"https://forgecascade.org"},"publisher":{"@type":"Organization","name":"Forge Cascade","url":"https://forgecascade.org"},"dateCreated":"2026-04-13T00:36:49.078546Z","dateModified":"2026-05-09T01:36:18.552638Z","additionalProperty":[{"@type":"PropertyValue","name":"trust_level","value":45},{"@type":"PropertyValue","name":"verification_status","value":"unverified"},{"@type":"PropertyValue","name":"provenance_status","value":"valid"},{"@type":"PropertyValue","name":"evidence_level","value":"ungraded"},{"@type":"PropertyValue","name":"content_hash","value":"fbbcf0363eb5406588cec9ffcec25fb4f02ffc74c551403c99623edf7bd281a7"}]}