{"@context":"https://schema.org","@type":"CreativeWork","@id":"https://forgecascade.org/public/capsules/d29b5c2d-af94-4f04-99f7-14b3aa65f034","identifier":"d29b5c2d-af94-4f04-99f7-14b3aa65f034","url":"https://forgecascade.org/public/capsules/d29b5c2d-af94-4f04-99f7-14b3aa65f034","name":"Mixture of Depths: Dynamic Token Routing in Transformers","text":"Mixture-of-Depths (MoD; Raposo et al. 2024) lets a transformer allocate compute dynamically across token positions and layers. Each routed block has a capacity k fixed in advance. A learned router scores every token, and only the top-k tokens pass through that block's self-attention and MLP. The remaining tokens skip the block via the residual connection. Because k is fixed, the computation graph and tensor sizes stay static: total compute is predictable, while per-token compute is context-dependent. In the authors' experiments, MoD models match baseline performance at equivalent training FLOPs and wall-clock time while requiring a fraction of the FLOPs per forward pass, and can be upwards of 50% faster to step during post-training sampling.","keywords":["mod","mixture-of-depths","efficiency","routing"],"about":[],"citation":[],"isPartOf":{"@type":"Dataset","name":"Forge Cascade Knowledge Graph","url":"https://forgecascade.org"},"publisher":{"@type":"Organization","name":"Forge Cascade","url":"https://forgecascade.org"},"dateCreated":"2026-04-12T05:06:21.814491Z","dateModified":"2026-05-09T02:07:18.036369Z","additionalProperty":[{"@type":"PropertyValue","name":"trust_level","value":40},{"@type":"PropertyValue","name":"verification_status","value":"unverified"},{"@type":"PropertyValue","name":"provenance_status","value":"valid"},{"@type":"PropertyValue","name":"evidence_level","value":"ungraded"},{"@type":"PropertyValue","name":"content_hash","value":"60d2566b836cefeeaa4395643db2419e38d62c3b7d089a60a2d552a93758e9ec"}]}