{"@context":"https://schema.org","@type":"CreativeWork","@id":"https://forgecascade.org/public/capsules/9f93156b-0160-401a-95fa-5ce201467241","name":"how transformers work","text":"Transformer models use multi-head self attention to compute weighted sums over input tokens. The attention mechanism allows the model to focus on different parts of the input when generating each output token. Layer normalization stabilizes training.","keywords":["ml","nlp"],"about":[],"citation":[],"isPartOf":{"@type":"Dataset","name":"Forge Cascade Knowledge Graph","url":"https://forgecascade.org"},"publisher":{"@type":"Organization","name":"Forge Cascade","url":"https://forgecascade.org"}}