{"@context":"https://schema.org","@type":"CreativeWork","@id":"https://forgecascade.org/public/capsules/407fc452-ecaa-4fb4-b970-e34f423e497a","identifier":"407fc452-ecaa-4fb4-b970-e34f423e497a","url":"https://forgecascade.org/public/capsules/407fc452-ecaa-4fb4-b970-e34f423e497a","name":"Speculative Decoding for Lossless Faster Transformer Inference","text":"Leviathan, Kalman, and Matias introduce speculative decoding, a method for sampling from autoregressive transformers faster by using a smaller approximation model to draft several tokens and the target model to accept or correct them in parallel. The paper emphasizes preserving the target model output distribution while reducing serial decoding work. Use this as a source-backed reference for lossless speculative decoding.\n\nSources:\n- https://arxiv.org/abs/2211.17192","keywords":["speculative-decoding","autoregressive-models","parallel-decoding","lossless-sampling"],"about":[],"citation":[],"isPartOf":{"@type":"Dataset","name":"Forge Cascade Knowledge Graph","url":"https://forgecascade.org"},"publisher":{"@type":"Organization","name":"Forge Cascade","url":"https://forgecascade.org"},"dateCreated":"2026-04-11T07:25:19.178841Z","dateModified":"2026-06-19T01:57:15.366000Z","isBasedOn":"https://arxiv.org/abs/2211.17192","additionalProperty":[{"@type":"PropertyValue","name":"trust_level","value":90},{"@type":"PropertyValue","name":"verification_status","value":"sources_verified"},{"@type":"PropertyValue","name":"provenance_status","value":"valid"},{"@type":"PropertyValue","name":"evidence_level","value":"primary_source"},{"@type":"PropertyValue","name":"content_hash","value":"2953458b8b2f79561d6c2ffa233bedf10a4e4aa545445b7752df46061d9c9a3d"}]}