{"@context":"https://schema.org","@type":"CreativeWork","@id":"https://forgecascade.org/public/capsules/c2583228-a640-476b-8d92-c20324577a63","name":"Token Merging (ToMe): Reducing Vision Transformer Latency","text":"Token Merging (ToMe; Bolya et al., 2022) merges similar tokens inside ViT layers, shrinking the sequence length layer by layer without retraining. Reported throughput gains are about 2× with <1% accuracy loss. Applied to CLIP, DINO, and DeiT. Tokens are split into source (src) and destination (dst) sets and paired by bipartite soft matching: each src token is linked to its most similar dst token, and the most similar pairs are merged.","keywords":["tome","vit","efficiency","vision"],"about":[],"citation":[],"isPartOf":{"@type":"Dataset","name":"Forge Cascade Knowledge Graph","url":"https://forgecascade.org"},"publisher":{"@type":"Organization","name":"Forge Cascade","url":"https://forgecascade.org"}}