Token Merging (ToMe): Reducing Vision Transformer Latency

Type: KNOWLEDGE

Verification: unverified - Evidence: ungraded

Quality: public

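ToMe (Bolya et al., 2022) merges similar tokens inside ViT blocks. This gradually shortens the sequence, and it can be applied to off-the-shelf models without retraining. It roughly doubles throughput with under 1% accuracy loss, and has been applied to CLIP, DINO, and DeiT. It works by bipartite soft matching. In each block, the tokens are split alternately into a source set and a destination set, and each source token is matched to its most similar destination token (cosine similarity of the attention keys). The r most similar pairs are then merged by averaging, which removes r tokens per block.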
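Below is a minimal pure-Python sketch of one bipartite soft matching step. It makes several simplifying assumptions. Tokens are plain feature vectors that also serve as the similarity metric, whereas ToMe computes similarity on attention keys. The function `bipartite_soft_matching` here is hypothetical and is not the official ToMe code. Details such as keeping the class token unmerged and proportional attention are left out. `sizes` records how many original patches each token stands for, so a merge produces a size-weighted average.

```python
import math
from typing import List, Tuple

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity with a small epsilon to avoid division by zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b + 1e-12)


def bipartite_soft_matching(
    tokens: List[Vector], sizes: List[int], r: int
) -> Tuple[List[Vector], List[int]]:
    """Hypothetical sketch: merge up to r tokens, return (tokens, sizes)."""
    # Alternate partition: even positions form set A (src), odd form set B (dst).
    a_idx = list(range(0, len(tokens), 2))
    b_idx = list(range(1, len(tokens), 2))
    r = min(r, len(a_idx))
    if r <= 0 or not b_idx:
        return [list(t) for t in tokens], list(sizes)

    # Draw one edge from each src token to its most similar dst token.
    edges = []
    for i in a_idx:
        score, j = max((cosine(tokens[i], tokens[j]), j) for j in b_idx)
        edges.append((score, i, j))

    # Keep only the r most similar edges; those src tokens get merged away.
    edges.sort(reverse=True)
    kept = edges[:r]
    merged_src = {i for _, i, _ in kept}

    # Size-weighted sums per dst token, so merged features are weighted averages.
    sums = {j: [x * sizes[j] for x in tokens[j]] for j in b_idx}
    total = {j: sizes[j] for j in b_idx}
    for _, i, j in kept:
        sums[j] = [s + x * sizes[i] for s, x in zip(sums[j], tokens[i])]
        total[j] += sizes[i]

    # Rebuild the sequence in original order, dropping merged src tokens.
    out_tokens, out_sizes = [], []
    for k in range(len(tokens)):
        if k in total:  # dst token, possibly absorbing src tokens
            out_tokens.append([s / total[k] for s in sums[k]])
            out_sizes.append(total[k])
        elif k not in merged_src:  # unmerged src token passes through
            out_tokens.append(list(tokens[k]))
            out_sizes.append(sizes[k])
    return out_tokens, out_sizes


# Usage: tokens 0 and 1 are nearly parallel, so with r=1 they merge.
tokens = [[1.0, 0.0], [1.0, 0.1], [0.0, 1.0], [0.0, -1.0]]
merged, merged_sizes = bipartite_soft_matching(tokens, [1, 1, 1, 1], r=1)
print(len(merged), merged_sizes)  # 3 [2, 1, 1]
```

If this step runs in every block with a fixed r, it removes r tokens per block. An L-block model that starts with N tokens therefore ends with N - rL tokens, as long as enough tokens remain at each step. Because every later block processes a shorter sequence, throughput improves.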