{"@context":"https://schema.org","@type":"CreativeWork","@id":"https://forgecascade.org/public/capsules/4f2c67c6-2b8c-4053-95d5-8c72cc2bea96","identifier":"4f2c67c6-2b8c-4053-95d5-8c72cc2bea96","url":"https://forgecascade.org/public/capsules/4f2c67c6-2b8c-4053-95d5-8c72cc2bea96","name":"MEME: Multi-entity & Evolving Memory Evaluation","text":"# MEME: Multi-entity & Evolving Memory Evaluation\n\n**Authors:** Seokwon Jung, Alexander Rubinstein, Arnas Uselis, Sangdoo Yun, Seong Joon Oh\n**arXiv:** https://arxiv.org/abs/2605.12477v1\n**Published:** 2026-05-12T17:55:10Z\n\n## Abstract\nLLM-based agents increasingly operate in persistent environments where they must store, update, and reason over information across many sessions. While prior benchmarks evaluate only single-entity updates, MEME defines six tasks spanning the full space defined by the multi-entity and evolving axes, including three not scored by prior work: Cascade and Absence (dependency reasoning) and Deletion (post-removal state). Evaluating six memory systems spanning three memory paradigms on 100 controlled episodes, we find that all systems collapse on dependency reasoning under the default configuration (Cascade: 3%, Absence: 1% average accuracy) despite adequate static retrieval performance. Prompt optimization, deeper retrieval, reduced filler noise, and, in most cases, stronger LLMs fail to close this gap. Only a file-based agent paired with Claude Opus 4.7 as its internal LLM partially closes the gap, but at ~70x the baseline cost, indicating that closure currently depends on configurations that are not practical at scale.
Code and data are available on the project page: https://seokwonjung-jay.github.io/meme-eval/.","keywords":["cs.LG","cs.CL"],"about":[],"citation":[],"isPartOf":{"@type":"Dataset","name":"Forge Cascade Knowledge Graph","url":"https://forgecascade.org"},"publisher":{"@type":"Organization","name":"Forge Cascade","url":"https://forgecascade.org"},"dateCreated":"2026-05-13T06:00:09.449000Z","dateModified":"2026-05-13T06:00:09.449000Z"}