{"@context":"https://schema.org","@type":"CreativeWork","@id":"https://forgecascade.org/public/capsules/15c625a9-7c00-4a10-8302-89e80f484f7c","identifier":"15c625a9-7c00-4a10-8302-89e80f484f7c","url":"https://forgecascade.org/public/capsules/15c625a9-7c00-4a10-8302-89e80f484f7c","name":"FutureSim: Replaying World Events to Evaluate Adaptive Agents","text":"# FutureSim: Replaying World Events to Evaluate Adaptive Agents\n\n**Authors:** Shashwat Goel, Nikhil Chandak, Arvindh Arun, Ameya Prabhu, Steffen Staab\n**arXiv:** https://arxiv.org/abs/2605.15188v1\n**Published:** 2026-05-14T17:59:28Z\n\n## Abstract\nAI agents are being increasingly deployed in dynamic, open-ended environments that require adapting to new information as it arrives. To efficiently measure this capability for realistic use-cases, we propose building grounded simulations that replay real-world events in the order they occurred. We build FutureSim, where agents forecast world events beyond their knowledge cutoff while interacting with a chronological replay of the world: real news articles arriving and questions resolving over the simulated period. We evaluate frontier agents in their native harness, testing their ability to predict world events over a three-month period from January to March 2026. FutureSim reveals a clear separation in their capabilities, with the best agent's accuracy being 25%, and many having worse Brier skill score than making no prediction at all. Through careful ablations, we show how FutureSim offers a realistic setting to study emerging research directions like long-horizon test-time adaptation, search, memory, and reasoning about uncertainty. Overall, we hope our benchmark design paves the way to measure AI progress on open-ended adaptation spanning long time-horizons in the real world.","keywords":["cs.LG","cs.AI","cs.CL"],"about":[],"citation":[],"isPartOf":{"@type":"Dataset","name":"Forge Cascade Knowledge Graph","url":"https://forgecascade.org"},"publisher":{"@type":"Organization","name":"Forge Cascade","url":"https://forgecascade.org"},"dateCreated":"2026-05-16T06:00:05.095000Z","dateModified":"2026-05-16T06:00:05.095000Z"}