{"@context":"https://schema.org","@type":"CreativeWork","@id":"https://forgecascade.org/public/capsules/3abf2674-437c-493f-86af-20753494d101","identifier":"3abf2674-437c-493f-86af-20753494d101","url":"https://forgecascade.org/public/capsules/3abf2674-437c-493f-86af-20753494d101","name":"Latent Phase-Shift Rollback: Inference-Time Error Correction via Residual Stream Monitoring and KV-Cache Steering","text":"# Latent Phase-Shift Rollback: Inference-Time Error Correction via Residual Stream Monitoring and KV-Cache Steering\n\n**Authors:** Manan Gupta, Dhruv Kumar\n**arXiv:** https://arxiv.org/abs/2604.18567v1\n**Published:** 2026-04-20T17:53:33Z\n\n## Abstract\nLarge language models frequently commit unrecoverable reasoning errors mid-generation: once a wrong step is taken, subsequent tokens compound the mistake rather than correct it. We introduce $\\textbf{Latent Phase-Shift Rollback}$ (LPSR): at each generation step, we monitor the residual stream at a critical layer lcrit, detect abrupt directional reversals (phase shifts) via a cosine-similarity $+$ entropy dual gate, and respond by rolling back the KV-cache and injecting a pre-computed steering vector. No fine-tuning, gradient computation, or additional forward passes are required. LPSR achieves $\\mathbf{44.0\\%}$ on MATH-500 with an 8B model versus $28.8\\%$ for standard AR ($+15.2$ pp; McNemar $χ^2 = 66.96$, $p < 10^{-15}$). Critically, prompted self-correction, the most natural inference-time baseline, scores only $19.8\\%$, below standard AR; LPSR exceeds it by $+24.2$ pp ($χ^2 = 89.4$, $p \\approx 0$). LPSR also outperforms Best-of-16 ($+7.8$ pp) at $5.4\\times$ lower token cost, and surpasses a standard 70B model ($35.2\\%$) with $8.75\\times$ fewer parameters at ${\\sim}3\\times$ the token budget. 
A 32-layer sweep reveals a novel $\\textbf{detection-correction dissociation}$: error-detection AUC peaks at layer 14 ($0.718$) but task accuracy peaks at layer 16 ($44.0\\%$ vs. $29.2\\%$), demonstrating that the optimal monitoring depth for detection differs from that for correction.","keywords":["cs.LG","cs.AI","cs.CL"],"about":[],"citation":[],"isPartOf":{"@type":"Dataset","name":"Forge Cascade Knowledge Graph","url":"https://forgecascade.org"},"publisher":{"@type":"Organization","name":"Forge Cascade","url":"https://forgecascade.org"},"dateCreated":"2026-04-21T06:00:03.021000Z","dateModified":"2026-05-08T23:38:45.995151Z","isBasedOn":"https://arxiv.org/abs/2604.18567v1","additionalProperty":[{"@type":"PropertyValue","name":"trust_level","value":65},{"@type":"PropertyValue","name":"verification_status","value":"source_linked"},{"@type":"PropertyValue","name":"provenance_status","value":"valid"},{"@type":"PropertyValue","name":"evidence_level","value":"primary_source"},{"@type":"PropertyValue","name":"content_hash","value":"7a2826efaf57f5dbc324f1eca136ea89b940140bb94a322addd5d975cbf760b1"}]}