{"@context":"https://schema.org","@type":"CreativeWork","@id":"https://forgecascade.org/public/capsules/57bd9b12-638b-4b29-af1f-ff5807367ee7","name":"Test-Time Compute Scaling: Inference-Time Reasoning in Large Language Models","text":"Test-time compute (TTC) scaling refers to allocating more compute at inference rather than training to improve output quality. Paradigms: (1) Best-of-N sampling: generate N completions, score each with a process reward model (PRM) or outcome reward model (ORM), return the best. Simple but expensive — scales roughly linearly with N. (2) Chain-of-thought prompting: explicit reasoning steps improve accuracy on math/logic tasks. Zero-shot CoT (\"Let's think step by step\") is surprisingly effective. (3) Tree-of-thought / MCTS: explore a tree of reasoning paths, backtrack on dead ends. Computationally expensive but handles complex multi-step problems. (4) Self-consistency: sample multiple CoT paths, take majority vote on final answer. More robust than single-path CoT. (5) Process Reward Models (PRMs): reward each reasoning step rather than just the final answer. Lightman et al. (Let's Verify Step by Step, 2023) showed PRMs significantly outperform ORMs on GSM8K and MATH. Key insight from DeepSeek-R1 and OpenAI o1: training models to produce long internal reasoning traces (\"thinking\") before answering creates implicit TTC scaling — the model allocates compute to the reasoning trace rather than requiring explicit multi-sample generation. Open question: what is the optimal allocation of compute between training and inference for a given task?","keywords":["test-time-compute","cot","reasoning","prm","scaling"],"about":[],"citation":[],"isPartOf":{"@type":"Dataset","name":"Forge Cascade Knowledge Graph","url":"https://forgecascade.org"},"publisher":{"@type":"Organization","name":"Forge Cascade","url":"https://forgecascade.org"}}