Test-Time Compute Scaling: Inference-Time Reasoning in Large Language Models

Test-time compute (TTC) scaling means spending additional compute at inference time, rather than during training, to improve output quality. Common paradigms:
(1) Best-of-N sampling: generate N independent completions, score each with a process reward model (PRM), which rates every intermediate reasoning step, or an outcome reward model (ORM), which rates only the finished solution, and return the highest-scoring one. It is simple but expensive: generation and scoring cost grow roughly linearly with N.
(2) Chain-of-thought (CoT) prompting: eliciting explicit reasoning steps improves accuracy on math and logic tasks. Zero-shot CoT ("Let's think step by step") is surprisingly...
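Best-of-N selection can be sketched in a few lines. This is a minimal illustration, not a reference implementation: the generator and both reward models are hypothetical toy stand-ins (real PRMs and ORMs are learned models), and a completion is assumed to be a list of reasoning-step strings. The PRM aggregation used here, the product of per-step correctness probabilities, is one assumed choice; taking the minimum over steps is another seen in practice.

```python
import itertools
import math
from typing import Callable, List, Sequence

# Hypothetical representation: a completion is a list of reasoning steps.
Completion = List[str]


def prm_score(steps: Sequence[str], step_prob: Callable[[str], float]) -> float:
    # Assumed aggregation: product of per-step correctness probabilities,
    # i.e. P(every step is correct) if steps are treated as independent.
    return math.prod(step_prob(s) for s in steps)


def orm_score(steps: Sequence[str],
              outcome_prob: Callable[[Sequence[str]], float]) -> float:
    # An ORM judges only the finished solution as a whole.
    return outcome_prob(steps)


def best_of_n(generate: Callable[[], Completion],
              score: Callable[[Completion], float],
              n: int) -> Completion:
    if n < 1:
        raise ValueError("n must be >= 1")
    # n generator calls plus n scorer calls: cost grows linearly with n.
    candidates = [generate() for _ in range(n)]
    # max() returns the first candidate among ties.
    return max(candidates, key=score)


# --- Toy stand-ins (hypothetical, for illustration only) ---
TOY_COMPLETIONS: List[Completion] = [
    ["2 + 2 = 5", "answer: 5"],
    ["2 + 2 = 4", "answer: 4"],
    ["2 * 2 = 4", "2 + 2 = 4", "answer: 4"],
]


def make_toy_generator() -> Callable[[], Completion]:
    # Cycles through fixed completions instead of sampling from an LLM.
    pool = itertools.cycle(TOY_COMPLETIONS)
    return lambda: list(next(pool))


def toy_step_prob(step: str) -> float:
    # Toy PRM: low confidence in any step asserting "= 5".
    return 0.1 if "= 5" in step else 0.9


def toy_outcome_prob(steps: Sequence[str]) -> float:
    # Toy ORM: checks only the final line.
    return 1.0 if steps[-1] == "answer: 4" else 0.0


best_prm = best_of_n(make_toy_generator(),
                     lambda c: prm_score(c, toy_step_prob), n=3)
best_orm = best_of_n(make_toy_generator(),
                     lambda c: orm_score(c, toy_outcome_prob), n=3)
print(best_prm)  # ['2 + 2 = 4', 'answer: 4']
print(best_orm)  # ['2 + 2 = 4', 'answer: 4']
```

With these toy scores the PRM ranks the candidates 0.09, 0.81 and 0.729. Note that a product over steps penalizes longer solutions even when every step looks equally good, which is why the three-step candidate loses to the two-step one here.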

Source: https://arxiv.org/abs/2305.20050

test-time-compute
cot
reasoning
prm
scaling