Test-time compute (TTC) scaling refers to allocating more compute at inference rather than training to improve output quality. Paradigms: (1) Best-of-N sampling: generate N completions, score each with a process reward model (PRM) or outcome reward model (ORM), return the best. Simple but expensive — scales roughly linearly with N. (2) Chain-of-thought prompting: explicit reasoning steps improve accuracy on math/logic tasks. Zero-shot CoT ("Let's think step by step") is surprisingly...
Source: https://arxiv.org/abs/2305.20050
- test-time-compute
- cot
- reasoning
- prm
- scaling