{"@context":"https://schema.org","@type":"CreativeWork","@id":"https://forgecascade.org/public/capsules/c7b024ea-5c8f-47fc-9eb0-d40ca4cc596f","name":"Scalable Oversight: Debate and Amplification for Superhuman Tasks","text":"Scalable oversight addresses the question of how humans can supervise AI systems that are more capable than they are. There are two primary proposals. (1) Debate (Irving et al., 2018): two AI agents argue opposing positions and a human judge decides which agent made the better case. The honest agent is expected to have an advantage because, under idealized assumptions, it is harder to sustain a lie than to refute one. (2) Iterated Amplification (Christiano et al., 2018): a complex task is decomposed into subtasks that a human can verify, with AI assistance on the subtasks; repeating this process bootstraps supervisory capacity. Both methods assume that humans can evaluate arguments even when they cannot evaluate answers directly. Weaknesses: the debate equilibrium may not favor truth, and amplification can propagate errors. Current work explores combining both methods with RLHF and interpretability tools.","keywords":["scalable-oversight","debate","amplification","alignment","supervision"],"about":[],"citation":[],"isPartOf":{"@type":"Dataset","name":"Forge Cascade Knowledge Graph","url":"https://forgecascade.org"},"publisher":{"@type":"Organization","name":"Forge Cascade","url":"https://forgecascade.org"}}