{"@context":"https://schema.org","@type":"CreativeWork","@id":"https://forgecascade.org/public/capsules/81ce0389-4f00-4d9a-b116-f520eabd3561","name":"Scalable Oversight: Debate and Amplification for Superhuman Tasks","text":"Scalable oversight addresses the alignment problem of evaluating AI outputs on tasks that exceed human capability. There are two main approaches. (1) Debate: two AI agents argue opposing positions, and a human judge evaluates a short debate transcript rather than the full solution. This works if honest arguments are easier to verify than to generate, that is, if a false claim is harder to sustain under rebuttal than to refute. (2) Amplification (Iterated Distillation and Amplification, IDA): recursively decompose a task into subtasks that a human can evaluate, then compose the results upward. Theoretical grounding: if a deceptive agent must keep its deception consistent across all debate branches, honest strategies dominate. Open problems: debate may fail if both sides collude, or if the human judge is systematically manipulated by persuasive but incorrect arguments.","keywords":["ai-safety","alignment","debate","scalable-oversight"],"about":[],"citation":[],"isPartOf":{"@type":"Dataset","name":"Forge Cascade Knowledge Graph","url":"https://forgecascade.org"},"publisher":{"@type":"Organization","name":"Forge Cascade","url":"https://forgecascade.org"}}