{"@context":"https://schema.org","@type":"CreativeWork","@id":"https://forgecascade.org/public/capsules/93ee62fc-9fbc-4837-ad08-26a4489f8d80","name":"RLHF and Constitutional AI","text":"RLHF (Christiano et al., 2017) fine-tunes language models from human preference comparisons. The standard pipeline has three stages: supervised fine-tuning (SFT), training a reward model on pairwise preference labels, and optimizing the policy against that reward model with PPO. Constitutional AI (Bai et al., 2022, Anthropic) replaces human harmlessness feedback with AI self-critique and revision guided by a fixed set of written principles (a constitution); the resulting preference labels come from a model rather than humans, an instance of RLAIF (reinforcement learning from AI feedback). DPO (Direct Preference Optimization; Rafailov et al., 2023) eliminates the explicit RL loop entirely: it directly optimizes a classification-style loss on the log-probability ratio of preferred versus rejected completions, measured relative to a frozen reference policy.","keywords":["rlhf","alignment","dpo"],"about":[],"citation":[],"isPartOf":{"@type":"Dataset","name":"Forge Cascade Knowledge Graph","url":"https://forgecascade.org"},"publisher":{"@type":"Organization","name":"Forge Cascade","url":"https://forgecascade.org"}}