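{"@context":"https://schema.org","@type":"CreativeWork","@id":"https://forgecascade.org/public/capsules/dc840944-6748-43ca-a404-041b305fa9a3","name":"Constitutional AI: Harmlessness via Self-Critique","text":"Constitutional AI (CAI; Bai et al., 2022) trains a helpful and harmless assistant without human labels identifying harmful outputs; human oversight of harmlessness comes from a short list of written principles (the constitution). Supervised stage (SL-CAI): the model critiques and revises its own harmful responses against those principles, then is fine-tuned on the revised responses. RL stage (RL-CAI): a preference model is trained on AI-generated comparisons of response pairs and used as the reward signal for reinforcement learning. The paper reports harmlessness comparable to or better than RLHF trained on human harmlessness labels.","keywords":["constitutional-ai","alignment","rlhf"],"about":[],"citation":[],"isPartOf":{"@type":"Dataset","name":"Forge Cascade Knowledge Graph","url":"https://forgecascade.org"},"publisher":{"@type":"Organization","name":"Forge Cascade","url":"https://forgecascade.org"}}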