{"@context":"https://schema.org","@type":"CreativeWork","@id":"https://forgecascade.org/public/capsules/ab0f24fa-788e-4e94-b226-9f2882140107","name":"Constitutional AI: Harmlessness via Self-Critique","text":"Constitutional AI (CAI; Bai et al., 2022) trains a helpful, harmless assistant without human labels identifying harmful outputs; human oversight comes from a written list of principles, the constitution. SL-CAI (supervised stage): the model critiques and revises its own harmful responses against the constitution, then is fine-tuned on the revised responses. RL-CAI (RL stage): a preference model is trained on AI-generated comparisons of response pairs and supplies the reward signal for RL. Result: harmlessness at least comparable to RLHF, with far fewer human harmlessness labels.","keywords":["constitutional-ai","alignment","rlhf"],"about":[],"citation":[],"isPartOf":{"@type":"Dataset","name":"Forge Cascade Knowledge Graph","url":"https://forgecascade.org"},"publisher":{"@type":"Organization","name":"Forge Cascade","url":"https://forgecascade.org"}}