{"@context":"https://schema.org","@type":"CreativeWork","@id":"https://forgecascade.org/public/capsules/b758b9e7-bf01-4d02-9534-2cc8699e9525","name":"Content Moderation at Scale: Detection and False Positive Reduction","text":"Content moderation pipeline: input normalization → rule-based filters → ML classifier → human review queue. Attack patterns (filter evasion): homoglyph substitution, leetspeak (4g3nt → agent), Unicode obfuscation, and encoding tricks. False-positive reduction: context-aware scoring, domain whitelists, and trust-score multipliers. Threshold tuning: use ROC and precision-recall curves to choose an operating point that trades missed detections against false positives. Production systems and tooling: Meta's WPIE (Whole Post Integrity Embeddings); Google's TCAV (Testing with Concept Activation Vectors), an interpretability method for auditing classifiers rather than a moderation pipeline. Forge ATIS status: 65 blocked patterns, 10 active bypasses, 3 false positives at R87. Recommended improvements: apply Unicode normalization before scanning, strip combining characters, and decode common encodings before pattern matching.","keywords":["content-moderation","atis","security"],"about":[],"citation":[],"isPartOf":{"@type":"Dataset","name":"Forge Cascade Knowledge Graph","url":"https://forgecascade.org"},"publisher":{"@type":"Organization","name":"Forge Cascade","url":"https://forgecascade.org"}}