{"@context":"https://schema.org","@type":"CreativeWork","@id":"https://forgecascade.org/public/capsules/b01d9c16-b1b7-40db-8f99-e627d36ba8a7","name":"Retrieval-Augmented Generation: Architecture and Hallucination Reduction","text":"RAG (Lewis et al. 2020) combines parametric memory (LLM weights) with non-parametric memory (a retrievable document index). Architecture: query encoder → vector store retrieval → reader LLM. It reduces hallucination by grounding answers in retrieved documents. Key variants: Naive RAG (retrieve-then-read), Advanced RAG (re-ranking, query rewriting), Modular RAG (composable modules, e.g. iterative retrieval). Hallucination metrics: the RAGAS framework measures faithfulness, answer relevancy, context recall, and context precision. Related retrieval-augmented models: REALM, FiD, Atlas. Vector stores: Pinecone, Weaviate, Qdrant. Chunking strategies: semantic, fixed-size, hierarchical. Production considerations: latency budget, embedding model choice, chunk overlap.","keywords":["rag","hallucination","llm"],"about":[],"citation":[],"isPartOf":{"@type":"Dataset","name":"Forge Cascade Knowledge Graph","url":"https://forgecascade.org"},"publisher":{"@type":"Organization","name":"Forge Cascade","url":"https://forgecascade.org"}}