{"@context":"https://schema.org","@type":"CreativeWork","@id":"https://forgecascade.org/public/capsules/c0a41255-2258-45fb-88ea-0e2fa7178538","name":"PATCHED R21: RAG","text":"Retrieval-Augmented Generation (RAG) addresses the knowledge cutoff and hallucination problems in LLMs by augmenting generation with retrieved documents. Pipeline: (1) Query encoding — the input is embedded into a dense vector. (2) Retrieval — a vector store (FAISS, Pinecone, Weaviate) returns the top-k nearest neighbor documents. (3) Augmented generation — retrieved chunks are prepended to the prompt as context, and the LLM generates conditioned on both query and context. Key results: Lewis et al. (2020) showed RAG outperforms parametric-only models on open-domain QA. Key failure modes: (a) Retrieval failures — wrong chunks retrieved due to semantic mismatch, embedding space limitations, or chunk boundary effects. (b) Context window pressure — retrieved context competes with the query and instructions for attention and token budget. (c) Hallucination persists — the model can still generate false claims that blend with retrieved context, especially when context is long or contradictory. (d) Faithfulness vs. accuracy tension — models sometimes ignore retrieved context and rely on parametric memory. Advanced patterns: (1) HyDE (Hypothetical Document Embeddings): generate a hypothetical answer first, embed it, and retrieve real documents by similarity to that embedding. (2) Reranking: use a cross-encoder to rerank the top-k retrieved documents before generation. (3) Iterative RAG / FLARE: generate, detect uncertain spans, retrieve for those spans, and regenerate.
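The three-stage pipeline above can be sketched as follows. This is a toy illustration, not a library API: a deterministic bag-of-words hash stands in for a real embedding model, a linear cosine-similarity scan stands in for the vector store, and the names embed, retrieve, and build_prompt are invented for this sketch.

```python
import math

def embed(text, dim=64):
    # Toy stand-in for an embedding model: hash each token into one of
    # `dim` buckets, count occurrences, then L2-normalize the vector.
    vec = [0.0] * dim
    for token in text.lower().split():
        vec[sum(ord(c) for c in token) % dim] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

def retrieve(query, corpus, k=2):
    # Stage 2: score every document by cosine similarity to the query
    # embedding and keep the top-k. A real vector store (FAISS, Pinecone,
    # Weaviate) replaces this linear scan with an approximate index.
    q = embed(query)
    scored = []
    for doc in corpus:
        d = embed(doc)
        scored.append((sum(a * b for a, b in zip(q, d)), doc))
    scored.sort(reverse=True)
    return [doc for _, doc in scored[:k]]

def build_prompt(query, chunks):
    # Stage 3: prepend retrieved chunks as context; the assembled prompt
    # is what the LLM would be conditioned on.
    lines = ['Context:'] + chunks + ['Question: ' + query]
    return chr(10).join(lines)
```

In production the toy embed would be swapped for a sentence-embedding model and retrieve for a query against a persistent index; the prompt-assembly step is where reranking or iterative retrieval would hook in.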
Open research: optimal chunk size, overlap, and embedding model selection remain empirically determined rather than theoretically principled.","keywords":["rag","retrieval","grounding","hallucination","vector-search"],"about":[],"citation":[],"isPartOf":{"@type":"Dataset","name":"Forge Cascade Knowledge Graph","url":"https://forgecascade.org"},"publisher":{"@type":"Organization","name":"Forge Cascade","url":"https://forgecascade.org"}}