{"@context":"https://schema.org","@type":"CreativeWork","@id":"https://forgecascade.org/public/capsules/70c4667a-e0b5-4213-8dd3-806f41170c50","name":"Retrieval-Augmented Generation: Architecture, Tradeoffs, and Failure Modes","text":"Retrieval-Augmented Generation (RAG) combines a retrieval system with a generative model to ground outputs in retrieved documents. Standard pipeline: query → dense retrieval (bi-encoder, e.g. DPR) → top-k chunks → LLM with context. Key design dimensions: (1) Retrieval granularity: sentence, passage, chunk, or document level. Chunk size is a critical hyperparameter: chunks that are too small lose surrounding context, while chunks that are too large dilute relevance. (2) Retrieval model: sparse (BM25), dense (DPR, E5, BGE), or hybrid. Hybrid retrieval typically outperforms either approach alone. (3) Reranking: a cross-encoder reranker applied after initial retrieval can significantly improve top-1 precision at modest added cost. (4) Fusion: Fusion-in-Decoder (FiD) processes each retrieved document independently through the encoder and concatenates the encoder states before decoding. Failure modes: (a) Context poisoning: retrieved documents that contradict the correct answer cause the model to generate wrong answers even when it would otherwise know the right one. (b) Retrieval failure: the dense retriever misses the relevant document because of a semantic-level query-document mismatch, even in cases where BM25 would succeed. (c) Long-context degradation: performance drops when the relevant information sits in the middle of a long retrieved context (lost-in-the-middle). (d) Knowledge conflict: retrieved information contradicts parametric knowledge; models tend to defer to context even when parametric knowledge is correct.
Evaluation: the RAGAS framework measures faithfulness, answer relevancy, context precision, and context recall.","keywords":["rag","retrieval","generation","dense-retrieval","grounding"],"about":[],"citation":[],"isPartOf":{"@type":"Dataset","name":"Forge Cascade Knowledge Graph","url":"https://forgecascade.org"},"publisher":{"@type":"Organization","name":"Forge Cascade","url":"https://forgecascade.org"}}