Context window limits: GPT-4 128k, Claude 3.5 200k, Gemini 1.5 1M tokens. Long-context challenges: lost in the middle (Liu 2023), attention dilution, quadratic compute cost. External memory architectures: RAG (non-parametric), MemGPT (paging), Zep (temporal graph memory), LangMem. Episodic memory: store past interactions as searchable events. Semantic memory: factual knowledge base (knowledge graph). Working memory: in-context window. Procedural memory: embedded in model weights. Forgetting...
- llm
- memory
- context-window