Memory (AI Context)

Definition

Memory in AI systems enables LLMs to retain and access information across interactions (including conversation history, user preferences, and learned facts) beyond the limitations of a single context window.

Why It Matters

LLMs are inherently stateless. Each API call starts fresh with no memory of previous interactions. This works for one-off questions but fails for applications that need continuity. A support agent that forgets your previous tickets is useless. A coding assistant that can’t remember what files it modified is frustrating.

Memory bridges this gap by persisting relevant information between interactions. Users get personalized experiences. Agents maintain context across multi-step tasks. Systems learn from past interactions and improve over time.

For AI engineers, memory is a design challenge with significant implications. How much context do you store? How do you retrieve it efficiently? How do you handle memory that becomes stale or incorrect? These decisions shape the user experience and system performance.

Implementation Basics

Memory systems range from simple to sophisticated:

1. Conversation History

The simplest memory: store previous messages and include them in the prompt. Works for short conversations but hits context window limits quickly. Summarization can compress old messages while preserving key information.
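
A minimal sketch of this pattern in Python. The call_llm function is a hypothetical stand-in for whatever chat-completion client you use:

```python
# Conversation-history memory: keep every turn and resend all of it.
def call_llm(messages: list[dict]) -> str:
    # Stand-in for a real chat-completion call; echoes the last message.
    return f"(model reply to: {messages[-1]['content']})"

class ConversationMemory:
    def __init__(self):
        self.messages: list[dict] = []  # full history, oldest first

    def ask(self, user_input: str) -> str:
        # Append the new turn and send the entire history as context.
        self.messages.append({"role": "user", "content": user_input})
        reply = call_llm(self.messages)
        self.messages.append({"role": "assistant", "content": reply})
        return reply
```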

2. Sliding Window

Keep only the N most recent messages. Simple and predictable, but can lose important earlier context. Good for real-time chat where old messages matter less.
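
A sketch of the same idea with a fixed-size window; collections.deque evicts old turns automatically:

```python
from collections import deque

# Sliding-window memory: only the max_turns most recent messages survive.
class SlidingWindowMemory:
    def __init__(self, max_turns: int = 10):
        self.window = deque(maxlen=max_turns)  # old turns fall off the left

    def add(self, role: str, content: str) -> None:
        self.window.append({"role": role, "content": content})

    def context(self) -> list[dict]:
        # Messages to include in the next prompt, oldest first.
        return list(self.window)
```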

3. Summary Memory

Periodically summarize conversation history into a condensed form. This preserves more information in fewer tokens. Trade-off: summaries lose detail.
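
A sketch of the compression loop. The summarize function stands in for an LLM call with a "condense this conversation" prompt; here it just concatenates so the example runs:

```python
# Summary memory: fold overflow turns into a running summary,
# keep the most recent turns verbatim.
def summarize(summary_so_far: str, old_messages: list[dict]) -> str:
    # Stand-in for an LLM condensation call.
    lines = " | ".join(f"{m['role']}: {m['content']}" for m in old_messages)
    return (summary_so_far + " " + lines).strip()

class SummaryMemory:
    def __init__(self, keep_recent: int = 6):
        self.summary = ""             # condensed view of older turns
        self.recent: list[dict] = []  # verbatim recent turns
        self.keep_recent = keep_recent

    def add(self, role: str, content: str) -> None:
        self.recent.append({"role": role, "content": content})
        if len(self.recent) > self.keep_recent:
            # Compress everything except the most recent turns.
            overflow = self.recent[:-self.keep_recent]
            self.summary = summarize(self.summary, overflow)
            self.recent = self.recent[-self.keep_recent:]

    def context(self) -> list[dict]:
        # Prepend the summary as a system message, then the recent turns.
        prefix = [{"role": "system",
                   "content": "Summary so far: " + self.summary}]
        return (prefix if self.summary else []) + self.recent
```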

4. Vector Memory (RAG-based)

Store memories as embeddings in a vector database. Retrieve relevant memories based on the current query. Scales to large memory stores without context window constraints. The foundation for sophisticated memory systems.
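
A toy sketch of the retrieval step. The embed function stands in for a real embedding model (here a crude character-frequency vector, just so the example runs end to end), and a production system would use a vector database rather than a flat list:

```python
import math

def embed(text: str) -> list[float]:
    # Stand-in for a real embedding model.
    vec = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - ord("a")] += 1.0
    return vec

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

class VectorMemory:
    def __init__(self):
        self.store: list[tuple[list[float], str]] = []  # (embedding, text)

    def remember(self, text: str) -> None:
        self.store.append((embed(text), text))

    def recall(self, query: str, k: int = 3) -> list[str]:
        # Rank stored memories by similarity to the query, return top k.
        q = embed(query)
        ranked = sorted(self.store, key=lambda item: cosine(q, item[0]),
                        reverse=True)
        return [text for _, text in ranked[:k]]
```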

5. Entity Memory

Track specific entities mentioned in conversation (people, products, projects). Update facts about each entity. Retrieve entity information when relevant. Good for business applications with structured knowledge.
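
A sketch of the storage side; extracting (entity, attribute, value) facts from conversation is assumed to happen upstream, typically via an LLM extraction prompt:

```python
from collections import defaultdict

# Entity memory: a dict of current facts per entity.
class EntityMemory:
    def __init__(self):
        self.entities: dict[str, dict[str, str]] = defaultdict(dict)

    def update(self, entity: str, attribute: str, value: str) -> None:
        # Newer facts overwrite older ones, so the store stays current.
        self.entities[entity][attribute] = value

    def lookup(self, entity: str) -> dict[str, str]:
        return dict(self.entities.get(entity, {}))

# Usage:
#   memory.update("Acme Corp", "plan", "enterprise")
#   memory.lookup("Acme Corp")  ->  {"plan": "enterprise"}
```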

6. Long-term Knowledge

Store learned facts, user preferences, and system configurations. Persist across sessions. Update based on interactions. This is what makes AI feel “personalized.”
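
A sketch of cross-session persistence. A JSON file keeps the example self-contained; a real system would use a database, and the file path here is arbitrary:

```python
import json
from pathlib import Path

# Long-term knowledge: key-value facts that survive restarts.
class LongTermStore:
    def __init__(self, path: str = "user_memory.json"):  # example path
        self.path = Path(path)
        self.data = (json.loads(self.path.read_text())
                     if self.path.exists() else {})

    def set(self, key: str, value) -> None:
        self.data[key] = value
        # Write through on every update so nothing is lost on restart.
        self.path.write_text(json.dumps(self.data, indent=2))

    def get(self, key: str, default=None):
        return self.data.get(key, default)
```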

Memory Architecture Considerations

  • Write policy: What gets stored? Every message? Only user preferences? Important facts?
  • Retrieval strategy: How do you decide what memories are relevant to the current query?
  • Staleness handling: How do you update or expire outdated information?
  • Privacy: What data are you allowed to store? For how long? What user consent is required?
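
One way to keep these decisions explicit is a policy object the rest of the system consults; every field name below is illustrative:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class MemoryPolicy:
    write_filter: str = "preferences_and_facts"  # write policy: what to store
    retrieval: str = "vector_top_k"              # retrieval strategy
    top_k: int = 3                               # memories injected per query
    max_age_days: int = 90                       # staleness threshold
    requires_consent: bool = True                # privacy: ask before storing

    def is_stale(self, written_at: datetime) -> bool:
        # Memories past the age limit are expired or re-verified
        # instead of being retrieved.
        return datetime.now() - written_at > timedelta(days=self.max_age_days)
```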

Practical Tips

  • Start with conversation history before building complex memory systems.
  • Use embeddings for semantic retrieval; keyword matching misses memories phrased differently from the query.
  • Set memory limits. Unbounded memory stores grow expensive and slow.
  • Let users view and manage their stored memories for transparency.

Memory transforms stateless LLMs into persistent, personalized assistants. The implementation complexity is worth it for applications where continuity matters.

Source

Generative Agents (Park et al., 2023) demonstrates believable human behavior through agents with memory streams that record observations, reflections, and plans for retrieval during decision-making.

https://arxiv.org/abs/2304.03442