Memory-Based Agent: AI That Remembers and Learns
The final pattern in this series tackles one of the most critical challenges in AI: memory. Without memory, every interaction starts from scratch. With memory, agents can maintain context, learn from experience, and personalize their responses.
What is the Memory Pattern?
The Memory Pattern equips agents with the ability to store and retrieve information across interactions. This comes in two fundamental forms:
Short-Term Memory (Working Memory)
Everything within the current context window:

- Conversation history
- Recent tool results
- Current task state
Limitations:

- Bounded by the LLM's context window (4K–200K tokens, depending on the model)
- Disappears when the session ends
- Gets truncated when the history grows too long
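A common way to live within the context-window bound is a sliding window over the conversation history. Here is a minimal sketch; the token counting is a rough word-count stand-in, where a real agent would use the model's tokenizer:

```python
def trim_history(history, max_tokens=2000):
    """Keep only the most recent turns that fit in the token budget.

    Token counts are approximated by whitespace-split word counts;
    swap in the model's tokenizer for accurate budgeting.
    """
    kept, used = [], 0
    for turn in reversed(history):  # walk from newest to oldest
        cost = len(turn.split())
        if used + cost > max_tokens:
            break                   # everything older is dropped
        kept.append(turn)
        used += cost
    return list(reversed(kept))     # restore chronological order

history = ["hello there", "hi, how can I help?", "tell me about memory patterns"]
print(trim_history(history, max_tokens=8))
```

Dropping the oldest turns first is the simplest policy; summarizing them into a single compressed turn before discarding is a common refinement.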
Long-Term Memory (Persistent Memory)
Information stored outside the LLM's context, retrievable on demand:

- Vector databases (Qdrant, Pinecone, Weaviate, Chroma)
- Knowledge graphs
- Structured databases
- File systems
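Whatever the backend, a long-term store only needs two operations for the pattern to work: upsert and similarity search. A toy in-memory version using cosine similarity illustrates the interface (real deployments would use one of the vector databases above; the class and method names here are just this example's conventions):

```python
import math

class TinyVectorStore:
    """Toy in-memory stand-in for a vector database (illustrative only)."""

    def __init__(self):
        self.items = []  # (text, embedding, metadata) triples

    def upsert(self, text, embedding, metadata=None):
        """Store a text alongside its embedding vector."""
        self.items.append((text, embedding, metadata))

    @staticmethod
    def _cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        return dot / norm if norm else 0.0

    def search(self, embedding, top_k=5):
        """Return the top_k stored texts ranked by cosine similarity."""
        ranked = sorted(self.items, key=lambda it: self._cosine(it[1], embedding), reverse=True)
        return [text for text, _, _ in ranked[:top_k]]
```

For example, `store.search(query_embedding, top_k=3)` returns the three most similar stored texts, which is exactly the shape of result the agent below injects into its prompt.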
Why Memory Matters
| Without Memory | With Memory |
|---|---|
| Forgets user preferences | Remembers and adapts |
| Repeats mistakes | Learns from past errors |
| No personalization | Tailored responses |
| Limited to context window | Unlimited knowledge access |
| Every session starts fresh | Continuous improvement |
Architecture of a Memory-Based Agent
A minimal sketch of the core loop (the `build_context` and `is_worth_remembering` helpers here are simple placeholders; a production agent would use richer prompt assembly and a better relevance heuristic):

```python
class MemoryAgent:
    def __init__(self, llm, vector_store):
        self.llm = llm
        self.short_term = []           # conversation history
        self.long_term = vector_store  # persistent storage

    def remember(self, text, metadata=None):
        """Store information in long-term memory."""
        embedding = self.llm.embed(text)
        self.long_term.upsert(text, embedding, metadata)

    def recall(self, query, top_k=5):
        """Retrieve relevant memories."""
        embedding = self.llm.embed(query)
        return self.long_term.search(embedding, top_k)

    def build_context(self, short_term, memories):
        """Placeholder: combine recent turns and retrieved memories into a prompt prefix."""
        recent = "\n".join(f"User: {t['user']}\nAssistant: {t['assistant']}" for t in short_term)
        recalled = "\n".join(memories)
        return f"Relevant memories:\n{recalled}\n\nConversation so far:\n{recent}\n\n"

    def is_worth_remembering(self, user_input, response):
        """Placeholder heuristic: persist only substantial exchanges."""
        return len(user_input) > 20

    def respond(self, user_input):
        # 1. Search long-term memory for relevant context
        memories = self.recall(user_input)
        # 2. Build context from short-term + long-term memory
        context = self.build_context(self.short_term, memories)
        # 3. Generate response
        response = self.llm.generate(context + user_input)
        # 4. Update short-term memory
        self.short_term.append({"user": user_input, "assistant": response})
        # 5. Optionally store important info in long-term memory
        if self.is_worth_remembering(user_input, response):
            self.remember(f"User said: {user_input}\nI responded: {response}")
        return response
```

Types of Memory
Episodic Memory
Specific past interactions and experiences:

- "Last time the user asked about Python, they preferred async examples"
- "The user's project uses FastAPI and PostgreSQL"
Semantic Memory
General knowledge and facts:

- Documentation, tutorials, best practices
- Domain-specific knowledge bases
Procedural Memory
Learned skills and workflows:

- "When debugging, first check logs, then reproduce the issue"
- Successful problem-solving patterns from past sessions
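One simple way to keep these three memory types distinct in a single store is to tag each entry with its type in metadata and filter at recall time. A sketch, assuming memories are plain dicts (the `memory_type` field is this example's convention, not a standard):

```python
MEMORY_TYPES = {"episodic", "semantic", "procedural"}

def tag_memory(text, memory_type):
    """Attach a type label so recall can filter by kind of memory."""
    if memory_type not in MEMORY_TYPES:
        raise ValueError(f"unknown memory type: {memory_type}")
    return {"text": text, "metadata": {"memory_type": memory_type}}

def recall_by_type(memories, memory_type):
    """Filter stored entries down to a single memory type."""
    return [m["text"] for m in memories if m["metadata"]["memory_type"] == memory_type]

store = [
    tag_memory("User prefers async examples", "episodic"),
    tag_memory("FastAPI apps are ASGI applications", "semantic"),
    tag_memory("When debugging, check logs first", "procedural"),
]
print(recall_by_type(store, "episodic"))  # → ['User prefers async examples']
```

Most vector databases support this kind of metadata filtering natively, so the filter can usually be pushed down into the search call itself.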
RAG: The Foundation of Long-Term Memory
Retrieval-Augmented Generation (RAG) is the most common implementation of long-term memory:
- Index: Convert documents/knowledge into embeddings
- Retrieve: Find the most relevant chunks for a given query
- Augment: Inject retrieved context into the LLM prompt
- Generate: Produce a grounded response
```python
# Simple RAG pipeline
def rag_query(question, vector_store, llm):
    # Retrieve relevant documents
    docs = vector_store.similarity_search(question, k=5)
    # Build augmented prompt
    context = "\n".join(doc.content for doc in docs)
    prompt = f"""Based on the following context, answer the question.

Context: {context}

Question: {question}"""
    return llm.generate(prompt)
```

Application Projects
Projects demonstrating the Memory Pattern in action will be added here as they are developed.
Potential projects:
- Personal Knowledge Assistant: An agent with RAG over your personal notes and documents
- Learning Companion: An agent that tracks your progress and adapts its teaching
- Customer Support Agent: Remembers past issues and user preferences
Key Takeaways
- Two types of memory serve different purposes: short-term for the current session, long-term for persistence
- RAG is the go-to implementation for long-term memory
- Memory makes agents personal — they can adapt to individual users
- Deciding what to remember is as important as the storage mechanism itself
- Memory completes the picture — combined with reflection, tools, planning, and collaboration, you have a fully capable agent
Series Recap
This series covered the five foundational agentic design patterns:
- Reflection — Self-improvement through critique
- Tool Use — Acting on the world
- ReAct — Reasoning and acting in harmony
- Multi-Agent — Collaboration and specialization
- Memory — Learning and remembering (this post)
Together, these patterns form the building blocks for creating powerful, autonomous AI systems.