Memory System
How Edward remembers everything — types, retrieval, extraction, and background enrichment.
Overview
Edward's memory system is what makes him different from a stateless chatbot. Every conversation is mined for memorable information — facts, preferences, context, instructions — and stored in PostgreSQL with vector embeddings. On future turns, relevant memories are retrieved and injected into the LLM context so Edward can reference things you told him weeks ago.
Memory Types
Each memory is classified into one of four types during extraction:
| Type | Description | Example |
|---|---|---|
| `fact` | Objective information about the user or world | "User's dog is named Luna" |
| `preference` | User likes, dislikes, or style preferences | "Prefers dark mode in all apps" |
| `context` | Situational or temporal context | "Starting a new job at Acme Corp next Monday" |
| `instruction` | Explicit directives from the user | "Always respond in bullet points" |
Temporal Nature
Memories also carry a temporal nature that affects how they're weighted over time:
| Temporal | Description | Behavior |
|---|---|---|
| `timeless` | Permanently relevant facts | No decay — always full weight |
| `temporary` | Short-lived context | Decays over time, eventually irrelevant |
| `evolving` | Facts that may change | Boosted when recently updated, decays otherwise |
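One way the temporal weighting could look, as a sketch: the exponential curve, the 30-day half-life, and the 7-day recency window are all assumptions made for illustration; only the three behaviors come from the table above.

```python
def temporal_weight(nature: str, age_days: float, days_since_update: float,
                    half_life_days: float = 30.0) -> float:
    """Weight multiplier applied to a memory's relevance score.

    Hypothetical decay model: timeless memories never decay, temporary
    memories halve in weight every half_life_days, and evolving memories
    get a boost when recently updated but decay otherwise.
    """
    decay = 0.5 ** (age_days / half_life_days)
    if nature == "timeless":
        return 1.0                            # no decay: always full weight
    if nature == "temporary":
        return decay                          # fades toward irrelevance
    if nature == "evolving":
        if days_since_update <= 7:            # assumed "recently updated" window
            return min(1.0, decay + 0.5)      # boost, capped at full weight
        return decay
    raise ValueError(f"unknown temporal nature: {nature}")
```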
Memory Tiers
Each memory is assigned a confidence tier:
| Tier | Description |
|---|---|
| `observation` | Inferred from conversation — may not be explicitly stated |
| `belief` | Reasonably confident based on context |
| `knowledge` | Explicitly stated by the user — high confidence |
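Taken together, the three classification vocabularies (type, temporal nature, tier) can be captured as enums. A sketch — the class names are illustrative; the values come from the tables above.

```python
from enum import Enum

class MemoryType(str, Enum):
    FACT = "fact"
    PREFERENCE = "preference"
    CONTEXT = "context"
    INSTRUCTION = "instruction"

class TemporalNature(str, Enum):
    TIMELESS = "timeless"
    TEMPORARY = "temporary"
    EVOLVING = "evolving"

class ConfidenceTier(str, Enum):
    OBSERVATION = "observation"   # inferred, may not be explicitly stated
    BELIEF = "belief"             # reasonably confident from context
    KNOWLEDGE = "knowledge"       # explicitly stated, high confidence
```

Subclassing `str` keeps the enum values interchangeable with the plain strings stored in PostgreSQL.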
Hybrid Retrieval
When Edward needs to recall memories, he uses a hybrid scoring approach:
- 70% vector similarity — pgvector cosine distance using `all-MiniLM-L6-v2` embeddings (384 dimensions)
- 30% BM25 keyword matching — traditional text search for exact term hits
This combination catches both semantically similar memories and ones that share specific keywords. The context budget is capped at 8,000 characters to avoid overwhelming the LLM.
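A minimal sketch of the 70/30 blend and the character budget. It assumes both component scores have already been normalized to [0, 1]; the normalization scheme and function names are not specified in this doc.

```python
VECTOR_WEIGHT = 0.7          # weight on pgvector cosine similarity
BM25_WEIGHT = 0.3            # weight on BM25 keyword score
CONTEXT_BUDGET_CHARS = 8000  # cap on memory text injected into the LLM context

def hybrid_score(vector_sim: float, bm25_norm: float) -> float:
    """Blend a normalized vector similarity with a normalized BM25 score."""
    return VECTOR_WEIGHT * vector_sim + BM25_WEIGHT * bm25_norm

def fit_to_budget(memories: list[str], budget: int = CONTEXT_BUDGET_CHARS) -> list[str]:
    """Take memories in rank order until the character budget is exhausted.

    Assumes the list is already sorted by hybrid_score, descending.
    """
    selected, used = [], 0
    for text in memories:
        if used + len(text) > budget:
            break
        selected.append(text)
        used += len(text)
    return selected
```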
Memory Extraction
After every conversation turn, Edward runs a memory extraction step using Claude Haiku 4.5. The extractor analyzes the conversation and identifies any new memorable information. For each extracted memory, it assigns:
- Memory type (`fact`, `preference`, `context`, `instruction`)
- Importance score (0-10)
- Temporal nature (`timeless`, `temporary`, `evolving`)
- Confidence tier (`observation`, `belief`, `knowledge`)
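The extractor's per-memory output can be sketched as a record. The field names here are illustrative; the value vocabularies and the 0-10 importance range come from the list above.

```python
from dataclasses import dataclass

@dataclass
class ExtractedMemory:
    """One memory as emitted by the extraction step (hypothetical shape)."""
    content: str
    memory_type: str      # fact | preference | context | instruction
    importance: int       # 0-10
    temporal_nature: str  # timeless | temporary | evolving
    tier: str             # observation | belief | knowledge

    def __post_init__(self) -> None:
        if not 0 <= self.importance <= 10:
            raise ValueError("importance must be in 0-10")
```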
Duplicate detection prevents the same information from being stored multiple times. Existing memories are updated rather than duplicated.
Deep Retrieval
For complex conversations, Edward activates deep retrieval — a pre-turn gate that runs when the message is short or the conversation has reached 3+ turns. It fires 4 parallel memory queries:
- The original user message
- 3 Haiku-rewritten query variations targeting different angles
Results are deduplicated and merged, giving the LLM a richer context window than a single query would provide.
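The fan-out and merge described above can be sketched with `asyncio.gather`. The `search` callable and the id-based dedup key are assumptions; rank order within each result list is preserved.

```python
import asyncio

async def deep_retrieve(original: str, rewrites: list[str], search) -> list[dict]:
    """Run the original message plus 3 rewrites as 4 parallel memory queries,
    then merge results, dropping duplicates across queries by memory id."""
    queries = [original, *rewrites]
    result_lists = await asyncio.gather(*(search(q) for q in queries))
    merged, seen = [], set()
    for results in result_lists:
        for memory in results:
            if memory["id"] not in seen:
                seen.add(memory["id"])
                merged.append(memory)
    return merged
```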
Reflection
After each turn, a fire-and-forget reflection step generates 3-5 Haiku-crafted queries to find memories related to the current conversation. The results are stored in the `memory_enrichments` table and loaded on the next turn to provide deeper context. This runs asynchronously and adds zero latency to the current response.
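The fire-and-forget pattern can be sketched with asyncio: the task is scheduled but never awaited on the request path, so the response returns immediately. The `reflect` coroutine (query generation plus the enrichment write) is a hypothetical stand-in.

```python
import asyncio

def _log_failure(task: asyncio.Task) -> None:
    # Surface background failures instead of losing them silently.
    if not task.cancelled() and task.exception() is not None:
        print(f"reflection failed: {task.exception()}")

def schedule_reflection(conversation_id: str, reflect) -> asyncio.Task:
    """Schedule reflection without awaiting it on the request path."""
    task = asyncio.create_task(reflect(conversation_id))
    task.add_done_callback(_log_failure)
    return task
```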
Consolidation
An hourly background loop clusters related memories via Haiku. It creates:
- Memory connections — links between related memories
- Memory flags — quality and staleness markers
Consolidation is disabled by default and can be enabled via the REST API or settings UI.
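One iteration of the hourly loop might look like the sketch below. The `consolidation_enabled` key and the `cluster_memories` callable are illustrative; the disabled-by-default gate mirrors the behavior described above.

```python
import asyncio

async def consolidation_tick(settings: dict, cluster_memories) -> bool:
    """Run one consolidation pass; skipped unless explicitly enabled."""
    if not settings.get("consolidation_enabled", False):  # off by default
        return False
    await cluster_memories()  # Haiku clustering: creates connections + flags
    return True

async def consolidation_loop(settings: dict, cluster_memories) -> None:
    """Background loop calling the tick on an hourly cadence."""
    while True:
        await consolidation_tick(settings, cluster_memories)
        await asyncio.sleep(3600)
```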
Memory Tools
Edward has direct access to memory management tools (always available, not gated by skills):
| Tool | Description |
|---|---|
| `remember_update` | Create or update a memory |
| `remember_forget` | Delete a specific memory |
| `remember_search` | Search memories by query |
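One plausible declaration of these tools in Anthropic-style tool schemas. Only the names and descriptions come from the table; the input fields are assumptions for illustration.

```python
MEMORY_TOOLS = [
    {
        "name": "remember_update",
        "description": "Create or update a memory",
        "input_schema": {
            "type": "object",
            "properties": {"content": {"type": "string"}},
            "required": ["content"],
        },
    },
    {
        "name": "remember_forget",
        "description": "Delete a specific memory",
        "input_schema": {
            "type": "object",
            "properties": {"memory_id": {"type": "string"}},
            "required": ["memory_id"],
        },
    },
    {
        "name": "remember_search",
        "description": "Search memories by query",
        "input_schema": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
]
```

Because these are always available rather than skill-gated, they would be appended to every request's tool list unconditionally.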