The Problem
Ask an LLM about the history of breakfast, and it will tell you a story that goes back thousands of years. Ask what it had for breakfast, and it draws a blank: that context was discarded the moment the session ended. Every chat starts fresh—no state, no history, no memory of you.
Memory systems fix this. They persist information across sessions without burning context window tokens. The hard part is retrieval: how do you find the right memories when you need them?
OpenClaw's implementation is worth studying because it makes pragmatic tradeoffs that work: plain Markdown as source of truth, hybrid search combining vectors with keywords, and a clever pre-compaction flush that saves memories before context overflow.
1. Files as Source of Truth
Most memory systems start with a database. OpenClaw starts with Markdown files. No proprietary format. No complex schema. Just text you can read, edit, and version control.
Memory File Structure
- Curated Memory: stable facts, preferences, decisions
- Daily Logs: append-only running context
- Index Layer: SQLite + embeddings for fast retrieval

MEMORY.md holds curated, long-term facts—things the agent should always know about you. Your role, your preferences, key decisions.
memory/YYYY-MM-DD.md files are append-only daily logs. The agent writes running notes here during sessions. These accumulate over time.
The SQLite index lives at ~/.openclaw/memory/{agentId}.sqlite. It stores chunked embeddings for fast semantic search while keeping the Markdown files as the canonical source.
The index is derived, the files are canonical. When something breaks, debugging is reading—not querying. You can grep memories, git diff them, edit them in any editor. The database serves the files, not the other way around.
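The derived-index idea can be sketched in a few lines of TypeScript. This is a toy illustration, not OpenClaw's code—`rebuildIndex` and the record shape are hypothetical, and an in-memory map stands in for the Markdown on disk—but it captures the invariant: the index is a pure function of the files, so editing a file and re-deriving always brings the index back in sync.

```typescript
// Toy sketch: the index is a pure function of the Markdown files.
// Names and shapes here are illustrative, not OpenClaw's actual API.
type IndexRecord = { path: string; text: string };

function rebuildIndex(files: Map<string, string>): IndexRecord[] {
  const records: IndexRecord[] = [];
  for (const [path, content] of files) {
    // One record per paragraph; real chunking is token-based.
    for (const paragraph of content.split("\n\n")) {
      if (paragraph.trim().length > 0) {
        records.push({ path, text: paragraph.trim() });
      }
    }
  }
  return records;
}

const files = new Map([
  ["MEMORY.md", "Role: backend engineer.\n\nPrefers TypeScript."],
  ["memory/2025-01-15.md", "Decided to use SQLite for the index."],
]);

// Edit the file, re-derive: the index follows the files, never the reverse.
let index = rebuildIndex(files);
files.set("MEMORY.md", "Role: staff engineer.");
index = rebuildIndex(files);
```

Because the derivation is deterministic, a corrupted or stale index is never a data-loss event—delete it and rebuild.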
2. Hybrid Search: Union, Not Intersection
Pure vector search fails on exact matches. Ask for "commit ab3f2c1" and you get commits with similar descriptions, not the exact hash. Pure keyword search fails on semantic queries. Ask "what did we decide about the architecture?" and BM25 won't understand that "we chose microservices" is a match.
Most hybrid systems take the intersection—results must score well on both. OpenClaw takes the union. Here's the merge from src/memory/hybrid.ts:
```typescript
// src/memory/hybrid.ts, lines 41-70
export function mergeHybridResults(params: {
  vector: HybridVectorResult[];
  keyword: HybridKeywordResult[];
  vectorWeight: number;
  textWeight: number;
}): Array<{...}> {
  const byId = new Map<string, {...}>();

  // Add all vector results
  for (const r of params.vector) {
    byId.set(r.id, {
      ...r,
      vectorScore: r.vectorScore,
      textScore: 0,
    });
  }

  // Merge in keyword results (union, not intersection)
  for (const r of params.keyword) {
    const existing = byId.get(r.id);
    if (existing) {
      existing.textScore = r.textScore;
    } else {
      byId.set(r.id, { ...r, vectorScore: 0, textScore: r.textScore });
    }
  }

  // Weighted combination
  const merged = Array.from(byId.values()).map((entry) => {
    const score = params.vectorWeight * entry.vectorScore
      + params.textWeight * entry.textScore;
    return { ...entry, score };
  });

  return merged.toSorted((a, b) => b.score - a.score);
}
```

The key insight: this is a union, not an intersection. Results from either search contribute. If a chunk scores high on vector similarity but doesn't contain the keyword, it still gets included (with textScore: 0). If a chunk contains the exact keyword but isn't semantically similar, it also gets included (with vectorScore: 0).
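To make the union behavior concrete, here is a stripped-down, self-contained restatement of the merge—plain objects instead of OpenClaw's result types, and hypothetical result IDs—showing that a keyword-only hit survives with a zeroed vector score:

```typescript
// Simplified union merge: same logic as mergeHybridResults, minimal types.
type Hit = { id: string; vectorScore: number; textScore: number; score: number };

function mergeUnion(
  vector: Array<{ id: string; vectorScore: number }>,
  keyword: Array<{ id: string; textScore: number }>,
  vectorWeight: number,
  textWeight: number,
): Hit[] {
  const byId = new Map<string, Hit>();
  for (const v of vector) {
    byId.set(v.id, { id: v.id, vectorScore: v.vectorScore, textScore: 0, score: 0 });
  }
  for (const k of keyword) {
    const existing = byId.get(k.id);
    if (existing) existing.textScore = k.textScore;
    else byId.set(k.id, { id: k.id, vectorScore: 0, textScore: k.textScore, score: 0 });
  }
  for (const hit of byId.values()) {
    hit.score = vectorWeight * hit.vectorScore + textWeight * hit.textScore;
  }
  return [...byId.values()].sort((a, b) => b.score - a.score);
}

// A commit hash only matches the keyword index; it still surfaces.
const merged = mergeUnion(
  [{ id: "arch-note", vectorScore: 0.9 }],
  [{ id: "commit-ab3f2c1", textScore: 1.0 }],
  0.7,
  0.3,
);
// merged[0] is the semantic hit (~0.63); merged[1] the keyword-only hit (0.3).
```

Under an intersection policy, the keyword-only commit hash would be dropped entirely; under the union it survives with a lower but nonzero score.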
The default weights are 70% vector, 30% keyword:
```typescript
// Default config
hybrid: { vectorWeight: 0.7, textWeight: 0.3 }
```

BM25 Rank Normalization
BM25 returns ranks (1st, 2nd, 3rd...), not scores. To combine with vector similarity scores (which are 0-1), OpenClaw converts ranks to scores using a simple inversion:
```typescript
// src/memory/hybrid.ts, lines 36-39
export function bm25RankToScore(rank: number): number {
  const normalized = Number.isFinite(rank) ? Math.max(0, rank) : 999;
  return 1 / (1 + normalized); // Rank 0 → 1.0, Rank 1 → 0.5, Rank 9 → 0.1
}
```

The formula creates smooth decay—top results dominate, but lower ranks still contribute.
Keyword Query Building
Before searching, the raw query gets tokenized into an FTS5 query:
```typescript
// src/memory/hybrid.ts, lines 23-34
export function buildFtsQuery(raw: string): string | null {
  const tokens = raw.match(/[A-Za-z0-9_]+/g)
    ?.map((t) => t.trim()).filter(Boolean) ?? [];
  if (tokens.length === 0) return null;
  const quoted = tokens.map((t) => `"${t.replaceAll('"', "")}"`);
  return quoted.join(" AND ");
}
```

This is important: keyword search requires all tokens to match (AND logic). "commit hash" becomes "commit" AND "hash". This makes keyword search strict—good for exact matches, but it won't find partial matches.
Search Orchestration
The manager runs both searches sequentially, then merges:
```typescript
// src/memory/manager.ts, lines 290-310
async search(query: string, opts?: {...}): Promise<MemorySearchResult[]> {
  // Over-fetch candidates for merging (default multiplier: 4)
  const candidates = Math.min(200, maxResults * hybrid.candidateMultiplier);

  // Keyword search first
  const keywordResults = hybrid.enabled
    ? await this.searchKeyword(cleaned, candidates).catch(() => [])
    : [];

  // Then vector search
  const queryVec = await this.embedQueryWithTimeout(cleaned);
  const vectorResults = hasVector
    ? await this.searchVector(queryVec, candidates).catch(() => [])
    : [];

  // Merge with normalized weights
  const merged = this.mergeHybridResults({
    vector: vectorResults,
    keyword: keywordResults,
    vectorWeight: hybrid.vectorWeight, // 0.7
    textWeight: hybrid.textWeight, // 0.3
  });

  return merged.filter((entry) => entry.score >= minScore).slice(0, maxResults);
}
```

The candidateMultiplier (default: 4) means if you request 6 results, the system fetches 24 candidates from each search before merging. This over-fetching ensures the merge has enough material to produce good final results.
Note the .catch(() => []) wrapping each search independently. If embeddings fail, keyword search still completes. If keyword search fails, vector search still completes. Either can fail without killing the other.
Union beats intersection for recall. A commit hash that only matches keywords still surfaces. A semantic match that misses the exact words still surfaces. You get both worlds, weighted by what matters more for your use case.
3. Pre-Compaction Flush: Save Before You Forget
Context windows fill up. When they do, older messages get compacted (summarized and discarded). This is where memories would normally be lost.
OpenClaw adds a clever intervention: before compaction happens, it triggers an automatic "agentic turn" that prompts the model to save anything important.
From src/auto-reply/reply/memory-flush.ts:
```typescript
// lines 10-14
export const DEFAULT_MEMORY_FLUSH_PROMPT = [
  "Pre-compaction memory flush.",
  "Store durable memories now (use memory/YYYY-MM-DD.md; create memory/ if needed).",
  `If nothing to store, reply with ${SILENT_REPLY_TOKEN}.`,
].join(" ");
```

The trigger logic checks three conditions:
```typescript
// src/auto-reply/reply/memory-flush.ts, lines 77-105
export function shouldRunMemoryFlush(params: {
  entry?: Pick<SessionEntry, "totalTokens" | "compactionCount" | "memoryFlushCompactionCount">;
  contextWindowTokens: number;
  reserveTokensFloor: number;
  softThresholdTokens: number;
}): boolean {
  const totalTokens = params.entry?.totalTokens;
  if (!totalTokens || totalTokens <= 0) return false;

  // Calculate threshold: contextWindow - reserve - softThreshold
  const threshold = Math.max(0,
    params.contextWindowTokens - params.reserveTokensFloor - params.softThresholdTokens
  );
  if (totalTokens < threshold) return false;

  // Only flush once per compaction cycle
  const compactionCount = params.entry?.compactionCount ?? 0;
  const lastFlushAt = params.entry?.memoryFlushCompactionCount;
  if (typeof lastFlushAt === "number" && lastFlushAt === compactionCount) {
    return false; // Already flushed this cycle
  }

  return true;
}
```

The default soft threshold is 4,000 tokens:

```typescript
export const DEFAULT_MEMORY_FLUSH_SOFT_TOKENS = 4000;
```

This gives the agent headroom to save memories before the hard limit triggers compaction.
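With illustrative numbers the trigger condition works out as follows. Note the window and reserve values below are hypothetical stand-ins; only the 4,000-token soft threshold is an actual OpenClaw default:

```typescript
// Hypothetical numbers; only softThresholdTokens = 4000 is a real default.
const contextWindowTokens = 200_000; // assumed model window
const reserveTokensFloor = 20_000;   // assumed reserve for the response
const softThresholdTokens = 4_000;   // DEFAULT_MEMORY_FLUSH_SOFT_TOKENS

// Flush fires once the session crosses contextWindow - reserve - soft.
const threshold = Math.max(
  0,
  contextWindowTokens - reserveTokensFloor - softThresholdTokens,
); // 176_000

const flushAt = (totalTokens: number) => totalTokens >= threshold;
const early = flushAt(150_000); // false: still plenty of headroom
const late = flushAt(180_000);  // true: save memories before compaction hits
```

In other words, the flush fires inside the soft band just below the point where compaction would become unavoidable, so the agentic turn still has room to write its notes.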
Context compaction is inevitable. The question is whether you lose information or save it first. The flush prompt turns a destructive operation into a checkpoint—memories survive because the system asks before it forgets.
4. Embedding Providers: Local First, Cloud Fallback
OpenClaw tries local embeddings first, then falls back to cloud providers. You might have a local model, or an OpenAI key, or just Gemini—it figures out what's available.
From src/memory/embeddings.ts:
```typescript
// lines 139-171
if (requestedProvider === "auto") {
  // Try local first if configured and model file exists
  if (canAutoSelectLocal(options)) {
    try {
      const local = await createProvider("local");
      return { ...local, requestedProvider };
    } catch (err) {
      localError = formatLocalSetupError(err);
    }
  }

  // Fall through to remote providers
  for (const provider of ["openai", "gemini"] as const) {
    try {
      const result = await createProvider(provider);
      return { ...result, requestedProvider };
    } catch (err) {
      if (isMissingApiKeyError(err)) {
        missingKeyErrors.push(message);
        continue; // Try next provider
      }
      throw new Error(message, { cause: err });
    }
  }
}
```

The priority order:
- Local (ggml models like embeddinggemma-300M) — if configured and model file exists
- OpenAI (text-embedding-3-small) — if API key present
- Gemini — if API key present
If all fail, memory search falls back to BM25-only (keyword search without embeddings). The system degrades gracefully rather than breaking.
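The fallback chain is, at heart, a try-in-order loop. A minimal sketch of the pattern—synchronous stand-ins for the real async provider factories, all names hypothetical, not OpenClaw's code:

```typescript
// Hypothetical sketch of the local → openai → gemini fallback pattern.
type Embedder = { provider: string };

function pickEmbedder(factories: Array<() => Embedder>): Embedder | null {
  for (const make of factories) {
    try {
      return make(); // first provider that constructs successfully wins
    } catch {
      // missing model file or API key: fall through to the next provider
    }
  }
  return null; // caller degrades to BM25-only keyword search
}

const chosen = pickEmbedder([
  () => { throw new Error("local model file missing"); },
  () => ({ provider: "openai" }),
  () => ({ provider: "gemini" }),
]);
```

The null case is the graceful-degradation path: memory search stays functional on keywords alone rather than erroring out.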
Local models require downloading the model file separately. If you're seeing slow indexing, check openclaw memory status --deep to verify which provider is being used.
5. Chunking: Overlap Preserves Context
Before embedding, Markdown files get split into chunks. The chunking preserves context across boundaries using overlap—the end of one chunk becomes the beginning of the next.
From src/memory/internal.ts:
```typescript
// lines 166-200 (simplified)
export function chunkMarkdown(
  content: string,
  chunking: { tokens: number; overlap: number },
): MemoryChunk[] {
  const lines = content.split("\n");
  const maxChars = Math.max(32, chunking.tokens * 4); // ~4 chars per token
  const overlapChars = Math.max(0, chunking.overlap * 4);
  // ... chunking logic ...

  const carryOverlap = () => {
    // Keep last N chars worth of lines for overlap
    let acc = 0;
    const kept = [];
    for (let i = current.length - 1; i >= 0; i--) {
      const entry = current[i];
      acc += entry.line.length + 1;
      kept.unshift(entry);
      if (acc >= overlapChars) break;
    }
    current = kept;
  };
}
```

Default configuration:

```typescript
chunking: { tokens: 400, overlap: 80 }
```

Each chunk targets 400 tokens (~1,600 characters). The last 80 tokens (~320 characters) of each chunk carry forward to the next. This overlap means a sentence that spans a chunk boundary will appear in both chunks, improving retrieval for queries that match that sentence.
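A self-contained toy chunker shows the carry-forward effect end to end. This uses raw character budgets instead of token estimates and is deliberately simpler than OpenClaw's implementation:

```typescript
// Toy line-based chunker with overlap carry-forward (illustrative only).
function chunkLines(content: string, maxChars: number, overlapChars: number): string[] {
  const lines = content.split("\n");
  const chunks: string[] = [];
  let current: string[] = [];
  let size = 0;
  for (const line of lines) {
    if (current.length > 0 && size + line.length + 1 > maxChars) {
      chunks.push(current.join("\n"));
      // Carry the trailing overlapChars worth of lines into the next chunk.
      let acc = 0;
      const kept: string[] = [];
      for (let i = current.length - 1; i >= 0; i--) {
        acc += current[i].length + 1;
        kept.unshift(current[i]);
        if (acc >= overlapChars) break;
      }
      current = kept;
      size = acc;
    }
    current.push(line);
    size += line.length + 1;
  }
  if (current.length > 0) chunks.push(current.join("\n"));
  return chunks;
}

// Ten short lines, tiny budgets: every boundary line lands in two chunks.
const chunks = chunkLines(
  Array.from({ length: 10 }, (_, i) => `line${i}`).join("\n"),
  20, // max chars per chunk (~5 "tokens" at 4 chars/token)
  6,  // carry roughly one short line forward
);
```

The invariant to notice: each chunk's trailing lines reappear at the head of the next chunk, so a query matching boundary text can retrieve either side.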
6. The Memory Tool: Search as a Skill
The agent accesses memory through a tool called memory_search. Here's the definition from src/agents/tools/memory-tool.ts:
```typescript
// lines 37-55
return {
  label: "Memory Search",
  name: "memory_search",
  description:
    "Mandatory recall step: semantically search MEMORY.md + memory/*.md " +
    "(and optional session transcripts) before answering questions about " +
    "prior work, decisions, dates, people, preferences, or todos; " +
    "returns top snippets with path + lines.",
  parameters: MemorySearchSchema,
  execute: async (_toolCallId, params) => {
    const query = readStringParam(params, "query", { required: true });
    const { manager, error } = await getMemorySearchManager({ cfg, agentId });
    if (!manager) {
      return jsonResult({ results: [], disabled: true, error });
    }
    const results = await manager.search(query, { maxResults, minScore, sessionKey });
    return jsonResult({
      results,
      provider: status.provider,
      model: status.model,
      fallback: status.fallback,
    });
  },
};
```

The description tells the model when to use it: "before answering questions about prior work, decisions, dates, people, preferences, or todos." This guidance is part of the prompt, so the model learns to call memory_search proactively.
Results include the provider and fallback status, so the agent knows whether it's getting semantic results (embeddings worked) or just keyword matches (fallback mode).
The tool description is part of the prompt. "Mandatory recall step... before answering questions about prior work, decisions, dates" teaches the model when to reach for memory, not just how.
7. Key Constants
Memory System Limits
| Constant | Value | Purpose |
|---|---|---|
| SNIPPET_MAX_CHARS | 700 | Max chars returned per result |
| EMBEDDING_BATCH_MAX_TOKENS | 8000 | Max tokens per embedding API batch |
| EMBEDDING_INDEX_CONCURRENCY | 4 | Parallel embedding requests |
| DEFAULT_MEMORY_FLUSH_SOFT_TOKENS | 4000 | Buffer before compaction flush |
| chunking.tokens | 400 | Target size per chunk |
| chunking.overlap | 80 | Overlap between chunks |
Patterns Worth Stealing
Five design decisions that transfer to any agent memory system:
Files as source of truth. The Markdown files are canonical. The SQLite index is derived. This means you can always inspect, edit, or recover memories by looking at text files. Debugging is reading, not querying.
Hybrid search as weighted union. By taking the union of vector and keyword results (not intersection), queries that work well with either approach succeed. The 70/30 weighting lets you tune the balance.
Pre-compaction flush. Instead of losing memories when context overflows, the system prompts the agent to save first. The memoryFlushCompactionCount tracking prevents double-flushing.
Graceful degradation. If embeddings fail, keyword search still works. If keyword search fails, vector search still works. If both fail, you still have the Markdown files.
Local-first, not local-only. The provider fallback chain (local → OpenAI → Gemini) means you can run fully offline with a local model, or use cloud embeddings if that's easier.
None of this is OpenClaw-specific. The questions generalize: What's your source of truth? How do you handle partial failures? What happens when context overflows?
See also: Agent Memory: From Stateless to Stateful AI for the conceptual foundations, The Intelligence Layer: How OpenClaw Thinks for the broader agent architecture, and Building Personal AI Infrastructure for the gateway and channel system.
