How I turned 300,000 conversation fragments into working memory for Claude Code.
Every conversation with an AI coding assistant vanishes. You spend four hours building an authentication system with Claude, navigating design decisions, debugging edge cases, discovering that one obscure API quirk — and when the context window fills up, it's gone. The next session starts from zero.
I've had over 1,500 coding sessions with Claude Code across six projects in the past three months. That's 300,000 conversation entries: prompts, responses, chains of thought, file edits, bash commands, search results, screenshots. About 1.6 million tokens of accumulated context — decisions made, problems solved, patterns discovered, mistakes corrected.
Claw-memory is a system that captures all of it and turns it back into working memory.
Entries Extracted
308,866
Over 105 days
Tokens
1.6M
Accumulated context
Session Files
1,575
Across 6 projects
The Shape of the Data
Claude Code stores conversations as append-only JSONL files. Each session is a single file, one JSON record per line. Every user message, assistant response, thinking block, and tool invocation gets its own record, linked into a conversation tree via parent UUIDs.
The critical insight is that these files are immutable append-only logs. Claude Code never rewrites or truncates them. Even compaction — when the context window fills and earlier messages get summarized — just appends a summary record. The original messages stay in the file forever. This makes them perfect for extraction: you can tail from a byte offset and never miss anything.
Append-only is the right primitive. The source files are append-only. The corpus is append-only. The entity links are append-only. This makes everything idempotent, recoverable, and simple to reason about.
A file watcher polls every three seconds. It stats each session file, detects new bytes, parses the appended JSON, extracts the conversation content, and writes it to a local corpus file (context.jsonl). Same format, same append-only guarantee. A UUID-based deduplication layer makes the whole pipeline idempotent — re-reading from offset zero produces no duplicates.
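A minimal sketch of that tail-and-dedupe step (function and field names are illustrative, not the project's actual code):

```python
import json
import os

def tail_session(path: str, offset: int, seen: set, corpus_path: str) -> int:
    """Read bytes appended to a session file since `offset`, dedupe by UUID,
    append the records to the corpus, and return the new offset."""
    size = os.path.getsize(path)
    if size <= offset:
        return offset  # nothing new
    with open(path, "rb") as f:
        f.seek(offset)
        raw = f.read()
    # Consume only complete lines; a partially written line waits for the next poll.
    end = raw.rfind(b"\n") + 1
    with open(corpus_path, "a", encoding="utf-8") as out:
        for line in raw[:end].decode("utf-8").splitlines():
            try:
                record = json.loads(line)
            except json.JSONDecodeError:
                continue
            uid = record.get("uuid")
            if uid is None or uid in seen:
                continue  # idempotent: re-reading from offset 0 adds no duplicates
            seen.add(uid)
            out.write(json.dumps(record) + "\n")
    return offset + end
```

Wrapped in a loop that sleeps three seconds between passes and remembers per-file offsets, this is the whole watcher: stat, seek, parse, dedupe, append.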
Every extracted entry carries metadata: which project, which git branch, which conversation thread (the "slug" that persists across session continuations), timestamps, the working directory, whether it's a sidechain branch. Tool invocations carry the tool name and parameters. Thinking blocks carry cryptographic signatures that the Anthropic API requires for replay.
What 300,000 Entries Look Like
Here's the breakdown of 308,866 entries extracted over 105 days:
Corpus Composition
| Feature | Entries | Share |
|---|---|---|
| Tool invocations (tool_use + tool_result) | 195,914 | 63% |
| Assistant responses | 49,407 | 16% |
| Chain-of-thought reasoning | 47,982 | 16% |
| User messages | 12,645 | 4% |
| Screenshots | 550 | <1% |
| Summaries, metadata | ~2,400 | <1% |
The first surprise: tool calls dominate. Nearly two-thirds of the corpus is Claude reading files, writing edits, running bash commands, and receiving results. This turns out to be the most informationally dense part. An Edit tool call contains the exact diff — what changed, in which file, with surrounding context. A Read call reveals which files the assistant examined and in what order. Bash commands capture build outputs, test results, deployment steps.
Higher Entity Density
57%
Tool calls vs. dialogue
The second surprise: the tool entries have 57% higher entity density than dialogue. More named things — libraries, functions, file paths, API endpoints — appear per token in tool calls than in natural language exchanges. When you're searching for "how did we implement the OAuth flow," the tool calls that wrote the code are more useful than the conversation about writing it.
Two Search Backends
The corpus lives in two places simultaneously. The JSONL file is the source of truth — human-readable, append-only, trivially backupable. PostgreSQL with pgvector is a derived index rebuilt from the JSONL at any time.
Text search uses PostgreSQL's tsvector with BM25 ranking. An inline query parser handles dimension filters (project:crx2 tag:raw:assistant @react "exact phrase" -exclude), entity filters, time windows, and free text — all in a single search bar.
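A sketch of how such a parser might tokenize the search bar (the field names and return shape here are assumptions, not the actual implementation):

```python
import re

def parse_query(q: str) -> dict:
    """Split a search-bar query into dimension filters (key:value),
    entity filters (@name), exclusions (-term), quoted phrases, and free text."""
    parsed = {"filters": {}, "entities": [], "exclude": [], "phrases": [], "text": []}
    # Pull out quoted phrases first so their contents aren't tokenized.
    parsed["phrases"] = re.findall(r'"([^"]+)"', q)
    q = re.sub(r'"[^"]+"', " ", q)
    for tok in q.split():
        if ":" in tok and not tok.startswith(("@", "-")):
            key, _, value = tok.partition(":")  # first colon only: tag:raw:assistant
            parsed["filters"][key] = value
        elif tok.startswith("@"):
            parsed["entities"].append(tok[1:])
        elif tok.startswith("-"):
            parsed["exclude"].append(tok[1:])
        else:
            parsed["text"].append(tok)
    return parsed
```

Splitting on the first colon only is what lets a value like `raw:assistant` survive under the `tag` key.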
Semantic search uses OpenAI's text-embedding-3-large (3072 dimensions) via the Batch API at $0.065 per million tokens. 221,000 entries are embedded. HNSW indexes use a halfvec cast because pgvector's HNSW has a 2,000-dimension limit for full-precision vectors — casting to half precision at 3,072 dimensions stays under the limit and cuts index size in half.
Entries Embedded
221,000
text-embedding-3-large, 3072 dimensions
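The index definition might look like the following (table and column names are hypothetical; the expression-index cast is pgvector's documented workaround for the dimension limit):

```python
# Hypothetical table `entries` with a full-precision `vector(3072)` column.
# pgvector's HNSW index caps full-precision vectors at 2,000 dimensions,
# but `halfvec` indexes support more, so the index is built over a cast.
HNSW_DDL = """
CREATE INDEX entries_embedding_hnsw
    ON entries
 USING hnsw ((embedding::halfvec(3072)) halfvec_cosine_ops);
"""

# Queries must apply the same cast so the planner can use the index:
KNN_SQL = """
SELECT id
  FROM entries
 ORDER BY embedding::halfvec(3072) <=> %(query_vec)s::halfvec(3072)
 LIMIT 20;
"""
```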
Hybrid search merges both: BM25 finds exact matches, embeddings find semantic neighbors, reciprocal rank fusion combines the results. A triangulation mode takes a set of entry embeddings, computes their weighted centroid, and finds entries near that centroid — discovering related content that shares no keywords with the original query.
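Both steps are small enough to sketch in pure Python (the real system presumably does this in SQL; the rank constant `k=60` is the conventional default, not a confirmed setting):

```python
import math

def rrf_merge(bm25_ids, vector_ids, k=60):
    """Reciprocal rank fusion: each id scores 1/(k + rank) per list it
    appears in; ids ranked well by either backend float to the top."""
    scores = {}
    for ranking in (bm25_ids, vector_ids):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

def triangulate(embeddings, weights):
    """Weighted centroid of entry embeddings, unit-normalized so it can be
    used as a cosine query against the vector index."""
    total = sum(weights)
    dim = len(embeddings[0])
    centroid = [sum(w * vec[i] for w, vec in zip(weights, embeddings)) / total
                for i in range(dim)]
    norm = math.sqrt(sum(x * x for x in centroid))
    return [x / norm for x in centroid]
```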
For foundational context on hybrid retrieval patterns (union vs. intersection, BM25 rank normalization, weighted merge), see How OpenClaw Implements Agent Memory.
Entity Extraction
A spaCy pipeline with a custom EntityRuler identifies 22 entity types across the corpus: libraries, frameworks, file paths, functions, APIs, services, people, error messages. 1.75 million entity links connect 144,000 distinct entities to the entries where they appear.
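The matching step can be illustrated with a stripped-down, regex-based stand-in for the spaCy EntityRuler, using three illustrative pattern classes rather than the actual 22 types:

```python
import re

# Simplified stand-in for the spaCy EntityRuler; patterns are illustrative.
ENTITY_PATTERNS = {
    "FILE_PATH": re.compile(r"\b[\w./-]+\.(?:py|ts|jsonl|sql)\b"),
    "LIBRARY":   re.compile(r"\b(?:pgvector|spaCy|Flask|psycopg2)\b"),
    "FUNCTION":  re.compile(r"\b\w+\(\)"),
}

def extract_entities(text: str) -> list:
    """Return (entity_type, surface_form, offset) links for one entry."""
    links = []
    for etype, pattern in ENTITY_PATTERNS.items():
        for m in pattern.finditer(text):
            links.append((etype, m.group(), m.start()))
    return links
```

Each link row (entity, entry, position) is what accumulates into the 1.75 million-row link table.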
Distinct Entities
144,000
22 types
Entity Links
1.75M
Connecting entities to entries
Search Latency
15ms
Hybrid search, p50
Entity momentum tracking measures velocity — how quickly an entity's mention frequency is changing. Rising entities surface what you're actively working with. Fading entities reveal what you've moved away from. Z-score anomaly detection flags days where an entity's activity deviates significantly from its historical baseline.
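The z-score check is simple enough to show directly (the window size and threshold are illustrative defaults, not the system's actual settings):

```python
import statistics

def momentum_anomalies(daily_counts, window=30, z_threshold=3.0):
    """Flag days where an entity's mention count deviates from its rolling
    baseline by more than z_threshold standard deviations."""
    flagged = []
    for day in range(window, len(daily_counts)):
        baseline = daily_counts[day - window:day]
        mean = statistics.fmean(baseline)
        stdev = statistics.pstdev(baseline)
        if stdev == 0:
            continue  # flat baseline: no meaningful z-score
        z = (daily_counts[day] - mean) / stdev
        if abs(z) >= z_threshold:
            flagged.append((day, round(z, 2)))
    return flagged
```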
On the entity profile page, you can see every project and thread where an entity appears, its co-occurring entities, its momentum trend, and a semantic "evolution" view showing how the context around that entity has shifted over time.
Conversations as Threads
A conversation thread in Claude Code can span multiple session files. When context runs out and the user continues, a new session file starts — but the slug (a human-readable identifier like glistening-discovering-bengio) stays the same. The corpus currently contains 393 threads, some spanning 30+ session continuations.
Thread-level analysis reveals patterns invisible at the entry level. Pattern classifiers tag entries as code writing, error recovery, architecture decisions, dangerous operations, deep research, or deployment — 11 categories total. A thread's pattern distribution tells you its character: a thread that's 60% code writing and 15% error recovery is a feature build; one that's 40% deep research and 30% architecture decisions is a design exploration.
Conversations have topology. A conversation isn't a flat list — it's a tree with branches (sidechains), continuations (session files sharing a slug), and compaction boundaries (summaries). Reconstructing the tree from the flat log requires understanding all three.
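Reconstruction starts from the parentUuid links; a minimal sketch (field names follow the session records described earlier):

```python
from collections import defaultdict

def build_tree(records: list) -> dict:
    """Group flat JSONL records into a parent -> children adjacency map.
    Records whose parent is missing or outside this file are roots; a node
    with multiple children marks a sidechain branch point."""
    ids = {r["uuid"] for r in records}
    children = defaultdict(list)
    roots = []
    for r in records:
        parent = r.get("parentUuid")
        if parent in ids:
            children[parent].append(r["uuid"])
        else:
            roots.append(r["uuid"])
    return {"roots": roots, "children": dict(children)}
```

Continuations and compaction summaries layer on top of this: files sharing a slug are concatenated in time order, and summary records mark where a compaction boundary split the original messages from their recap.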
Fleet Harvesting
The system isn't limited to one machine. Three remote agents (Shellder, Misty, Geodude) run Claude Code for automated tasks — cron jobs, monitoring, batch processing. An hourly harvester SSHs to each agent, downloads new session files, converts them to the corpus format, and appends them. Same extraction pipeline, same deduplication, same search index. The fleet dashboard shows each agent's activity, patterns, and entities in real time.
The Assembly Pipeline
Search finds entries. Assembly turns them into working memory.
assemble.py takes a natural language description — "Chrome extension auth flow, OAuth tokens, session management" — and builds a curated briefing from the corpus. No LLM in the loop; the search functions are the intelligence.
The pipeline runs in five passes.
The output can be a curated thread in the database — or a Claude Code session file.
Session Export: Authentic Memory
This is the piece that makes the system more than an archive.
assemble.py --export writes the assembled entries as a valid Claude Code session JSONL file. Real user messages, real assistant responses, real thinking blocks with their original cryptographic signatures, proper parentUuid chaining for conversation tree integrity. The file goes into ~/.claude/projects/, and you load it with claude --resume <session-id>.
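A simplified sketch of the export step, showing how a fresh parentUuid chain keeps the conversation tree valid (field names are assumptions based on the record format described earlier; the real exporter also preserves thinking signatures and the full record metadata):

```python
import json
import uuid

def export_session(entries: list, path: str, session_id: str) -> None:
    """Write assembled entries as session-format JSONL, rebuilding the
    parentUuid chain so the records form one linear conversation."""
    parent = None
    with open(path, "w", encoding="utf-8") as f:
        for entry in entries:
            record = dict(entry)
            record["uuid"] = str(uuid.uuid4())
            record["parentUuid"] = parent  # None for the root record
            record["sessionId"] = session_id
            f.write(json.dumps(record) + "\n")
            parent = record["uuid"]
```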
Working memory beats archives. The difference between "I can search my old conversations" and "I can load my old conversations as authentic context" is the difference between a reference library and actual memory. Session export is the feature that transforms the system from useful to essential.
The agent doesn't get a summary. It gets the actual conversation history — the original prompts, the original reasoning, the original code changes. It can inspect the thinking that led to a decision. It can see the error messages that were encountered and how they were resolved. It picks up mid-thought, with authentic context, as if it had been there.
28,000 of 48,000 thinking entries have their cryptographic signatures preserved. For entries without signatures, the thinking content folds into a text block as a fallback — the information is still there, just not in the privileged thinking format.
What the UI Reveals
The web interface (Flask on port 8484) consolidates five views:
Search — the probe. Text, semantic, or hybrid search with inline dimension filters. Keyboard-navigable results with j/k, entity autocomplete, triangulation.
Entities — the map. Browse 144,000 named entities, see what's trending, what's fading, what triggered anomalous activity on a given day. Click through to full entity profiles with momentum charts and evolution timelines.
Threads — the timeline. 393 conversations with their plans, gallery screenshots, pattern distributions, and scope analysis (which files were touched, which directories, which branches).
Activity — the pulse. A GitHub-style contribution heatmap across 365 days. Daily chronicles showing which threads were active, which patterns fired, which entities appeared. Fleet agent dashboards. Corpus-wide statistics with filterable charts.
Assemble — the compositor. Natural language in, curated session out. Dry-run mode shows the search strategies before executing. Budget allocation visible. Export to loadable session with one click.
The Security Question
The corpus contains everything said in every coding session — API keys pasted into prompts, business logic discussed in thinking blocks, debugging sessions with production data. Sensitivity is high.
All directories are owner-only (chmod 700/600). Spotlight indexing is disabled. Nothing syncs to iCloud. The watcher makes zero network calls — pure local I/O. The only non-local data flow is Claude Code itself sending conversations to Anthropic's API, which exists regardless of the extraction system.
The one deliberate external call is to OpenAI's Batch API for embeddings — a conscious trade-off for embedding quality at $0.065 per million tokens. The embedding vectors are stored locally. The source text never leaves the machine except through this channel.
What I've Learned
Tool calls are the corpus. At 63% of entries with higher entity density than dialogue, the tool invocations are the primary record of what actually happened. Embedding them was the single most impactful decision for search quality.
Conversations have topology. A conversation isn't a flat list — it's a tree with branches (sidechains), continuations (session files sharing a slug), and compaction boundaries (summaries). Reconstructing the tree from the flat log requires understanding all three.
Working memory beats archives. The difference between "I can search my old conversations" and "I can load my old conversations as authentic context" is the difference between a reference library and actual memory. Session export is the feature that transforms the system from useful to essential.
Append-only is the right primitive. The source files are append-only. The corpus is append-only. The entity links are append-only. This makes everything idempotent, recoverable, and simple to reason about. The derived indexes (PostgreSQL, embeddings) can be rebuilt from the append-only source at any time.
Entities are the skeleton. Free-text search finds what you remember asking about. Entity search finds what you were actually working with. The gap between those two is where the most valuable rediscoveries happen.
Claw-memory runs locally on macOS. PostgreSQL 18 with pgvector in Docker. Python with Flask, spaCy, and psycopg2. No cloud services except OpenAI embeddings. 308,866 entries, 393 threads, 1.75 million entity links, 221,000 embeddings, 521 screenshots, 1,575 session files. 105 days of conversation, searchable in 15ms.
See also: Fleet Manager & Claw Launching Tools for the companion fleet management system, How OpenClaw Implements Agent Memory for the code-level walkthrough of hybrid search, and Agent Memory: From Stateless to Stateful AI for the conceptual foundations.
Fleet Manager & Claw Launching Tools
A dashboard for real-time agent visibility and a deploy pipeline for code, config, identity, and capabilities across three Hetzner servers over Tailscale SSH.
How OpenClaw Implements Agent Memory
A code-level walkthrough of hybrid search, pre-compaction flush, and the design decisions behind a production agent memory system.
Agent Memory: From Stateless to Stateful AI
LLMs are stateless by design. Agents require state. The memory architectures—context management, vector stores, knowledge graphs—that transform amnesiacs into collaborators.