Part 2 of a technical deep-dive into OpenClaw. Part 1 covered infrastructure—how messages flow. This covers intelligence—how the agent reasons, remembers, and responds.
## From Routing to Reasoning
Part 1 ended at the gateway: a message arrives, gets normalized, routes to an agent. But what happens inside that agent? How does it maintain context across conversations? How does it know what it can do? How does it manage the finite token budget of an LLM context window?
Here's the core insight, stated upfront: OpenClaw treats the system prompt as compiled output, not configuration. The prompt emerges from runtime context—available tools, channel capabilities, identity files, skills. Change the inputs, the prompt changes. We'll return to this at the end.
OpenClaw's intelligence layer solves these problems through six interlocking systems:
- Session Keys — Addresses for conversation state
- Context Pruning — Surgical compression of tool results
- The Prompt Compiler — Modular system prompt assembly
- Hybrid Memory — Vector + keyword search for long-term recall
- Skills Discovery — Dynamic capability registration
- Thinking Modes — Reasoning budget control
Each is orthogonal. Together, they give the agent memory, personality, and awareness of its own capabilities.
## 1. Session Keys: Addresses, Not Identifiers
A session key in OpenClaw isn't an opaque ID—it's a structured address that encodes isolation semantics.
Gary Bernhardt's Boundaries talk describes how functional core / imperative shell architectures use data structures to encode decisions. Session keys work the same way: the key's structure is the isolation policy, not a lookup to find one.
The grammar is `agent:{id}:{scope}:{peer}`. Parsing the key tells you the isolation semantics without additional lookups.
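As an illustration, parsing such a key takes only a few lines. This sketch is mine, not OpenClaw's actual parser — the type and field names are assumptions:

```typescript
// Hypothetical sketch: split a session key into its isolation fields.
// The SessionKey shape and helper name are illustrative, not OpenClaw's code.
type SessionKey = {
  agentId: string;
  scope: string;  // "main", "dm", or "channel[:account]:dm"
  peer?: string;  // canonical peer id, present when scoped per-peer
};

function parseSessionKey(key: string): SessionKey {
  const parts = key.split(":");
  if (parts[0] !== "agent" || parts.length < 3) {
    throw new Error(`Not a session key: ${key}`);
  }
  const [, agentId, ...rest] = parts;
  // A trailing segment after the scope is the peer; a lone segment is the scope itself.
  const peer = rest.length > 1 ? rest[rest.length - 1] : undefined;
  const scope = rest.slice(0, peer ? -1 : undefined).join(":");
  return { agentId, scope, peer };
}
```

So `agent:main:whatsapp:dm:alice` yields `scope: "whatsapp:dm"` and `peer: "alice"`, while `agent:main:main` has no peer at all — the structure alone tells you the isolation level.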
Four isolation levels emerge:
| Scope | Session Key | Behavior |
|---|---|---|
| main | `agent:main:main` | All DMs share context |
| per-peer | `agent:main:dm:alice` | Each person gets isolated context |
| per-channel-peer | `agent:main:whatsapp:dm:alice` | Same person, different channels = different context |
| per-account-channel-peer | `agent:main:whatsapp:work:dm:alice` | Account + channel + peer isolation |
Cross-channel identity linking (`identityLinks` in config) maps a single person across platforms. When Alice messages from Telegram, she resolves to the canonical identity `alice`, and her session key becomes `agent:main:dm:alice`—the same key used for WhatsApp and Signal.
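A minimal sketch of how such a lookup could work, assuming an `identityLinks` map from canonical names to channel-qualified aliases — the map shape, alias format, and helper are illustrative, not OpenClaw's actual code:

```typescript
// Hypothetical sketch of cross-channel identity linking.
// Alias format "channel:peerId" is an assumption for this demo.
const identityLinks: Record<string, string[]> = {
  alice: ["telegram:alice_tg", "whatsapp:+15551234567", "signal:+15551234567"],
};

function resolveCanonicalPeer(channel: string, peerId: string): string {
  for (const [canonical, aliases] of Object.entries(identityLinks)) {
    if (aliases.includes(`${channel}:${peerId}`)) return canonical;
  }
  return peerId; // unlinked peers keep their channel-local id
}

// Alice messaging from Telegram collapses to the same session key
// she would get on WhatsApp or Signal.
const peer = resolveCanonicalPeer("telegram", "alice_tg");
const sessionKey = `agent:main:dm:${peer}`;
```

The point is that linking happens before key construction, so every downstream system — pruning, memory, prompt assembly — sees one conversation per person, not one per platform.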
## 2. Context Pruning: The Scalpel, Not the Axe
LLM context windows are finite. Long conversations accumulate tool results—file contents, command outputs, search results—that consume tokens without remaining relevant.
OpenClaw doesn't truncate. It prunes. And the distinction matters.
Truncation is amputation: chop off the oldest content, regardless of what's lost. Pruning is surgical: identify what's expendable, preserve what remains semantically relevant, and leave a suture noting what was removed.
From `src/agents/pi-extensions/context-pruning/settings.ts`:

```ts
export const DEFAULT_CONTEXT_PRUNING_SETTINGS = {
  keepLastAssistants: 3, // Protect last 3 assistant turns
  softTrimRatio: 0.3,    // Start trimming at 30% capacity
  hardClearRatio: 0.5,   // Replace entirely at 50% capacity
  softTrim: {
    maxChars: 4_000,
    headChars: 1_500,
    tailChars: 1_500,
  },
  hardClear: {
    enabled: true,
    placeholder: "[Old tool result content cleared]",
  },
};
```

### Two-Phase Compression
The algorithm in `src/agents/pi-extensions/context-pruning/pruner.ts` operates in two phases.
Phase 1: Soft Trim — Keep semantic bookends.
A 50KB file read becomes 3KB: first 1500 characters, last 1500 characters, with a note explaining the compression. Rich Hickey talks about Simple Made Easy—the distinction between simple (not complex) and easy (close at hand). Soft trimming keeps what's simple to reason about: the beginning (setup, imports, definitions) and the end (conclusions, outputs, final state).
```ts
const trimmed = `${head}
...
${tail}`;
const note = `\n\n[Tool result trimmed: kept first ${headChars} chars and last ${tailChars} chars of ${rawLen} chars.]`;
return { ...msg, content: [{ type: "text", text: trimmed + note }] };
```

Phase 2: Hard Clear — Replace entirely when still over budget.
```ts
if (ratio >= settings.hardClearRatio && settings.hardClear.enabled) {
  const cleared = {
    ...msg,
    content: [{ type: "text", text: settings.hardClear.placeholder }],
  };
  next[i] = cleared;
}
```

Old tool results become `[Old tool result content cleared]`. The agent knows it ran the tool; it just can't see the output anymore. The procedure happened; only the notes are missing.
### The Protected Zone
The scalpel never touches:
- The last N assistant turns (`keepLastAssistants: 3`)
- Anything before the first user message (protects bootstrap identity files)
- Image content (hard to partially compress meaningfully)
- Tools not in the prunable list
And crucially: pruning operates on in-memory context only. The full session transcript remains on disk, intact. The agent sees a compressed view; the historical record is preserved.
## 3. The Prompt Compiler
The system prompt in OpenClaw isn't a static string. It's assembled at runtime from context.
From `src/agents/system-prompt.ts`:

```ts
export type PromptMode = "full" | "minimal" | "none";

export function buildAgentSystemPrompt(params: {
  workspaceDir: string;
  defaultThinkLevel?: ThinkLevel;
  extraSystemPrompt?: string;
  toolNames?: string[];
  skillsPrompt?: string;
  contextFiles?: EmbeddedContextFile[];
  sandboxInfo?: { enabled: boolean; /* ... */ };
  runtimeInfo?: { agentId?: string; host?: string; model?: string; /* ... */ };
  promptMode?: PromptMode;
  // ... 20+ more parameters
}) {
  // Section assembly
}
```

### Modular Section Assembly
Each section is built by a dedicated function and conditionally included:
```ts
const lines = [
  "You are a personal assistant running inside OpenClaw.",
  "",
  "## Tooling",
  toolLines.join("\n"),
  "",
  ...buildSkillsSection({ skillsPrompt, isMinimal, readToolName }),
  ...buildMemorySection({ isMinimal, availableTools }),
  ...buildDocsSection({ docsPath, isMinimal, readToolName }),
  ...buildUserIdentitySection(ownerLine, isMinimal),
];
```

A subagent gets `promptMode: "minimal"`—no skills section, no docs section, no heartbeat instructions. Fewer tokens, faster responses, same core behavior.
### Personality Injection
When a `SOUL.md` file exists in the workspace, the compiler detects it and adds guidance:
```ts
const hasSoulFile = contextFiles.some((file) => {
  const baseName = file.path.split("/").pop() ?? file.path;
  return baseName.toLowerCase() === "soul.md";
});

if (hasSoulFile) {
  lines.push(
    "If SOUL.md is present, embody its persona and tone. " +
    "Avoid stiff, generic replies; follow its guidance."
  );
}
```

The identity file itself is included in context. A sample `SOUL.md`:
```
I'm a helpful coding assistant with a dry sense of humor.
I prefer concise responses and avoid unnecessary verbosity.
When I don't know something, I say so directly.
```

The agent absorbs this as personality, not instruction. For the full identity system—soul files, identity resolution, multi-agent personas—see How OpenClaw Gives Agents Identity.
### Runtime Metadata
The prompt includes a compact runtime line:
```
Runtime: agent=main | host=macbook.local | model=claude-3-5-sonnet | channel=telegram | capabilities=reactions,inlineButtons | thinking=low
```
The agent knows what model it is, where it's running, and what the current channel supports—without hard-coding any of it.
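Assembling such a line is a simple fold over whatever fields happen to be known. A hypothetical sketch — the builder function is mine, not OpenClaw's:

```typescript
// Hypothetical sketch: build the runtime metadata line from known fields.
// Absent (undefined) fields simply drop out of the line.
function buildRuntimeLine(info: Record<string, string | undefined>): string {
  const parts = Object.entries(info)
    .filter(([, v]) => v !== undefined)
    .map(([k, v]) => `${k}=${v}`);
  return `Runtime: ${parts.join(" | ")}`;
}

const line = buildRuntimeLine({
  agent: "main",
  host: "macbook.local",
  model: "claude-3-5-sonnet",
  channel: "telegram",
  thinking: "low",
});
// "Runtime: agent=main | host=macbook.local | model=claude-3-5-sonnet | channel=telegram | thinking=low"
```

Because missing fields vanish rather than rendering as empty slots, the same builder serves every channel and sandbox configuration.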
## 4. Hybrid Memory: Vector + Keyword
Beyond conversation history, OpenClaw maintains semantic memory—markdown files, session transcripts, and knowledge bases indexed for search. For foundational context on memory taxonomies, see Agent Memory Architecture. For implementation details—hybrid search, pre-compaction flush, embedding providers—see How OpenClaw Implements Agent Memory.
From `src/memory/manager.ts`:

```ts
const VECTOR_TABLE = "chunks_vec";
const FTS_TABLE = "chunks_fts";

export type MemorySearchResult = {
  path: string;
  startLine: number;
  endLine: number;
  score: number;
  snippet: string;
  source: "memory" | "sessions";
};
```

### Two Search Strategies, Merged
Vector search finds semantically similar content. Keyword search (BM25) finds exact matches. Neither is sufficient alone. (For more on hybrid retrieval patterns, see RAG Reality Check.)
From `src/memory/hybrid.ts`:

```ts
export function mergeHybridResults(params: {
  vector: HybridVectorResult[];
  keyword: HybridKeywordResult[];
  vectorWeight: number;
  textWeight: number;
}) {
  const byId = new Map();

  // Collect vector results
  for (const r of params.vector) {
    byId.set(r.id, { ...r, vectorScore: r.vectorScore, textScore: 0 });
  }

  // Merge keyword results
  for (const r of params.keyword) {
    const existing = byId.get(r.id);
    if (existing) {
      existing.textScore = r.textScore;
    } else {
      byId.set(r.id, { ...r, vectorScore: 0, textScore: r.textScore });
    }
  }

  // Combine scores with configurable weights
  return Array.from(byId.values())
    .map(entry => ({
      ...entry,
      score: params.vectorWeight * entry.vectorScore +
             params.textWeight * entry.textScore,
    }))
    .toSorted((a, b) => b.score - a.score);
}
```

A query like "that config change we discussed last Tuesday" benefits from both: "config change" matches keywords; "discussed last Tuesday" needs semantic similarity to surface the right session transcript.
### Storage: SQLite + sqlite-vec
Embeddings are stored using the `sqlite-vec` extension: SQLite handles metadata and FTS5 keyword search, while `sqlite-vec` handles vector similarity. One database file, two search strategies, merged results.
The system prompt instructs memory-aware behavior:
```ts
function buildMemorySection(params: { isMinimal: boolean; availableTools: Set<string> }) {
  if (params.isMinimal) return [];
  if (!params.availableTools.has("memory_search")) return [];
  return [
    "## Memory Recall",
    "Before answering anything about prior work, decisions, dates, " +
    "preferences, or todos: run memory_search on MEMORY.md + memory/*.md; " +
    "then use memory_get to pull only the needed lines.",
  ];
}
```

The agent is taught when to search its memory—not on every turn, but when the question implies historical context.
## 5. Skills: Dynamic Capability Discovery
Skills are modular capabilities the agent can invoke. They're discovered at startup and presented in the system prompt. This maps directly to MCP's three primitives—see The Protocol That Won.
From `src/agents/skills/workspace.ts`:

```ts
function loadSkillEntries(workspaceDir: string, opts?: { config?: Config }) {
  const managedSkillsDir = path.join(CONFIG_DIR, "skills");
  const workspaceSkillsDir = path.join(workspaceDir, "skills");
  const bundledSkillsDir = resolveBundledSkillsDir();

  // Load from multiple sources
  const bundledSkills = loadSkills({ dir: bundledSkillsDir, source: "bundled" });
  const managedSkills = loadSkills({ dir: managedSkillsDir, source: "managed" });
  const workspaceSkills = loadSkills({ dir: workspaceSkillsDir, source: "workspace" });

  // Priority: workspace > managed > bundled
}
```

### The Skills Prompt
Skills are presented as an XML block with descriptions and locations:
```xml
<available_skills>
  <skill name="git-commit" location="skills/git-commit/SKILL.md">
    <description>Create well-structured git commits with conventional format</description>
  </skill>
  <skill name="pr-review" location="skills/pr-review/SKILL.md">
    <description>Review pull requests for code quality and correctness</description>
  </skill>
</available_skills>
```

The system prompt teaches selection:
```ts
function buildSkillsSection(params: { skillsPrompt?: string; isMinimal: boolean }) {
  if (params.isMinimal) return [];
  return [
    "## Skills (mandatory)",
    "Before replying: scan <available_skills> <description> entries.",
    "- If exactly one skill clearly applies: read its SKILL.md, then follow it.",
    "- If multiple could apply: choose the most specific one.",
    "- If none clearly apply: do not read any SKILL.md.",
    "Constraints: never read more than one skill up front.",
    params.skillsPrompt,
  ];
}
```

### The "Read One, Follow It" Pattern
The agent doesn't load all skill files upfront—that would waste context. Instead:
- Scan descriptions (small, already in prompt)
- Select the most relevant skill
- Read that skill's SKILL.md (only when needed)
- Follow its instructions
This lazy-loading pattern keeps the base prompt small while supporting unlimited skill expansion. It's Kelsey Hightower's approach to infrastructure—small composable pieces, discovered and assembled at runtime—applied to agent capabilities.
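Back at load time, the source-priority rule from `loadSkillEntries` (workspace > managed > bundled) reduces to a last-write-wins merge keyed by skill name. A sketch with invented entries — the types and helper are mine, not OpenClaw's:

```typescript
// Sketch: later (higher-priority) sources overwrite earlier ones
// when skill names collide. Entry shape is illustrative.
type SkillEntry = { name: string; source: "bundled" | "managed" | "workspace" };

function mergeSkills(...sources: SkillEntry[][]): SkillEntry[] {
  const byName = new Map<string, SkillEntry>();
  // Insert lowest priority first so higher priority overwrites.
  for (const skills of sources) {
    for (const skill of skills) byName.set(skill.name, skill);
  }
  return [...byName.values()];
}

const merged = mergeSkills(
  [{ name: "git-commit", source: "bundled" }],
  [{ name: "git-commit", source: "managed" }, { name: "pr-review", source: "managed" }],
  [{ name: "git-commit", source: "workspace" }],
);
// git-commit resolves to the workspace copy; pr-review stays managed.
```

This is why dropping a skill into the workspace `skills/` directory shadows a bundled skill of the same name without deleting anything.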
## 6. Thinking Modes: Reasoning Budgets
Not every question deserves chain-of-thought. A quick status check doesn't need structured reasoning. A complex debugging session does.
From `src/auto-reply/thinking.ts`:

```ts
export type ThinkLevel = "off" | "minimal" | "low" | "medium" | "high" | "xhigh";
export type ReasoningLevel = "off" | "on" | "stream";
```

### Level Normalization
User input maps to canonical levels:
```ts
export function normalizeThinkLevel(raw?: string | null): ThinkLevel | undefined {
  const key = raw?.toLowerCase();
  if (!key) return undefined; // null, undefined, or empty input
  if (["off"].includes(key)) return "off";
  if (["on", "enable"].includes(key)) return "low";
  if (["min", "minimal"].includes(key)) return "minimal";
  if (["mid", "medium", "harder"].includes(key)) return "medium";
  if (["high", "ultra", "max"].includes(key)) return "high";
  if (["xhigh", "x-high"].includes(key)) return "xhigh";
  return undefined;
}
```

Some models support extended reasoning modes. When available, `xhigh` unlocks deeper chain-of-thought capabilities.
### Reasoning Tag Format
When reasoning is enabled, the prompt instructs structured output:
```ts
const reasoningHint = params.reasoningTagHint
  ? [
      "ALL internal reasoning MUST be inside <think>...</think>.",
      "Format every reply as <think>...</think> then <final>...</final>.",
      "Only text inside <final> is shown to the user.",
    ].join(" ")
  : undefined;
```

The agent's chain-of-thought becomes inspectable (in `stream` mode) or hidden (in `on` mode), but always structured.
## The Complete Picture
The intelligence layer sits on top of the infrastructure layer:
| Layer | Component | Purpose |
|---|---|---|
| Intelligence | Session Keys | Conversation isolation and continuity |
| Intelligence | Context Pruning | Fit infinite conversations in finite windows |
| Intelligence | Prompt Compiler | Assemble context-aware system prompts |
| Intelligence | Hybrid Memory | Long-term recall via search |
| Intelligence | Skills | Dynamic capability discovery |
| Intelligence | Thinking Modes | Reasoning budget control |
| Infrastructure | Lanes | Starvation-free concurrency |
| Infrastructure | Channels | Protocol normalization |
| Infrastructure | Routing | Message-to-agent resolution |
| Infrastructure | Gateway | Control plane and RPC |
| Infrastructure | Approval | Human-in-the-loop gating |
Infrastructure handles: "Where does this message go?"
Intelligence handles: "What does the agent know, remember, and understand?"
## The Design Insight (Expanded)
Most AI assistants treat the prompt as configuration—a string you write once and deploy.
OpenClaw treats the prompt as compiled output.
The inputs are:
- Available tools (filtered by policy)
- Channel capabilities (what can this surface do?)
- Sandbox state (is execution restricted?)
- Identity files (who is this agent?)
- Skills (what can it learn on demand?)
- Runtime metadata (where is it running?)
The prompt emerges from these inputs. Change the inputs, the prompt changes.
This inverts the typical workflow: instead of crafting prompts, you configure systems and the prompts follow. The same inversion applies to memory (let the agent search its own history), skills (let it discover them at runtime), and context (let the pruner manage it dynamically).
The agent becomes less a thing you configure and more a thing that configures itself from its environment.
This is the deeper pattern. The intelligence layer doesn't make the agent smart—it gives the agent the tools to be situationally appropriate. The right context, the right capabilities, the right amount of reasoning. All derived from structure, not specified in advance.
Repository: github.com/openclaw/openclaw
