Technical Deep Dive

Fleet Manager & Claw Launching Tools

A dashboard for real-time agent visibility and a deploy pipeline for code, config, identity, and capabilities across three Hetzner servers over Tailscale SSH.

Greg Salwitz
12 min read
Tags: OpenClaw, Fleet Management, Agent Deployment, Infrastructure, Local-First

A Next.js dashboard and roughly 1,600 lines of bash that answer two questions: what are my agents doing, and how do I change what they're doing.


The fleet management layer is split into two halves: a dashboard (a Next.js app, 25,380 lines across 136 files) that provides real-time visibility into every running agent, and a deploy pipeline (1,600+ lines of bash across 5 scripts) that handles code, config, workspace, and identity deployment to three Hetzner servers over Tailscale SSH.

  • Lines of TypeScript: 25,380 (dashboard, 136 files)
  • Lines of Bash: 1,643 (deploy pipeline, 5 scripts)
  • Infrastructure: $30/mo (3 Hetzner VPS instances)


The Dashboard

Architecture

It's a Next.js 15 app with App Router. Every page lives under src/app/ — 27 route segments covering fleet overview, per-agent views, sessions, cron management, memory health, models, approvals, config editing, usage/cost analytics, topology visualization, a chat interface, log viewer, and a "souls" system for agent personality management.

State management is Zustand — a single fleet-store.ts that holds the canonical state for every connected agent. Each agent's state tracks id, name, URL, connection status, health snapshots, model, sessions, usage, cost, cron jobs, channel connections, presence, and logs.

The store also maintains pre-computed derived arrays — allSessions, allCronJobs, clawList — so components never need to flatten the per-agent state themselves. There's an event stream buffer with pause/resume and type/agent filtering, and an RPC log that tracks every request to every agent with timing data.
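The derived-array recomputation can be sketched as a pure function. This is a minimal sketch: in the real app it lives inside the Zustand fleet-store.ts, and the field shapes here are illustrative, not the actual store types.

```typescript
// Illustrative shapes — the real store tracks far more per-agent fields.
type Session = { id: string; title: string };
type CronJob = { id: string; schedule: string };

interface AgentState {
  id: string;
  name: string;
  sessions: Session[];
  cronJobs: CronJob[];
}

// Flatten per-agent state once on each update, so components read
// ready-made arrays instead of flattening the map themselves.
function recomputeDerived(agents: Record<string, AgentState>) {
  const all = Object.values(agents);
  return {
    allSessions: all.flatMap((a) =>
      a.sessions.map((s) => ({ agentId: a.id, ...s }))),
    allCronJobs: all.flatMap((a) =>
      a.cronJobs.map((c) => ({ agentId: a.id, ...c }))),
    clawList: all.map((a) => ({ id: a.id, name: a.name })),
  };
}
```

The trade-off is classic: a little extra work per store update buys every consuming component an O(1) read with no per-render flattening.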

Connection status tracks five states: disconnected, connecting, connected, reconnecting, unreachable. After 20 consecutive reconnection failures, the client gives up and marks the agent as unreachable.

The Gateway Client

The key piece is GatewayClient in src/lib/gateway-client.ts. Each agent connection is a persistent WebSocket to the agent's OpenClaw gateway via Tailscale.

Singleton client instances survive HMR and re-renders via a module-level Map keyed by agent ID. The FleetConnector component renders nothing — it's a mount-once side-effect that initializes all connections.
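The pattern is roughly this: a module-level Map keyed by agent ID, so repeated calls reuse an existing client instead of opening a duplicate WebSocket. A sketch only; the GatewayClient stub and the getClient name stand in for the real code in gateway-client.ts.

```typescript
// Stand-in for the real class; the actual client opens the WebSocket,
// handles the protocol, and feeds events into the store.
class GatewayClient {
  constructor(public agentId: string, public url: string) {}
}

// Module-level registry: survives re-renders because it lives outside
// any component, keyed by agent ID.
const clients = new Map<string, GatewayClient>();

export function getClient(agentId: string, url: string): GatewayClient {
  let client = clients.get(agentId);
  if (!client) {
    client = new GatewayClient(agentId, url);
    clients.set(agentId, client);
  }
  return client;
}
```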

What You Can Do

The dashboard exposes 30+ RPC methods through the gateway protocol.

RPC Method Categories

| Category | Key Methods | Purpose |
| --- | --- | --- |
| Observation | health, sessions.list, sessions.usage, cost.summary, channels.status, presence, logs | Container uptime, memory, CPU, token consumption, channel connections, structured logs |
| Control | cron.add/update/remove/run, config.get/set, models.list/scan, skills.install/remove | Full cron lifecycle, live config editing, model switching, skill management |
| Interaction | agent.chat, tools.list | Send messages to agents programmatically, inspect available tools |
| Memory | memory.search, memory.store, memory.stats, memory.changelog | Semantic search, manual injection, capture/recall history |
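Under the hood, exposing methods like this over one WebSocket implies request/response correlation by ID, plus timing for the RPC log. A hedged sketch of that correlation; the frame shape and names here are assumptions, not the real gateway protocol.

```typescript
interface RpcLogEntry { agentId: string; method: string; ms: number }

type Pending = { resolve: (r: unknown) => void; started: number; method: string };

class RpcChannel {
  private nextId = 1;
  private pending = new Map<number, Pending>();
  log: RpcLogEntry[] = [];

  // `send` is injected so the transport (the agent's WebSocket) stays abstract.
  constructor(private agentId: string, private send: (frame: string) => void) {}

  request(method: string, params?: unknown): Promise<unknown> {
    const id = this.nextId++;
    // Register the pending entry before sending, so a fast reply can't race it.
    const promise = new Promise<unknown>((resolve) => {
      this.pending.set(id, { resolve, started: Date.now(), method });
    });
    this.send(JSON.stringify({ id, method, params }));
    return promise;
  }

  // Feed incoming frames here; responses are matched to requests by id,
  // and every round trip lands in the RPC log with its timing.
  handleMessage(raw: string): void {
    const msg = JSON.parse(raw) as { id: number; result?: unknown };
    const entry = this.pending.get(msg.id);
    if (!entry) return;
    this.pending.delete(msg.id);
    this.log.push({
      agentId: this.agentId,
      method: entry.method,
      ms: Date.now() - entry.started,
    });
    entry.resolve(msg.result);
  }
}
```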

Key Components

58 total components. The heavyweights:

  • fleet-connector.tsx — Zero-render connection manager. Fetches /api/claws for the agent roster, creates GatewayClient per agent, wires events into the Zustand store, handles visibility-based reconnection.
  • fleet-report.tsx — Aggregated fleet summary: total sessions, token usage, cost across all agents, model distribution.
  • event-stream.tsx — Real-time event firehose from all agents, with pause/resume, type filtering, agent filtering. Every gateway event (session start, tool call, message, approval request) appears here.
  • approval-queue.tsx — When an agent wants to run a dangerous command, the approval lands here. Approve or deny from the dashboard.
  • memory-health.tsx — D3 sparkline chart of 7-day memory activity (captures vs recalls) aggregated across the fleet.
  • fleet-topology.tsx — Network visualization of agent-to-server-to-channel connections.
  • cron-table.tsx — Manage cron jobs across the fleet — add, edit, delete, trigger manual runs, see last execution status.
  • fleet-viz-lcars.tsx — Star Trek LCARS-inspired fleet status display.
  • agent-compare.tsx — Side-by-side comparison of two agents' config, sessions, memory, and performance.

Because FleetConnector owns the entire connection lifecycle, the other components simply read store state; connections stay invisible to them.


The Deploy Pipeline

Five scripts, 1,600+ lines, all bash 3.2-compatible (no associative arrays, so they run on stock macOS). Everything goes over Tailscale SSH to three Hetzner servers.

Deploy Scripts

| Script | Lines | Purpose |
| --- | --- | --- |
| deploy.sh | 643 | Code deploys, config patches, workspace pushes, rollbacks |
| agent-update.sh | 487 | Agent-level updates: workspace, skills, capabilities, env vars, single files |
| check.sh | 368 | Infrastructure drift detection against infra.yaml manifests |
| watchdog.sh | 145 | Continuous health monitoring with alert thresholds |
| deploy-capability.sh | ~50 | Shortcut for deploying a capability (cron + context) to an agent |

deploy.sh — The Core

Four modes, one script:

Code deploy (./fleet/deploy.sh shellder) is the default mode: no flags, just the agent name.

Config patch (./fleet/deploy.sh shellder --config fleet/patches/enable-linear.json): Pre-flight validates that ${VAR} references exist in the server .env and scans for plaintext secrets. Backs up the current openclaw.json, applies the patch via Python3 on the server (streamed via stdin), validates the resulting JSON, restarts the container, and auto-rolls back on health check failure.

The patch system supports three operations: set (default — null values delete the key), append (idempotent array append), and per_target (agent-specific overrides applied after base changes).
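The semantics of the three operations can be sketched in TypeScript. Note the hedge: the actual applier is the Python snippet streamed to the server, and this Patch schema is an assumption for illustration, not the real patch file format.

```typescript
type Json = Record<string, unknown>;

// Assumed schema, for illustration only.
interface Patch {
  set?: Json;                          // null values delete the key
  append?: Record<string, unknown[]>;  // idempotent array append
  per_target?: Record<string, Patch>;  // agent-specific overrides, applied last
}

function applyPatch(config: Json, patch: Patch, agent: string): Json {
  const out: Json = { ...config };

  // set: write each key; a null value means "delete this key".
  for (const [key, value] of Object.entries(patch.set ?? {})) {
    if (value === null) delete out[key];
    else out[key] = value;
  }

  // append: add only items not already present, so re-running is a no-op.
  for (const [key, items] of Object.entries(patch.append ?? {})) {
    const existing = Array.isArray(out[key]) ? (out[key] as unknown[]) : [];
    out[key] = [...existing, ...items.filter((i) => !existing.includes(i))];
  }

  // per_target: recurse with this agent's override after the base changes.
  const override = patch.per_target?.[agent];
  return override ? applyPatch(out, override, agent) : out;
}
```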

Workspace deploy (./fleet/deploy.sh shellder --workspace): Tar the local fleet/[agent]/workspace/ directory, SCP to server, docker cp into container, extract, fix ownership.

Rollback (./fleet/deploy.sh shellder --rollback): Restore openclaw.json.bak, restart, health check.

Every mode supports --dry-run. Multi-agent deploys (./fleet/deploy.sh all) run as sequential canaries: Shellder goes first; if healthy, Misty; if healthy, Geodude. Any failure stops the chain and prints logs. No silent failures.

check.sh — Drift Detection

Each agent has an infra.yaml manifest — the declared desired state:

agent: shellder
host: 100.99.28.2
domain: shellders.com
container: openclaw-src-openclaw-gateway-1
 
ports:
  - internal: 18789
    bind: "127.0.0.1"
    purpose: gateway
  - internal: 8788
    bind: "127.0.0.1"
    purpose: gmail-pubsub
 
tailscale:
  serve:
    - port: 8443
      path: "/"
      proxy: 18789
      scope: tailnet
  funnel:
    - port: 443
      path: "/gmail-pubsub"
      proxy: 8788
 
firewall:
  allow:
    - port: 80
      proto: tcp
      from: any
    - port: 22
      from: "100.64.0.0/10"
      comment: "Tailscale subnet"
  deny:
    - port: 3337
      reason: "webhook traffic routes through Caddy"

check.sh SSHes into each server and compares actual state against this manifest. Seven checks per agent: container health, port bindings (Docker ports bound to 127.0.0.1 not 0.0.0.0), Tailscale serve, Tailscale funnel, Caddy config, UFW rules, and DNS resolution. Output is color-coded: green OK, red FAIL, yellow WARN. Exit code 1 if any drift detected.
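The port-binding check, for instance, reduces to comparing manifest entries against what the server reports. A sketch of that comparison; in check.sh the observed side would come from parsing docker output over SSH, and these names are illustrative.

```typescript
interface PortSpec { internal: number; bind: string }

// declared: from infra.yaml. observed: internal port -> actual bind address.
function checkPortDrift(
  declared: PortSpec[],
  observed: Map<number, string>,
): string[] {
  const failures: string[] = [];
  for (const spec of declared) {
    const actual = observed.get(spec.internal);
    if (actual === undefined) {
      failures.push(`port ${spec.internal}: not bound`);
    } else if (actual !== spec.bind) {
      // e.g. bound to 0.0.0.0 instead of 127.0.0.1 exposes the port publicly.
      failures.push(
        `port ${spec.internal}: bound to ${actual}, expected ${spec.bind}`,
      );
    }
  }
  return failures;
}
```

An empty result maps to the green OK; any entry maps to red FAIL and a nonzero exit code.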

The Config Patch Library

Ten ready-to-deploy patches live in the repo. A sampling:

Config Patches

| Patch | What It Does |
| --- | --- |
| enable-linear.json | Linear extension + per-agent labels + alsoAllow append |
| enable-loop-detection.json | Tool loop detection thresholds |
| enable-android-sms.json | Android SMS gateway (requires env vars) |
| enable-tailscale-auth.json | Tailscale-based auth for gateway |
| disable-twilio-sms.json | Kill broken Twilio plugin (Shellder + Misty only) |
| clean-geodude-dead-config.json | Remove dead 1password config from Geodude |

Each patch is a versioned, reviewable, git-tracked artifact. No SSHing into servers and hand-editing JSON.


The Identity System

How Agent Personalities Work

Every agent has a set of workspace files that define who they are:

Identity Files

| File | Purpose |
| --- | --- |
| SOUL.md | Core personality — voice, values, behavioral patterns |
| IDENTITY.md | Presentation — name, role, avatar, pronouns |
| VOICE.md | Calibration samples — write like this, not like that |
| USER.md | Relationship with the owner — what they know about Greg |
| AGENTS.md | Fleet awareness — who the other agents are, how to coordinate |
| TOOLS.md | Operational knowledge — available tools, API references |
| OPERATIONS.md | SOPs — how to handle incidents, what to check on heartbeat |
| CENTERING.md | Grounding context — pulled at session start for focus |
| STATE.md | Current situation — what is happening this week |

These live in fleet/[agent]/workspace/ in the repo and get deployed via agent-update.sh --workspace. The agent reads them at session start — they're injected into the system prompt based on session type (some files are cron-only, some are always-on).

Six Agents, Six Identities

  • Agents defined: 6 (across 3 servers)
  • Marginal cost per agent: ~$0 (same server, same model API)

  • Shellder — Sharp, direct, dry humor. General ops and creative work.
  • Sir Claw (same server, port 18790) — Architecture focus. Separate container, same box.
  • Near Mint Misty — Warm, enthusiastic, Pokémon card collector energy. Fleet PM.
  • Maverick (same server as Misty) — Content creator, chaos-poet.
  • Mint Geodude — Blunt, deadpan, Ron Swanson meets Mike Ehrmantraut. Security.
  • Umbreon — Soul file only, not yet deployed.

The multi-agent-per-server pattern is proven: add a compose service, map a new port, deploy identity files, done.

The dashboard has a /souls page backed by /api/souls that provides CRUD for agent personality files. Edit the soul in the dashboard, deploy via agent-update.sh --workspace, agent picks it up on next session start.
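The file CRUD behind such a route can be reduced to a couple of helpers. A sketch under stated assumptions: the real /api/souls handlers, these function names, and the parameterized base directory are all illustrative; only the fleet/[agent]/workspace/ layout and the SOUL.md filename come from above.

```typescript
import { promises as fs } from "fs";
import * as path from "path";

// Read an agent's soul file from the repo's fleet/[agent]/workspace/ layout.
export async function readSoul(fleetDir: string, agent: string): Promise<string> {
  return fs.readFile(path.join(fleetDir, agent, "workspace", "SOUL.md"), "utf8");
}

// Write it back; creating the workspace dir keeps first-time writes simple.
export async function writeSoul(
  fleetDir: string,
  agent: string,
  body: string,
): Promise<void> {
  const dir = path.join(fleetDir, agent, "workspace");
  await fs.mkdir(dir, { recursive: true });
  await fs.writeFile(path.join(dir, "SOUL.md"), body, "utf8");
}
```

A Next.js route handler would wrap these in GET/PUT, but the interesting part is that the "database" is just git-tracked markdown: the same files the deploy script ships.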


A Day in Fleet Operations

The whole pipeline is designed for a single operator managing a fleet of AI agents from a laptop over Tailscale. No CI/CD server, no Kubernetes, no cloud orchestrator. Just SSH, bash, and a Next.js dashboard.

Total infrastructure: ~1,643 lines of bash for the deploy pipeline, 25,380 lines of TypeScript for the dashboard, ~70 lines of YAML per agent for infrastructure manifests. Running cost: $30/month for three Hetzner VPS instances.


See also: Building a Memory System for AI Conversations for the companion claw-memory system, How OpenClaw Implements Agent Memory for the code-level walkthrough, and The Intelligence Layer: How OpenClaw Thinks for the broader agent architecture.

Greg Salwitz, Feb 24, 2026