Systems of Agents: Building the Next Trillion-Dollar Platforms

The Debate

Jamin Ball's recent post "Long Live Systems of Record" hit a nerve. Pushing back on the "agents kill everything" narrative, he argues that agents don't replace systems of record—they raise the bar for what a good one looks like.

He's right. Agents are cross-system and action-oriented. The UX of work is separating from the underlying data plane. Agents become the interface (what we call "beyond chat interfaces"), but something still has to be canonical underneath.

Where we go further: Ball's framing assumes the data agents need already lives somewhere, and agents just need better access. That's half the picture. The production gap isn't about access. It's about context that was never captured in the first place.

The other half is the missing layer that actually runs enterprises: decision traces—the exceptions, overrides, precedents, and cross-system context that currently live in Slack threads, deal desk conversations, escalation calls, and people's heads.

The Distinction That Matters: Rules tell an agent what should happen in general. Decision traces capture what actually happened in this specific case—who approved it, under which policy version, with what exceptions, based on which precedent.

Agents don't just need rules. They need access to how rules were applied in the past, where exceptions were granted, how conflicts were resolved, and which precedents actually govern reality.
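To make the distinction concrete, here is a minimal sketch of what a single decision trace record could hold, assuming a plain Python dataclass. The schema is hypothetical: the field names (`approver`, `policy_version`, `exceptions`, `precedent_ids`) simply mirror the list above and do not come from any particular product.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class DecisionTrace:
    """One decision as it actually happened (hypothetical schema)."""
    decision_id: str
    action: str                           # what was done, e.g. "approve_discount"
    approver: str                         # who approved it
    policy_version: str                   # which version of the rule applied
    exceptions: tuple[str, ...] = ()      # deviations granted from the rule
    precedent_ids: tuple[str, ...] = ()   # earlier decisions cited as precedent
    decided_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )


def deviates_from_policy(trace: DecisionTrace) -> bool:
    """A trace with at least one exception records a departure from the rule."""
    return len(trace.exceptions) > 0


# Illustrative values only.
trace = DecisionTrace(
    decision_id="d-001",
    action="approve_discount",
    approver="vp_sales",
    policy_version="discount-policy-v3",
    exceptions=("extra 10% for healthcare procurement cycle",),
    precedent_ids=("d-000",),
)
```

A rule engine would only know that `discount-policy-v3` exists. The trace adds who applied it, what exception was granted, and which earlier decision was cited.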

What Systems of Record Don't Capture

Agents are shipping into real workflows—contract review, quote-to-cash, support resolution. Teams are hitting a wall that governance alone can't solve.

The wall isn't missing data. It's missing decision traces.

Exception logic that lives in people's heads. "We always give healthcare companies an extra 10% because their procurement cycles are brutal." That's not in the CRM. It's tribal knowledge passed down through onboarding.

Precedent from past decisions. "We structured a similar deal for Company X last quarter—we should be consistent." No system links those two deals or records why the structure was chosen.

Cross-system synthesis. The support lead checks customer tier in Salesforce, sees two open escalations in Zendesk, reads a Slack thread flagging churn risk, and decides to escalate. That synthesis happens in their head. The ticket just says "escalated to Tier 3."

Approval chains outside systems. A VP approves a discount on a Zoom call. The opportunity record shows the final price. It doesn't show who approved the deviation or why.

This is what "never captured" means. Not that data is dirty or siloed, but that the reasoning connecting data to action was never treated as data in the first place. It's the context crisis at the organizational level.

Read Path vs Write Path: Why Incumbents Can't Build This

Ball is optimistic that existing players evolve into decision-aware architecture. Warehouses become "truth registries." CRMs become "state machines with APIs."

That might work for making existing data more accessible. It doesn't work for capturing decision traces.

The Operational Incumbent Problem

Salesforce is pushing Agentforce. ServiceNow has Now Assist. Workday is building HR agents. The pitch: "We have the data, now we add the intelligence."

But these agents inherit their parent's architectural limitations. Salesforce is built on current state storage. It knows what the opportunity looks like now, not what it looked like when the decision was made. When a discount gets approved, the context that justified it isn't preserved. You can't replay the state of the world at decision time.
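The difference between current-state storage and replayable state can be sketched with a toy event log. This is an illustration under stated assumptions, not a description of how any CRM works internally: the `Event` type, the `state_at` helper, and the sample values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    seq: int          # position in the append-only log
    field_name: str   # which attribute of the opportunity changed
    value: object     # the new value


def state_at(events: list[Event], seq: int) -> dict:
    """Rebuild the record as it looked after event number `seq`."""
    state: dict = {}
    for event in sorted(events, key=lambda e: e.seq):
        if event.seq > seq:
            break
        state[event.field_name] = event.value
    return state


log = [
    Event(1, "list_price", 100_000),
    Event(2, "churn_risk", "high"),
    Event(3, "discount_pct", 20),   # the discount is approved here
    Event(4, "churn_risk", "low"),  # the context changes afterwards
]

current = state_at(log, seq=4)       # what a current-state store would show
at_decision = state_at(log, seq=3)   # what the approver actually saw
```

A store that keeps only `current` shows a 20% discount next to a low churn risk, which makes the approval look unjustified. Replaying the log to the decision point recovers the high churn risk that motivated it.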

They also inherit their parent's blind spots. A support escalation doesn't live in Zendesk alone. It depends on customer tier from the CRM, SLA terms from billing, recent outages from PagerDuty, and a Slack thread flagging churn risk. This is the MCP challenge in reverse: even with perfect tool integration, no incumbent sits in the cross-system path where decisions actually happen.

The Warehouse Problem

Snowflake and Databricks are positioned as the "truth registry" layer. Both are leaning in—Snowflake with Cortex, Databricks with Lakebase and AgentBricks.

Warehouses do have a time-based view. You can query historical snapshots. But warehouses are in the read path, not the write path.

Data arrives via ETL after decisions are made, so by the time it lands in Snowflake, the decision context is gone. A system that sits only in the read path, seeing data after the fact, can't be the system of record for decision lineage. It can tell you what happened. It can't tell you why.

The Structural Advantage

Systems-of-agents startups have a different position: they're in the orchestration path.

When an agent triages an escalation, responds to an incident, or decides on a discount, it pulls context from multiple systems, evaluates rules, resolves conflicts, and acts. The orchestration layer sees the full picture at decision time—not after the fact via ETL, but in the moment, as a first-class record.
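A minimal sketch of what being "in the orchestration path" could look like in code, assuming each source system is reachable through a simple fetch function. The function names, the toy escalation rule, and the lambda stand-ins for CRM, ticketing, and chat are all hypothetical. The point is only that the inputs and the outcome are written together, at decision time.

```python
import copy
from typing import Callable

# Hypothetical stand-in for a real integration (CRM, ticketing, chat).
Source = Callable[[str], dict]


def decide_and_record(
    customer_id: str,
    sources: dict[str, Source],
    rule: Callable[[dict], str],
    trace_log: list[dict],
) -> str:
    """Gather context, apply the rule, and persist the inputs with the outcome."""
    context = {name: fetch(customer_id) for name, fetch in sources.items()}
    action = rule(context)
    # Deep-copy so later changes in the source systems cannot alter the trace.
    trace_log.append({
        "customer_id": customer_id,
        "context": copy.deepcopy(context),
        "action": action,
    })
    return action


def escalation_rule(context: dict) -> str:
    """Toy rule: escalate enterprise customers with open escalations and churn risk."""
    if (
        context["crm"]["tier"] == "enterprise"
        and context["tickets"]["open_escalations"] >= 2
        and context["chat"]["churn_flag"]
    ):
        return "escalate_tier_3"
    return "standard_queue"


sources = {
    "crm": lambda cid: {"tier": "enterprise"},
    "tickets": lambda cid: {"open_escalations": 2},
    "chat": lambda cid: {"churn_flag": True},
}
traces: list[dict] = []
action = decide_and_record("acme", sources, escalation_rule, traces)
```

Compare this with the ticket that just says "escalated to Tier 3": here the trace entry carries the tier, the open escalations, and the churn flag that produced the action.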

That accumulated record of decisions, each stored with the context behind it, is the context graph. And that will be the single most valuable asset for companies in the era of AI.

Three Paths for Founders

Different startups will take different approaches. Each has distinct trade-offs.

Path 1: Replace the System of Record

Build a CRM or ERP from day one around agentic execution: event-sourced state, with policy capture native to the architecture. This is the "vertical agents winning" thesis applied to infrastructure.

Example: Regie. Of the many startups going after AI SDR, Regie chose to build an AI-native sales engagement platform to replace legacy platforms like Outreach/Salesloft. Those were designed for humans executing sequences across fragmented tools. Regie is designed for a mixed team where the agent is a first-class actor: it prospects, generates outreach, runs follow-ups, handles routing, and escalates to humans.

Trade-off: Hard. Incumbents are entrenched. This becomes viable at transition moments: new categories, regulatory shifts, platform changes. The "why small wins" thesis applies: start with the companies that have less to lose.

Path 2: Replace Modules, Not Systems

Target specific sub-workflows where exceptions and approvals concentrate. Become the system of record for those decisions while syncing final state back to the incumbent.

Example: Maximor. Automates cash, close management, and core accounting workflows without ripping out the general ledger. The ERP remains the ledger, but Maximor becomes the source of truth for reconciliation logic.

Trade-off: Narrower moat. You're dependent on incumbent APIs. But faster time-to-value and lower switching costs for customers.

Path 3: Create New Systems of Record

Start as an orchestration layer, but persist what enterprises never systematically stored: the decision-making trace. Over time, that replayable lineage becomes the authoritative artifact.

Example: PlayerZero. Production engineering sits at the intersection of SRE, support, QA, and dev—a classic "glue function" where humans carry context software doesn't capture. PlayerZero starts by automating L2/L3 support, but the real asset is the context graph it builds: a living model of how code, config, infrastructure, and customer behavior interact. That graph becomes the source of truth for "why did this break?"—questions no existing system can answer.

Path	Scope	Example	Moat Source
Replace SoR	Full replacement	Regie	Data model + workflow ownership
Replace Modules	Targeted workflows	Maximor	Decision logic for specific domain
New SoR	Decision traces	PlayerZero	Context graph + precedent library

The Observability Layer

As decision traces accumulate and context graphs grow, enterprises will need to monitor, debug, and evaluate agent behavior at scale. The agent operations playbook becomes mandatory, not optional.

Arize is building the observability layer for this stack—visibility into how agents reason, where they fail, and how their decisions perform over time. Just as Datadog became essential infrastructure for monitoring applications, Arize is positioned to become essential infrastructure for monitoring agent decision quality. See also: the emerging LLM-as-Judge paradigm for automated evaluation.

This is a distinct opportunity from the three paths above. You don't need to own the context graph to build valuable infrastructure on top of it.

Key Signals: Where to Build

Signal 1: High Headcount

If a company has 50+ people doing a workflow manually—routing tickets, triaging requests, reconciling data between systems—that's a signal. The labor exists because the decision logic is too complex to automate with traditional tooling.

Signal 2: Exception-Heavy Decisions

Routine, deterministic workflows don't need decision lineage—the agent just executes. The interesting surfaces are where the logic is complex, precedent matters, and "it depends" is the honest answer. Think: deal desks, underwriting, compliance reviews, escalation management. These are the domains with the highest hallucination tax—and therefore the highest value for getting decisions right.

Signal 3: Glue Functions

RevOps exists because someone has to reconcile sales, finance, marketing, and customer success. DevOps exists because someone has to bridge development, IT, and support. Security Ops sits between IT, engineering, and compliance. HR sits between employees and every other function.

The Pattern: "Glue" functions emerge precisely because no single system of record owns the cross-functional workflow. The org chart creates a role to carry the context that software doesn't capture.

An agent that automates that role doesn't just run steps faster. It can persist the decisions, exceptions, and precedents the role was created to produce. That's the path to a new system of record—not by ripping out an incumbent, but by capturing a category of truth that only becomes visible once agents sit in the workflow.

How Incumbents Will Fight Back

Don't expect a clean disruption narrative.

Acquisitions. Incumbents will try to bolt on orchestration capabilities. Salesforce's Slack acquisition was this playbook—buy the communication layer to see cross-system context.

API lockdown. They'll adopt egress fees to make data extraction expensive—the same playbook hyperscalers used. "Your data is free to access. Your data is expensive to leave."

Ecosystem leverage. They'll push "keep everything in our ecosystem" narratives and build their own agent frameworks: Agentforce, Now Assist, Workday AI.

The Structural Reality: Capturing decision traces requires being in the execution path at commit time, not bolting on governance after the fact. Incumbents can make extraction harder, but they can't insert themselves into an orchestration layer they were never part of.

The Trillion-Dollar Question

The debate isn't whether systems of record survive. They will.

The question is whether the next trillion-dollar platforms are built by adding AI to existing data, or by capturing the decision traces that make data actionable. This is the strategic question at the heart of the agent thesis. And the answer may be: neither dominates. The Context Aggregator thesis argues that verification is domain-specific—which means no single platform can aggregate the market the way Google aggregated search. Instead, we get city-states: Harvey for law, Abridge for healthcare, vertical by vertical.

The companies building context graphs today are laying the foundation. Every automated decision adds another trace to the graph. Every captured exception becomes searchable precedent. Every approval chain persisted is one less piece of institutional knowledge that walks out the door when employees leave.
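"Searchable precedent" can be as simple as filtering stored traces by their attributes. The sketch below assumes traces are plain dictionaries with hypothetical keys (`industry`, `action`, `pct`). A production system would more plausibly use a database or graph store, but the lookup has the same shape.

```python
def find_precedents(traces: list[dict], **criteria) -> list[dict]:
    """Return past traces whose attributes match every given criterion."""
    return [
        t for t in traces
        if all(t.get(key) == value for key, value in criteria.items())
    ]


# Illustrative traces; ids and values are made up.
traces = [
    {"id": "d-1", "industry": "healthcare", "action": "discount", "pct": 10},
    {"id": "d-2", "industry": "retail", "action": "discount", "pct": 5},
    {"id": "d-3", "industry": "healthcare", "action": "extend_terms", "pct": 0},
]

matches = find_precedents(traces, industry="healthcare", action="discount")
```

An agent pricing a new healthcare deal could call `find_precedents` before proposing terms, which turns "we should be consistent" from a memory exercise into a query.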

This feedback loop is what makes the context graph compound. Incumbents have the data. Startups in the execution path have the decisions. And decisions, not data, are the atomic unit of enterprise value.

The trillion-dollar question isn't theoretical. It's a build-or-buy decision every founder and enterprise architect will face in the next 24 months.

See also: The Company Graph for technical architecture, Agent Memory Architecture for the cognitive model, and The Production Gap for why most agent projects fail before they get here.
