
Customer Support Agents: The $50B Race to Replace Level 1

Customer support is the proving ground for autonomous AI. How outcome-based pricing validates Service-as-Software, and the shift from tools to labor replacement.

MMNTM Research
15 min read
#support #vertical-adoption #outcome-pricing #service-as-software #enterprise

Customer support has emerged as the proving ground for autonomous AI agents, validating the "Service-as-Software" thesis where enterprises pay for labor replacement rather than productivity tools. In 2025, this vertical shifted from experimental chatbots to outcome-priced agents handling work equivalent to thousands of human employees.

The numbers tell the story: Klarna's AI agent handles 2.3 million conversations monthly—work equivalent to 700 full-time agents—driving a $40M profit improvement. Intercom charges $0.99 per resolution, only collecting revenue when its AI successfully resolves a customer query. Zendesk introduced automated resolution tiers ($0.90–$1.50), abandoning pure per-seat licensing for outcome-based models.

This isn't incremental productivity. It's the first mass-market deployment of agents displacing human labor at scale.

Why Support Is Agent-Friendly: The Blueprint

Level 1 support serves as the ideal "beachhead market" for AI agents due to structural advantages that don't exist in high-stakes verticals like legal or medical.

High Error Tolerance & Recoverability

Unlike a hallucination in a legal contract or medical diagnosis, a support error is easily recoverable. If an AI misinterprets a return policy, the customer can escalate to a human without catastrophic damage. The stakes are measured in satisfaction scores, not malpractice lawsuits.

Structured Knowledge (RAG-Ready)

Support teams have spent decades building knowledge bases, FAQs, and macros. This structured data is perfect for Retrieval-Augmented Generation (RAG), allowing agents to ground answers in company truth rather than unreliable training data.

"The biggest advantage we had was 15 years of Zendesk help articles. Our AI didn't need to guess—it just needed to retrieve the right answer and paraphrase it naturally." — Head of Support Operations, SaaS company with 72% AI resolution rate

Measurable Outcomes

Success is binary and quickly observable. A "resolution" can be programmatically defined:

  • Did the customer ask for a human?
  • Did they mark the answer as helpful?
  • Did they open a new ticket within 24 hours?
  • Did they complete the transaction (refund processed, password reset)?

This measurability makes outcome-based pricing possible. You can't charge "$0.99 per legal brief" because quality is subjective. You can charge $0.99 per support resolution because success is objective.
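The four signals above can be combined into a single predicate. The following is a minimal sketch, not any vendor's actual definition: the `Conversation` fields and the `is_resolved` name are hypothetical, and two choices are assumptions made for illustration (missing feedback is treated as neutral, and a completed transaction is required).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Conversation:
    # All field names are hypothetical, chosen to mirror the four signals above.
    requested_human: bool           # did the customer ask for a human?
    marked_helpful: Optional[bool]  # None when the customer left no feedback
    reopened_within_24h: bool       # did they open a new ticket within 24 hours?
    transaction_completed: bool     # e.g. refund processed, password reset


def is_resolved(c: Conversation) -> bool:
    """Return True only when no signal points to a failed resolution."""
    if c.requested_human or c.reopened_within_24h:
        return False
    if c.marked_helpful is False:  # explicit negative feedback
        return False
    return c.transaction_completed
```

Because every input is a logged event rather than a quality judgment, the same function can run unchanged over every conversation, which is what makes per-resolution billing auditable.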

Graceful Degradation (Native HITL Fallback)

The "human-in-the-loop" (HITL) fallback is native to support workflows. If an agent's confidence score drops below 80%, it can seamlessly route the ticket to a human queue without breaking the user experience. The customer never sees "AI failed"—they see "connecting you to a specialist."

This stands in contrast to autonomous coding agents (where a bug ships to production) or medical agents (where a misdiagnosis causes harm before detection).

Intercom Fin: Outcome Pricing in Action

Intercom Fin represents the industry's clearest shift from SaaS (paying for software access) to Service-as-Software (paying for work done).

The Pricing Model: $0.99 Per Resolution

What counts as a resolution?

A resolution is counted only when:

  1. The customer's question is answered
  2. They do NOT request a human agent
  3. They do NOT reopen the conversation within 24 hours

If the agent fails any of these criteria, the customer pays $0.00.
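As a rough illustration of how the three criteria translate into an invoice, the sketch below counts billable conversations and multiplies by the quoted price. It is not Intercom's billing code: the `Outcome` record, `is_billable`, and `monthly_charge` are hypothetical names, and measuring the 24-hour window from the moment the conversation closed is an assumption.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, Optional

PRICE_PER_RESOLUTION = 0.99          # per-resolution price quoted in the article
REOPEN_WINDOW = timedelta(hours=24)  # reopen window from the criteria above


@dataclass
class Outcome:
    # Hypothetical record of one AI-handled conversation.
    answered: bool
    requested_human: bool
    closed_at: datetime
    reopened_at: Optional[datetime] = None


def is_billable(o: Outcome) -> bool:
    """Apply the three criteria: answered, no human requested, no reopen within 24h."""
    if not o.answered or o.requested_human:
        return False
    if o.reopened_at is not None and o.reopened_at - o.closed_at <= REOPEN_WINDOW:
        return False
    return True


def monthly_charge(outcomes: Iterable[Outcome]) -> float:
    """Charge only for billable resolutions; failed conversations cost nothing."""
    billable = sum(1 for o in outcomes if is_billable(o))
    return round(billable * PRICE_PER_RESOLUTION, 2)
```

A conversation reopened after 30 hours still counts as billable under this sketch, while one reopened after 3 hours does not.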

Why this matters:

This model transfers performance risk entirely to the vendor. Traditional SaaS vendors get paid whether their software works or not. Intercom only generates revenue if its AI actually resolves the issue.

It aligns incentives perfectly: Intercom is now incentivized to improve AI quality, not just sell more seats.

The Technology Shift: OpenAI → Anthropic Claude

In late 2024, Intercom switched its underlying model from OpenAI's GPT-4 to Anthropic's Claude 3.5 Sonnet.

Why Claude?

Internal benchmarks showed Claude 3.5 Sonnet offered:

  • Superior reasoning for complex support policies (multi-step logic)
  • Higher safety compliance (reducing "jailbreak" risks where users trick the AI into violating policies)
  • Better handling of ambiguous queries requiring clarification

"Claude's 'Constitutional AI' approach meant fewer incidents where customers manipulated the bot into issuing unauthorized refunds or discounts. That alone justified the switch." — Intercom engineering blog (paraphrased)

Results: From 23% to 72% Resolution Rates

  • Gen 1 Fin (early LLM era): 23% average resolution rate
  • Fin 2 (current generation): 51% average resolution rate
  • Best-in-class customer (Gamma): 72% resolution rate with 84% CSAT

Gamma, a presentation software company, reported reducing manual handling volume by 70% while maintaining an 84% customer satisfaction score—higher than their human baseline.

Klarna: Scale & The Empathy Ceiling

Klarna provides the most public and aggressive data on enterprise-scale displacement of labor.

The "Replacement" Metrics (2024)

  • Volume: 2.3 million conversations/month (two-thirds of all global support volume)
  • Labor equivalence: Doing the work of 700 full-time agents
  • Speed: Average resolution time collapsed from 11 minutes to under 2 minutes
  • Financials: Projected $40M annual profit improvement driven by reduced outsourcing costs

These aren't "productivity gains." This is explicit labor replacement. Klarna's press release didn't say "our agents are 700x more productive"—it said the AI does the work of 700 agents.

The "Empathy" Pivot (2025)

Despite the efficiency, Klarna adjusted its strategy in mid-2025. Customer feedback indicated that while routine tasks (refunds, order tracking, "where's my package?") were perfect for AI, sensitive financial disputes felt "robotic" and dismissive. This highlights a broader insight about AI interfaces—chat isn't always the answer for every interaction, as explored in Beyond Chat Interfaces.

The correction:

Klarna reintroduced human agents for complex, high-emotion queries, positioning AI as the "triage and transaction" layer rather than the total replacement.

This highlights the "Level 2 Ceiling": AI dominates logic and logistics but struggles with emotional de-escalation. A customer disputing a fraudulent charge doesn't just want efficiency—they want empathy and assurance that someone cares.

"The AI was technically correct in denying my dispute, but it felt like arguing with a vending machine. When I finally got a human, they approved it in 30 seconds because they understood the context." — Klarna customer, Reddit

The lesson: Speed without empathy is a brand liability in high-stakes moments.

Market Size & Adoption Velocity

Total Addressable Market

  • Global CX BPO market: $102B (2024) → $296B projected by 2033
  • Level 1 automation opportunity: $50B+ (routine queries, FAQ handling, transactional tasks)

Adoption Metrics (2024–2025)

  • 94% of customers now opt for AI chat when given the choice for immediate service (Salesforce 2025 report)
  • 22x growth in AI agent conversations from January 2024 to January 2025
  • Insurance vertical: AI adoption in claims triage surged 325% YoY, proving the model is rapidly moving beyond simple e-commerce returns

The velocity is extraordinary. This isn't gradual adoption—it's exponential displacement.

Competitive Landscape: Innovators vs Incumbents

A clear divide has formed between "AI-Native" challengers building for outcomes and "Incumbents" defending seat-based revenue.

| Category | Key Players | Pricing Strategy | Primary Value Prop |
|---|---|---|---|
| AI-Native | Intercom Fin | $0.99/resolution | Pure outcome alignment; risk transfer to vendor |
| AI-Native | Sierra | Outcome-based (custom) | High-end enterprise; "Agent OS" for complex flows ($150k+ ACV) |
| AI-Native | Forethought | Deflection blend | "Support-as-a-Service"; pays for successful deflections |
| Incumbents | Zendesk | Automated Resolution | Hybrid: $0.90–$1.50/resolution tiers on top of seat licenses |
| Incumbents | Salesforce | Agentforce | $2/conversation (legacy) or $550/user/mo editions. Complex add-on structure |
| Incumbents | ServiceNow | Pro Plus SKU | Value uplift via "Now Assist" add-ons; targeting IT/internal support |

The Strategic Divide

AI-Native challengers (Intercom, Sierra, Forethought) use outcome pricing as a wedge to displace legacy helpdesks entirely. Their pitch: "Why pay $50/user/month whether it works or not? Pay us $0.99 only when we succeed."

Incumbents (Salesforce, Zendesk, ServiceNow) are attempting to "have it both ways"—keeping high per-seat license fees while adding consumption models. This creates pricing complexity and misaligned incentives: they still make more money selling seats than resolutions.

The question: Can incumbents transition to outcome-based models without cannibalizing their core revenue? Or will AI-native startups eat them from below?

Technology Stack & Architecture

The technology has evolved from simple "chatbot" scripts to autonomous agentic architectures.

The Brain: LLM Selection Matters

Anthropic Claude:

  • Leading enterprise adoption (Intercom, others) for superior reasoning and "Constitutional AI" safety features
  • Critical for brand safety: fewer hallucinated policies, unauthorized discounts, or inappropriate responses
  • Both Opus and Sonnet tiers offer strong performance for support workflows

Salesforce "Atlas" Reasoning Engine:

  • Proprietary "System 2" approach—pauses to plan, critique, and refine a response before sending
  • Multi-step reasoning: "Is this covered by warranty? → Check purchase date → Check warranty terms → Verify eligibility → Draft response"

The LLM choice isn't just about accuracy—it's about auditability and safety. A support agent that offers unauthorized refunds or violates company policy is a liability, not an asset.

RAG 2.0: Beyond Reading to Acting

Modern agents don't just "read" FAQs. They connect to backend APIs to perform actions:

  • Shopify integration: Process refund, update shipping address, cancel order
  • Stripe integration: Pause subscription, update payment method, view transaction history
  • Internal databases: Check inventory, verify account status, pull order details

This is the shift from "chatbot" (answering questions) to "agent" (taking actions).
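One common way to let an agent act safely is an allow-list of named actions that the model can request but not extend. The sketch below assumes that design. The handlers are hypothetical stubs that only return a description; they do not call the Shopify or Stripe APIs, and `ACTIONS` and `execute` are illustrative names.

```python
from typing import Callable, Dict


# Hypothetical action handlers. Real ones would call the commerce or billing
# backend; these stubs only return a description so the sketch stays runnable.
def process_refund(order_id: str) -> str:
    return f"refund issued for order {order_id}"


def pause_subscription(customer_id: str) -> str:
    return f"subscription paused for customer {customer_id}"


# Allow-list of actions the agent may take; anything else is refused.
ACTIONS: Dict[str, Callable[[str], str]] = {
    "process_refund": process_refund,
    "pause_subscription": pause_subscription,
}


def execute(action: str, target_id: str) -> str:
    """Run an allow-listed action, or hand off when the action is unknown."""
    handler = ACTIONS.get(action)
    if handler is None:
        return "escalate_to_human"
    return handler(target_id)
```

Keeping the registry explicit means a request for an unlisted action such as `"delete_account"` falls through to a human instead of being improvised by the model.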

Escalation Logic: Sentiment-Aware Handoff

Agents now monitor user sentiment in real-time using NLP models:

  • Frustration signals: Caps lock, profanity, keywords like "this is ridiculous" or "I want a manager"
  • Confidence scoring: If the agent's confidence drops below 80%, trigger escalation
  • Warm handoff: Pass full conversation context to human agent to avoid "repeat yourself" frustration
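The three rules above can be sketched as a single escalation check. This is a simplified illustration, not a production sentiment system: keyword and caps-lock heuristics stand in for an NLP model, the word lists are placeholders, and `is_frustrated`, `Handoff`, and `maybe_escalate` are hypothetical names. The transcript is assumed to hold customer messages, most recent last.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional

CONFIDENCE_THRESHOLD = 0.80  # the 80% figure cited above
# Illustrative phrases only; a real system would use a sentiment model.
FRUSTRATION_PHRASES = ("this is ridiculous", "i want a manager")
PROFANITY = {"damn", "hell"}  # placeholder word list


def is_frustrated(message: str) -> bool:
    """Detect caps-lock shouting, profanity, or known frustration phrases."""
    letters = [ch for ch in message if ch.isalpha()]
    # Treat a mostly upper-case message as shouting; ignore very short ones.
    shouting = (
        len(letters) >= 8
        and sum(ch.isupper() for ch in letters) / len(letters) > 0.7
    )
    lowered = message.lower()
    has_phrase = any(p in lowered for p in FRUSTRATION_PHRASES)
    has_profanity = any(w in PROFANITY for w in re.findall(r"[a-z']+", lowered))
    return shouting or has_phrase or has_profanity


@dataclass
class Handoff:
    reason: str
    # Full conversation context, so the customer does not have to repeat it.
    transcript: List[str] = field(default_factory=list)


def maybe_escalate(transcript: List[str], confidence: float) -> Optional[Handoff]:
    """Return a Handoff carrying the transcript, or None to let the AI continue."""
    if transcript and is_frustrated(transcript[-1]):
        return Handoff("frustration", list(transcript))
    if confidence < CONFIDENCE_THRESHOLD:
        return Handoff("low_confidence", list(transcript))
    return None
```

Frustration is checked before confidence so that an angry customer is handed off even when the model is sure of its answer.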

"The difference between a good AI and a bad AI isn't accuracy—it's knowing when to call a human." — CX leader at Fortune 500 retailer

The Expansion Playbook: Beyond Support

The "Support Agent" architecture—Input → RAG → Action → Resolution—is now being replicated across other knowledge verticals.

Sales: AI SDRs (11x.ai)

11x.ai's "Alice" replaces the human SDR (Sales Development Rep). Instead of outcome pricing, they use a Capacity Model (~$5,000/month) to simulate a "digital employee" salary.

Capabilities:

  • Lead qualification via email and LinkedIn
  • Meeting scheduling with calendar integration
  • Personalized outreach at scale (1,000+ leads/day)

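Economics: at ~$5,000/month the subscription runs about $60k/year, roughly on par with a $60k/year human SDR. The headline price is therefore not a saving by itself; any cost advantage would have to come from higher outreach volume per "seat" and from avoiding benefits, overhead, and ramp-up time.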

HR: Employee Helpdesk (Leena AI)

Leena AI automates 70% of internal employee tickets:

  • IT requests ("reset my password," "request new laptop")
  • Benefits questions ("how much PTO do I have?")
  • Policy lookups ("what's the remote work policy?")

Impact: HR teams shift from answering 100+ tickets/day to handling only complex cases (investigations, disputes, sensitive matters).

IT: ServiceNow Agents

ServiceNow reports 80% autonomous handling of IT inquiries, creating a large efficiency dividend for CIOs running internal IT support.

Use cases:

  • Password resets and access requests
  • Software license provisioning
  • Incident triage and routing

The pattern is identical: high-volume, repetitive queries with structured knowledge and clear success criteria.

ROI & Business Case: The CFO Math

The economics of adoption are undeniably attractive.

Cost Arbitrage

  • Human ticket cost: $6.00–$12.00 (fully loaded agent cost: salary + benefits + overhead)
  • AI ticket cost: $0.99–$1.50 (outcome pricing)
  • Savings: Roughly 75–92% lower cost on every automated ticket ($6.00 vs $1.50 at the low end, $12.00 vs $0.99 at the high end)

Example:

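  • 100,000 monthly tickets × $9 human cost = $900k/month baseline
  • 60,000 automated × $1 AI cost = $60k/month (vs $540k/month if humans handled them)
  • Savings: $480k/month on automated tickets (about $5.8M/year); the remaining 40,000 tickets still cost $360k/month in human handling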
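A minimal sketch of the arithmetic, using the example's inputs (100,000 tickets, 60,000 automated, $9 human cost, $1 AI cost). It compares total spend before and after automation, so the 40,000 tickets that still go to humans are counted on both sides. The function and parameter names are illustrative.

```python
def monthly_savings(
    total_tickets: int,
    automated_tickets: int,
    human_cost: float,
    ai_cost: float,
) -> float:
    """Savings = spend with humans only minus spend with partial automation."""
    before = total_tickets * human_cost
    still_human = total_tickets - automated_tickets
    after = automated_tickets * ai_cost + still_human * human_cost
    return before - after


savings = monthly_savings(100_000, 60_000, human_cost=9.0, ai_cost=1.0)
print(f"${savings:,.0f}/month, ${savings * 12:,.0f}/year")
# prints "$480,000/month, $5,760,000/year"
```

The result equals the per-ticket difference ($8) times the automated volume (60,000), which is a quick way to sanity-check any vendor ROI claim.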

Time Savings

  • Average Handle Time (AHT): Hours (email) or minutes (chat) → seconds (AI)
  • First Response Time (FRT): Minutes or hours → instant
  • Klarna: 11 minutes → under 2 minutes (82% reduction)

Hidden Value

24/7 Availability:

  • No night-shift premiums
  • No weekend staffing costs
  • Instant response in all time zones

Infinite Elasticity:

  • AI agents scale instantly during Black Friday spikes without hiring/firing cycles
  • No "we're experiencing higher than normal call volume" messages

Consistency:

  • No "bad day" variance
  • Same quality at 3am and 3pm
  • No agent training ramp-up time

Key Tensions & The Hybrid Future

1. The "Robotic" Backlash

As seen with Klarna, over-rotating on efficiency can damage brand equity. Customers accept AI for speed but reject it for empathy.

The winning model: Hybrid AI + Human

  • AI for the "easy" 60% (tracking, FAQs, transactional tasks)
  • Humans for the "hard" 40% (disputes, complaints, sensitive issues)

2. Job Displacement: The Quiet Labor Shift

While companies cite "freeing agents for complex work," the math (700 agents replaced) is explicit. The industry is facing a quiet but massive labor shift from "Tier 1 Support" to "AI Supervisors."

What happens to displaced agents?

  • Some retrain as "AI trainers" or "escalation specialists"
  • Most are simply not replaced (attrition vs layoffs)
  • The industry is shrinking support headcount by 30–50% over 2–3 years

3. Vendor Lock-in: The API Integration Trap

As agents become deeply integrated into backend APIs (processing refunds, changing account data), switching vendors becomes harder than switching a simple UI helpdesk.

Implication: Outcome pricing creates short-term alignment but long-term lock-in. Once Intercom Fin is processing 70% of your tickets via Stripe/Shopify APIs, ripping it out is a 6-month migration project.

The Bottom Line

Customer support agents are no longer an experiment—they are a $50B deflationary force hitting the labor market. The success of outcome-based pricing ($0.99/resolution) proves that enterprises are ready to buy results, not tools.

This vertical is the blueprint for the broader automation of the knowledge economy:

  • If you can replace a Tier 1 Support Agent today...
  • You can replace a Junior SDR tomorrow...
  • And a Paralegal next year.

The question isn't whether AI will displace knowledge work. The question is: Which verticals are next, and how fast will it happen?

Customer support just showed us the answer.
