What is Cost Per Completed Task?
Cost Per Completed Task (CPCT) is the true economic metric for AI agents: CPCT = (C_compute + C_tools) / P_success + (P_fail × C_human). It reveals that "cheap" models are often more expensive: a model costing $0.005 per run with a 50% success rate costs $3.00 per outcome once human remediation ($6.00 per failure) is factored in, versus $0.80 for a "premium" model at $0.50/run with 95% success. The three hidden taxes (retry, context bloat, and human remediation) make cost-per-token a vanity metric.
The CPCT Standard: Why Cost-Per-Token is a Vanity Metric
In the enterprise boardroom, "Cost Per Token" is the new "Hits Per Second"—a vanity metric that obfuscates the true health of the business.
When a CFO asks, "What does our AI workforce cost?", the engineering answer is typically: "o4-mini costs $0.10 per million tokens, while Claude Opus 4.5 is significantly more expensive."
This is the wrong answer. It frames AI as a software utility, like bandwidth or storage. But autonomous agents are not software utilities; they are digital labor. You do not measure a software engineer by the number of keystrokes they produce for a dollar; you measure them by the features they ship.
The prevailing obsession with minimizing inference costs has led enterprises into a "Race to the Bottom," optimizing for the cheapest model rather than the most capable one. This creates a false economy. A cheap model that requires five retries to parse a document is more expensive than a premium model that does it right the first time.
To build a sustainable AI strategy, we must adopt a new economic standard: Cost Per Completed Task (CPCT).
The Economics of Intelligence
The fundamental error in current AI budgeting is assuming that intelligence is a commodity. It is not. It is a probabilistic resource with varying degrees of reliability.
When you choose a "cheaper" model (e.g., o4-mini vs. GPT-5, or a self-hosted Llama 4 Scout vs. Claude Opus 4.5), you are trading capital (dollars per token) for operational risk (probability of failure).
In an agentic workflow—where an AI performs multi-step reasoning, tool use, and decision making—failures are not just "bad answers." They are process breaks. They result in loops, hallucinations, and, most expensively, human intervention.
The CPCT Formula
To visualize the true cost, we must expand the equation beyond the API bill.
CPCT = (C_compute + C_tools) / P_success + (P_fail × C_human)
Where:
- C_compute: The cost of tokens (inference) used in the attempt
- C_tools: The cost of any tool or API calls made during the attempt
- P_success: The probability (0.0 to 1.0) that the agent successfully completes the task without intervention
- P_fail: 1 − P_success, the probability that the task lands back on a human
- C_human: The cost of a human reviewing and fixing the failure
Dividing the attempt cost by P_success amortizes retries. When the agent gets a single attempt and any failure goes straight to a human, the division drops out and the formula reduces to C_compute + C_tools + (P_fail × C_human), the form used in the worked examples below.
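The formula is small enough to sanity-check in code. A minimal sketch in Python (names mirror the definitions above; the retries flag toggles between amortizing the attempt cost over the success rate and the single-attempt reduction):

```python
def cpct(c_compute: float, p_success: float, c_human: float,
         c_tools: float = 0.0, retries: bool = True) -> float:
    """Cost Per Completed Task.

    retries=True:  the agent re-attempts until it succeeds, so the
                   attempt cost is amortized over the success rate.
    retries=False: the agent gets one attempt, and any failure goes
                   straight to a human (the worked examples below).
    """
    attempt = c_compute + c_tools
    p_fail = 1.0 - p_success
    expected_compute = attempt / p_success if retries else attempt
    return expected_compute + p_fail * c_human
```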
The Inversion Principle
This formula reveals a critical insight MMNTM calls the Inversion Principle:
"As the complexity of a task rises, the cost of the model becomes the least significant variable in the economic equation."
A Concrete Example:
Consider a legal classification task.
- Model A (Cheap): Costs $0.01 to run. Success rate 80%.
- Model B (Smart): Costs $0.10 to run. Success rate 99%.
- Human Review: Costs $5.00 to fix a failure.
The Math:
- Model A CPCT: $0.01 + (0.20 × $5.00) = $1.01
- Model B CPCT: $0.10 + (0.01 × $5.00) = $0.15
The "expensive" model is actually more than 6x cheaper per outcome. The 10x savings on tokens were an illusion that cost the business nearly a dollar per task in remediation.
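Plugging the numbers into the cpct sketch above (tool costs assumed zero, one attempt per task):

```python
model_a = cpct(c_compute=0.01, p_success=0.80, c_human=5.00, retries=False)
model_b = cpct(c_compute=0.10, p_success=0.99, c_human=5.00, retries=False)
print(f"Model A: ${model_a:.2f} per outcome")  # $1.01
print(f"Model B: ${model_b:.2f} per outcome")  # $0.15
```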
The Three Hidden Taxes
Low-intelligence models impose three hidden taxes that inflate CPCT, often making them significantly more expensive than their "premium" counterparts despite the lower sticker price.
1. The Retry Tax
Autonomous agents are designed to self-correct. If a model tries to call a tool and fails (e.g., "Invalid JSON format"), it feeds the error back into the context window and tries again.
A low-capability model often lacks the reasoning depth to diagnose its own syntax errors. It enters a Retry Loop.
Scenario: An agent uses Llama 4 Scout ($0.21/1M tokens) to parse a complex PDF. It fails formatting 4 times before succeeding.
Cost Reality: You have paid for the input context five times, and for the latency of every attempt. Worse, each failed output and error message is appended to the context, so the prompt grows with every retry. The effective cost of the task has more than quintupled.
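To see the tax compound, price the loop out. A rough sketch; the token counts and the $0.85/1M output price are assumptions for illustration, not Llama 4 Scout's actual rate card:

```python
def retry_loop_cost(base_input: int, output_per_try: int, error_feedback: int,
                    attempts: int, in_price: float, out_price: float) -> float:
    """Total cost of a task that is re-submitted after each failure.

    Every retry re-sends the original context plus the accumulated
    failed output and error messages, so the input grows per attempt.
    """
    total, context = 0.0, base_input
    for _ in range(attempts):
        total += (context * in_price + output_per_try * out_price) / 1e6
        context += output_per_try + error_feedback  # transcript grows
    return total

one_shot = retry_loop_cost(8_000, 500, 50, attempts=1, in_price=0.21, out_price=0.85)
five_tries = retry_loop_cost(8_000, 500, 50, attempts=5, in_price=0.21, out_price=0.85)
print(f"{five_tries / one_shot:.1f}x the single-attempt cost")  # ~5.5x, not 5x
```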
2. The Context Bloat Tax
Smarter models are more concise. They can follow complex instructions in a single shot ("Zero-Shot").
Less capable models require "Few-Shot Prompting"—you must provide 5-10 examples of the desired output in the context window to get a reliable result.
This bloats the input prompt, and the example block is re-sent on every call. If you have to feed 4,000 tokens of examples to save roughly $0.50 per million tokens on the model price, the token bloat can outweigh the price gap entirely: you often end up paying more per call in aggregate bandwidth than if you had used a smarter model that understood the instruction natively, as the sketch below shows.
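A back-of-the-envelope check, using illustrative prices ($0.21/1M for the cheap model, $0.75/1M for the smarter one) and a hypothetical 500-token task:

```python
def input_cost(tokens: int, price_per_m: float) -> float:
    return tokens * price_per_m / 1e6

task = 500                                   # the instruction + document itself
few_shot = input_cost(task + 4_000, 0.21)    # examples re-sent on every call
zero_shot = input_cost(task, 0.75)           # no examples needed

print(f"few-shot:  ${few_shot:.6f} per call")   # $0.000945
print(f"zero-shot: ${zero_shot:.6f} per call")  # $0.000375
```

Here the 9x token bloat more than cancels the ~3.6x price advantage, and that is before counting the retries that few-shot prompting does not eliminate.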
3. The Human Remediation Tax (The Killer)
This is the single largest destroyer of AI ROI.
Consider a complex task: Reviewing a master service agreement (MSA).
- Model A (Claude Opus 4.5): Costs $0.50 per run. Success Rate: 95%.
- Model B (o3-mini): Costs $0.005 per run. Success Rate: 50%.
At face value, Model B is 100x cheaper. But consider the failure cost. When the model fails, a human lawyer ($300/hr) must spend several minutes fixing the output (assume $6.00 per failure).
- Model A (Opus 4.5) CPCT: $0.50 + (0.05 × $6.00) = $0.80
- Model B (o3-mini) CPCT: $0.005 + (0.50 × $6.00) ≈ $3.00
The "cheaper" model is 3.75x more expensive per outcome. The massive savings on tokens were an illusion that cost the business $2.20 per task in human labor.
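The remediation cost is the lever. A quick sensitivity sweep with the cpct sketch from earlier (same MSA numbers, varying only the human cost):

```python
for c_human in (1.00, 6.00, 25.00, 100.00):
    opus = cpct(c_compute=0.50, p_success=0.95, c_human=c_human, retries=False)
    mini = cpct(c_compute=0.005, p_success=0.50, c_human=c_human, retries=False)
    print(f"c_human ${c_human:>6.2f}: Opus 4.5 ${opus:.2f} vs o3-mini ${mini:.2f}")
```

Below roughly $1.10 of remediation cost, the cheap model wins. Above it, the premium model wins, and the gap widens with every dollar of human time.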
The Model Ladder Strategy
Enterprises should not default to the most expensive model, nor the cheapest. They should implement a Model Ladder architecture, routing traffic based on the "Price of Error."
Tier 1: The Utility Layer (o3-mini, Llama 4 Scout, Mistral Small 3.1)
- Pricing: ~$0.10 - $0.21 per 1M input tokens
- Role: Classification, simple extraction, routing, high-volume data cleaning
- Success Requirement: High for deterministic tasks
- Deployment: Use these to preprocess data before it hits the expensive reasoning engine
Tier 2: The Intelligence Layer (GPT-5, Claude Sonnet 4.5, Claude 4 Haiku)
- Pricing: ~$0.75 - $2.00 per 1M input tokens
- Role: General reasoning, planning, code generation, summarization
- Success Requirement: Critical
- Deployment: Default to these for standard user-facing interactions. They strike the balance on the "Cost-Quality Pareto Frontier."
Tier 3: The Reasoning Layer (Claude Opus 4.5, GPT-5.2 Thinking, Gemini 3 Pro)
- Pricing: ~$15.00+ per 1M input tokens
- Role: Architectural review, complex root cause analysis, high-stakes decision making (Legal/Medical)
- Success Requirement: Absolute
- Deployment: Use sparingly, but do not shy away from the cost. If an Opus 4.5 call costs $2.00 but prevents a $10,000 outage, that is a 5,000x return.
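Routing along the ladder can be mechanical. A minimal sketch reusing the cpct helper; the per-tier costs and success rates are hypothetical placeholders you would calibrate from your own evals:

```python
from dataclasses import dataclass

@dataclass
class Tier:
    name: str
    cost_per_run: float  # hypothetical average $ per task
    p_success: float     # hypothetical success rate for this task class

LADDER = [
    Tier("utility", 0.002, 0.90),
    Tier("intelligence", 0.020, 0.97),
    Tier("reasoning", 0.500, 0.995),
]

def route(c_human: float) -> Tier:
    """Pick the tier with the lowest expected CPCT for this price of error."""
    return min(LADDER, key=lambda t: cpct(t.cost_per_run, t.p_success,
                                          c_human, retries=False))

print(route(c_human=0.10).name)  # utility: a wrong label costs cents
print(route(c_human=5.00).name)  # intelligence: standard user-facing work
print(route(c_human=50.0).name)  # reasoning: high-stakes legal/medical
```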
The Bottom Line
We must stop treating AI cost optimization as a "procurement" problem (negotiating lower API rates) and start treating it as an engineering problem (optimizing for success rates).
As of late 2025, the gap between the "Smartest" model and the "Cheapest" model is widening, not shrinking.
- Stop counting tokens.
- Start measuring failures.
- Calculate your CPCT.
If you optimize for the cost of the input, you will hemorrhage margin on the cost of the output.
When agents execute thousands of outcome-based tasks daily, traditional payment rails break down. The shift to per-resolution pricing introduces fundamental infrastructure challenges around micropayments and cross-vendor settlements—explored in depth in Agent Billing & Crypto.
For the complete framework on budget governance and runaway cost prevention, see Agent Economics. For verification cost analysis, read The Hallucination Tax. And for human-in-the-loop patterns that conserve the expensive human resource, the rule is simple: route to review only when confidence is low.