The Two Pizza Agent Team: Skunkworks for Enterprise AI

The organizational playbook for AI adoption isn't about committees and roadmaps. It's about small, autonomous teams with something to prove. Here's why the Bezos model wins again.

MMNTM Research Team

Okay, here's the thing that keeps coming up in every conversation about enterprise AI adoption: the organizational structure matters more than the technology choice.

And when you dig into the companies that are actually shipping—not piloting, not experimenting, but shipping—production AI systems, a pattern emerges. It's the same pattern that shows up in almost every major technological inflection point. It's the skunkworks model. It's the small team. It's, to use the Bezos framing, the two-pizza team.

Let's break down why this works, because the underlying dynamics are genuinely fascinating.

The Historical Pattern

Before we get into AI, it's worth noting that this is not a new playbook. This is the playbook.

Lockheed Skunk Works (1943): Kelly Johnson and 23 engineers, working in a rented circus tent next to a plastics factory, designed and built the XP-80 jet fighter prototype in 143 days. The rest of Lockheed's bureaucracy would have taken years.

Apple Macintosh (1979-1984): The Mac team literally flew a pirate flag over its building, a nod to Steve Jobs's line that it's better to be a pirate than to join the Navy. The team was physically separated from the rest of Apple, had different rules, and, crucially, had permission to ignore the existing business.

Amazon Web Services (2003-2006): Andy Jassy and a small team, operating with minimal oversight, built what would become a $100 billion business. They shipped before anyone at Amazon fully understood what they were building.

The pattern is the same every time: Small team. Clear mission. Physical or organizational separation from the mothership. Permission to break rules. Direct access to leadership.

Here's the key insight: the reason this works is not about talent density. It's about communication overhead and decision velocity.

The Math of Small Teams

Jeff Bezos articulated this with the two-pizza rule: if a team can't be fed by two pizzas, it's too big. But the underlying math is more fundamental.

The number of communication channels in a team grows quadratically with team size:

Channels = n(n-1)/2

Team of 5:  10 channels
Team of 10: 45 channels
Team of 15: 105 channels
Team of 20: 190 channels

Every channel is a potential point of coordination, miscommunication, waiting, and politics. Double the team size and you roughly quadruple the communication complexity.
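
The channel counts above follow directly from the formula. This minimal Python sketch reproduces the table and shows the roughly fourfold jump when a team doubles in size:

```python
def channels(n: int) -> int:
    """Pairwise communication channels in a team of n people: n(n-1)/2."""
    return n * (n - 1) // 2


if __name__ == "__main__":
    for size in (5, 10, 15, 20):
        print(f"Team of {size}: {channels(size)} channels")

    # Doubling the team multiplies the channel count by about 4
    for small, large in ((5, 10), (10, 20)):
        ratio = channels(large) / channels(small)
        print(f"{small} -> {large} people: {ratio:.1f}x the channels")
```

The doubling ratio works out to 2(2n-1)/(n-1), which is 4.5 going from 5 to 10 people, 4.2 going from 10 to 20, and approaches 4 as teams get larger.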

But here's the nuance that matters for AI specifically: AI development is more iterative and experimental than traditional software development. The number of decisions per day is higher. The feedback loops are shorter. The cost of a wrong decision is lower (you can retrain), but the cost of delayed decisions is devastating (the competitive window closes).

A 20-person team making 10 decisions per day across 190 channels is functionally paralyzed compared to a 5-person team making 50 decisions per day across 10 channels.

This is why most AI pilots fail. Not because of the technology. Because of the organizational structure.

What a Two Pizza Agent Team Actually Looks Like

Okay, let's get concrete. What does this team look like in practice?

The Composition (5-8 people, max)

1. The Technical Lead / Architect: A senior engineer who has shipped ML/AI systems before. They make the architecture decisions. LangGraph vs. AutoGen. Vector store choice. Evaluation framework. They own the graph architecture.

2. The Product Mind: Could be a PM, could be a founder, could be an engineer with product instincts. They own the "what" and the "why." They translate business problems into agent capabilities. They define success metrics, ideally Cost Per Completed Task rather than technical benchmarks.

3. The Prompt/Agent Engineer(s): One or two people who spend all day in the prompts. Building the agent orchestration. Tuning the behaviors. Running evaluations. This is not "prompt engineering" in the sense of clever tricks. It's systematic engineering of AI system behavior.

4. The Full-Stack Generalist: Someone who can build the UI, connect the APIs, deploy the infrastructure. In the early days, you cannot afford specialists. You need people who can do whatever needs doing.

5. The Domain Expert (rotating or embedded): The person who actually does the job the agent is supposed to do. A paralegal for legal AI. A support rep for customer service AI. Not a consultant who describes the job, but someone who does it. They're the ground truth for whether the agent is actually working.

What's Not on the Team

  • No dedicated project managers
  • No committees or steering groups
  • No external vendor management (initially)
  • No documentation specialists
  • No QA team (the prompt engineers and domain expert are the QA)

This isn't because those functions don't matter. It's because in the early stages, the overhead of coordination exceeds the value of specialization.

The Operating Model

Here's where it gets interesting. The two pizza team can't just be small—it has to operate differently than the rest of the organization. Otherwise it gets pulled back into the bureaucratic gravity well.

Direct Executive Access

The team lead should be able to walk into the CEO's office (or Slack them) without going through layers. Not because they're special, but because the decisions being made are high-frequency and high-stakes.

When you're iterating daily on an AI system that might transform a core business process, you cannot wait two weeks for a steering committee to approve a direction change.

This is the most common failure mode: A team is formed with the right composition, but embedded in the standard reporting structure. Every decision requires escalation. Every experiment needs approval. The team ships in months what should take weeks.

Separate Metrics

The two pizza team should not be measured on the same metrics as the existing business. They're not optimizing for this quarter's revenue. They're building the capability for next year's revenue.

Measure them on:

  • Velocity: How quickly can they ship iterations?
  • Learning rate: How much do they learn from each iteration?
  • Production path: Are they building toward production or toward a demo?

Do not measure them on:

  • Impact to current revenue
  • Headcount efficiency ratios
  • Traditional project management milestones

Physical or Organizational Separation

This sounds old-fashioned, but it matters. The team should either be physically separated (different floor, different building) or organizationally separated (different Slack channels, different meetings, different rhythm).

The goal is to prevent the antibodies of the existing organization from attacking the new thing before it's strong enough to survive.

Historical parallel: When IBM built the PC, they did it in Boca Raton, Florida, far from the mainframe-centric corporate bureaucracy headquartered in Armonk, New York. The physical distance was strategic.

Time-Boxed Mandate

The team needs a clear window: 90 days, 6 months, whatever is appropriate. At the end of that window, one of three things happens:

  1. Success: The project graduates to a full team with proper resourcing
  2. Pivot: The learning is valuable, but the direction changes
  3. Kill: The hypothesis was wrong, and the team disperses

What cannot happen is indefinite piloting. That's how you end up in Pilot Purgatory.

Why This Beats the Committee Model

Most enterprises approach AI with what we might call the "committee model":

  • Form a cross-functional AI steering committee
  • Hire a Chief AI Officer who reports to... someone
  • Engage a major consulting firm for a "roadmap"
  • Run 50 pilots across 12 business units
  • Meet monthly to review progress

This feels responsible. It feels enterprise-grade. It is also a recipe for spending millions of dollars to produce PowerPoint decks.

The fundamental problem: The committee model optimizes for risk distribution, not value creation. Every stakeholder gets a voice. Every concern gets addressed. Every decision gets documented. And nothing ships.

The two pizza team optimizes for the opposite: concentrated accountability. One small team, one clear mandate, one executive sponsor. If it fails, everyone knows whose failure it was. If it succeeds, everyone knows whose success it was.

This sounds harsh, but it's actually more humane. The committee model produces shared failure—diffuse, unaccountable, unlearnable. The small team model produces specific failure—painful but educational.

And the success cases are not even close. In the cases we've looked at, the companies shipping production AI systems today (actually in production, actually driving revenue) almost universally started with a small team, not a committee.

The Escape Velocity Problem

Here's the final piece: getting the two pizza team started is the easy part. The hard part is keeping it small and fast as it succeeds.

Success creates pressure:

  • The project gets visibility, which brings more stakeholders
  • The scope expands, which requires more capabilities
  • Other teams want to contribute, which adds coordination
  • Legal/compliance/security need involvement, which adds process

Each of these pressures is individually reasonable. Collectively, they transform the two pizza team into a 50-person department indistinguishable from any other part of the organization.

The discipline required: You have to actively resist the growth pressure. When someone says "we should add X to the team," the default answer should be "no." When a new requirement emerges, the default response should be "can the existing team absorb this?" rather than "who should we hire?"

The goal is to keep the team at two pizzas until the project either fails or reaches genuine production deployment. Only then should you consider scaling—and even then, consider scaling by creating more two pizza teams rather than one larger team.

This is the Amazon model: Not one big team, but many small teams, each owning a distinct surface area, communicating through well-defined APIs.
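
The same channel arithmetic shows why splitting beats growing. The sketch below compares one 20-person team with four 5-person teams. It rests on a simplifying assumption, not an established model: people talk freely inside their own team, but teams coordinate only through one well-defined interface per pair of teams. `split_channels` is a hypothetical helper written for this illustration.

```python
def channels(n: int) -> int:
    """Pairwise channels among n communicating parties: n(n-1)/2."""
    return n * (n - 1) // 2


def split_channels(team_count: int, team_size: int) -> int:
    """Channels when people are split into equal teams (hypothetical model).

    Assumption: everyone inside a team talks to everyone else in that team,
    but teams interact only through a single interface per pair of teams,
    so each team pair contributes exactly one channel.
    """
    within_teams = team_count * channels(team_size)
    between_teams = channels(team_count)
    return within_teams + between_teams


if __name__ == "__main__":
    print("One team of 20:", channels(20))           # 190
    print("Four teams of 5:", split_channels(4, 5))  # 4*10 + 6 = 46
```

Same 20 people, roughly a quarter of the coordination surface. The saving only holds if the interfaces really are narrow; if every team needs every other team's people in its meetings, the split buys very little.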

Making the Case Internally

If you're reading this and thinking "yes, but how do I actually make this happen in my organization," here's the playbook:

1. Find the Executive Sponsor

You need one senior leader who believes in the small team model and is willing to protect it. Not a committee of sponsors. One person.

2. Pick a Bounded Problem

Don't pitch "transform the company with AI." Pitch "automate the initial review of support tickets in the EMEA region." Small, specific, measurable. Cost Per Completed Task becomes your success metric.
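
The article doesn't pin down a formula for Cost Per Completed Task, so here is one reasonable reading as a minimal sketch: total spend for a period divided by tasks the agent actually completed. The cost categories, the `PeriodCosts` and `cost_per_completed_task` names, and the dollar figures are all hypothetical, chosen only to illustrate the calculation.

```python
from dataclasses import dataclass


@dataclass
class PeriodCosts:
    """Spend for one reporting period (hypothetical cost categories)."""
    model_api: float       # LLM/API usage
    infrastructure: float  # hosting, vector store, observability
    human_review: float    # reviewer time spent on approvals


def cost_per_completed_task(costs: PeriodCosts, completed_tasks: int) -> float:
    """Total spend divided by tasks the agent actually completed.

    Failed, abandoned, or human-redone tasks are left out of the
    denominator, so they push the metric up instead of being averaged away.
    """
    if completed_tasks <= 0:
        raise ValueError("no completed tasks: metric is undefined")
    total = costs.model_api + costs.infrastructure + costs.human_review
    return total / completed_tasks


if __name__ == "__main__":
    # Illustrative numbers only, not real data
    month = PeriodCosts(model_api=1200.0, infrastructure=800.0, human_review=2000.0)
    print(f"${cost_per_completed_task(month, completed_tasks=5000):.2f} per ticket")
```

Counting human review time in the numerator matters: an agent that looks cheap on API spend but needs a reviewer on every ticket shows up honestly here.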

3. Request a Time Box, Not a Headcount

"Give me 5 people for 90 days" is easier to approve than "give me 12 people for 2 years." The bounded commitment reduces perceived risk.

4. Negotiate the Air Cover

Explicitly ask for: direct access to sponsor, exemption from standard reporting cadence, permission to make technical decisions without review. If you can't get these, the organizational structure will defeat you regardless of the team composition.

5. Define the Exit Criteria

What does success look like in 90 days? Be specific. "Agent handles 40% of support tickets in EMEA with human-in-the-loop (HITL) approval and maintains 95% customer satisfaction." Now everyone knows what they're signing up for.
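
Writing the exit criteria down as an explicit check keeps the end-of-window conversation honest. This is a minimal sketch assuming the 40% share and 95% satisfaction thresholds from the example above; the `PilotResults` record and `meets_exit_criteria` function are hypothetical, and how you measure "handled" and satisfaction depends on your own ticketing and survey data.

```python
from dataclasses import dataclass


@dataclass
class PilotResults:
    """Measured outcomes at the end of the time box (hypothetical fields)."""
    tickets_total: int
    tickets_handled_by_agent: int  # resolved by the agent with HITL approval
    csat: float                    # customer satisfaction, 0.0 to 1.0


def meets_exit_criteria(r: PilotResults,
                        min_share: float = 0.40,
                        min_csat: float = 0.95) -> bool:
    """True only if both thresholds from the example criteria are met."""
    if r.tickets_total == 0:
        return False  # no traffic means nothing was demonstrated
    share = r.tickets_handled_by_agent / r.tickets_total
    return share >= min_share and r.csat >= min_csat


if __name__ == "__main__":
    results = PilotResults(tickets_total=10_000, tickets_handled_by_agent=4_300, csat=0.96)
    print(meets_exit_criteria(results))  # True
```

A failing check doesn't automatically mean "kill." It's the input to the success, pivot, or kill decision at the end of the time box.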

The Contrarian Take

Here's where we might lose some people: most organizations should not run 10 AI pilots. They should run 1-2, with properly resourced two pizza teams.

In our experience, pilot success rate is not improved by running more pilots. It's improved by running better pilots: ones with proper resourcing, clear mandates, and organizational protection.

The company running 50 pilots with steering committees will ship zero production AI systems. The company running 2 pilots with empowered two pizza teams will ship both.

Concentration beats diversification in AI adoption. The same principle that makes small teams work makes small portfolios work.

The Bottom Line

The two pizza agent team isn't a new idea. It's the oldest idea in technology—the small team of talented people, working outside the bureaucracy, building something new.

What's new is the recognition that AI development is even more suited to this model than traditional software. The iteration speed is higher. The decision frequency is higher. The cost of organizational overhead is higher.

If you want to ship production AI systems, don't start with a roadmap. Don't start with a committee. Don't start with a consulting engagement.

Start with five people who know what they're doing, a clear problem, and permission to move fast.

The rest is execution.

For the architectural patterns your two pizza team should use, see The Graph Mandate. For avoiding common failure modes, read Why Agents Die. For the operational playbook once you hit production, see Agent Operations Playbook.