Intelligence Research
Deep dives into AI agents, multi-agent systems, and the evolution of autonomous intelligence.
Are You Doing Policy Theater?
The Air Canada ruling proved your chatbot IS your company. 88% of enterprises deploy AI, but only 14% have real governance. Here's how to tell if yours is theater or infrastructure.
Build an AI Agent from Scratch: The 80-Line Implementation
Build a working AI agent in 80 lines of Python. No frameworks: just a loop, tools, and memory. The primitives that frameworks like LangChain abstract away.
The Hard Thing About AI Agents
The demo worked. The pilot impressed the board. Now your agent is hallucinating to customers at 3am. Here are the hard truths about deploying AI agents that nobody wants to tell you.
The Deep History: How 2016 Planted the Seeds for the 2025 AI Revolution
The AI revolution didn't start with ChatGPT. Analysis of 98,000 Techmeme articles reveals that AlphaGo, OpenAI's founding, the transformer paper, and Nvidia's GPU pivot created every dynamic that matters today. The foundational era (2014-2017) determined who would win—and who would be forgotten.
The Agent Autopsy: Five Ways to Lose a Million Dollars
Real production agent failures dissected with the rigor of an SRE post-mortem. Five case studies of silent catastrophes—infinite loops, hallucinating RAG, identity confusion—and how to prevent them.
The Company Graph: Why Enterprise AI Needs Memory That Understands Relationships
RAG retrieves documents. Context graphs understand relationships. The missing infrastructure layer between enterprise data and AI agents that actually work.
Systems of Agents: Where the Next Trillion-Dollar Platforms Get Built
The debate isn't whether systems of record survive AI. It's whether new ones emerge—systems of record for decisions, not objects. Three paths for founders building in the execution layer.
The AI-Assisted Engineering Playbook: From Vibe Coding to Production-Grade
A unified framework for AI-assisted development. When to embrace vibe coding, when to enforce discipline, and the verification loop that prevents AI-generated chaos.
The AI Infiltration Effect: What 77,000 Articles Reveal About Tech's Structural Shift
Tech news feels samey. We quantified why. Analysis of 77,000 Techmeme articles reveals AI didn't just grow—it infiltrated every other beat. The data behind a permanent reorganization.
The Great Power Redistribution: AI Startups vs. Big Tech in the Attention Economy
Everyone says AI concentrates power in Big Tech. The data says the opposite. Startups went from 3% to 86% of Big Tech's AI coverage in five years. What the narrative got wrong.
The Agent Thesis, Quantified: How We Went From Chatbots to Autonomous Agents
The semantic shift from 'chatbot' to 'agent' wasn't gradual; it was sudden. Analysis of 77,000 articles reveals the exact quarter autonomy language took over. The agent era began Q4 2024.
The Narrative War: How Anthropic and OpenAI Are Covered Differently
OpenAI dominates volume. Anthropic dominates sentiment. Analysis of 90,000 articles reveals two companies executing fundamentally different media strategies—and both are winning.
The Agentic Category: How Enterprise AI Invented a Word and a $100B Market
The word "agentic" didn't exist in tech coverage until January 2025. By December, it appeared in 50 headlines and defined a category that spawned $10B valuations and 139 funded startups. The data on how a word became a market.
The Microsoft Hedge: How a $13B Bet Became a Portfolio Strategy
Microsoft's $30B Azure deal with Anthropic wasn't a sudden pivot; it was the culmination of a 20-month hedging strategy that began when the Altman firing revealed Microsoft had no control over its biggest AI bet.
RLVR: When Verification Became the Training Signal
How 2025's shift from RLHF to RLVR changed model training, created jagged intelligence, and unlocked test-time compute. The paradigm that replaced human feedback.
2030: A Day in the Life of the AI-Native Founder
By 2030, the line between team and agents has dissolved. A speculative but grounded look at what work looks like when agents operate, not assist—showing the trajectory from 2025 to get there.
Beyond Chat: The Interface Revolution for AI Agents
Chat was a transitional interface. Production workflows need Generative UI, ambient copilots, and task-native agents that respect domain physics.
The Architect's Guide to Engineering Claude Code Skills
A comprehensive manual for process engineering, context economics, and agent specialization. Learn how to transform Claude Code from a generalist into a specialized agent through modular skills.
Context Engineering: From Amnesia to Expertise
Context is 90% of agent performance. How to load domain expertise, develop voice, and accumulate institutional knowledge across 200K+ token windows.
10 Verticals Getting Automated by AI Agents Right Now
AI agents aren't future technology—they're deploying now with measurable ROI. Legal, healthcare, support, sales, HR, and more. Here's where automation is actually happening.
The Hollow Firm 2.0: What Happens When Juniors Disappear
AI is automating junior work in law, consulting, and finance. Short-term margin expansion, but a 2035 succession crisis when AI-trained juniors become senior experts.
The Momentum Thesis: Why We Build AI Employees
Founders trade two resources: time and momentum. We built MMNTM to handle the work that must exist so your business can exist. Our philosophy on AI employees.
The Context Window Race: Why 10 Million Tokens Doesn't Mean 10 Million Useful Tokens
The gap between claimed context and effective context is the defining quality metric of 2025. Llama 4 Scout's 10M tokens collapse to ~1K effective on semantic tasks. Here's what the benchmarks actually show.
Trust Architecture: Making AI Agents Auditable
The gap between "AI-powered" and "production-ready in regulated industries" is auditability. EU AI Act, GDPR, SOC 2, and the technical patterns that make agents legally defensible.
Building Agent Evals: From Zero to Production
Why 40% of agent projects fail: the 5-level maturity model for production evals. Move beyond SWE-bench scores to measure task completion, error recovery, and ROI.
The Claude Code Superuser Guide: From Developer to Agent Orchestrator
How to master Claude Code by shifting from writing code to orchestrating AI agents. Parallel development, context mastery, and the workflows that unlock 10x productivity.
Agent Identity: Why Saviynt's $700M Raise Signals a New Security Category
Saviynt's $700M raise validates a thesis: AI agents outnumber humans 82:1 and traditional identity systems can't cope. Agent identity is infrastructure.
Customer Support Agents: The $50B Race to Replace Level 1
Customer support is the proving ground for autonomous AI. How outcome-based pricing validates Service-as-Software, and the shift from tools to labor replacement.
HR Agents: The $20B Helpdesk Automation Nobody Sees Coming
The $2.85B Moveworks acquisition validates HR as the next customer support. Competitive landscape, ROI math, and the enterprise copilot wedge.
Sales Automation Agents: The $30B Race to Replace SDRs
Sales is the highest-value vertical for AI agents. How SDR agents differ from support (persuasion, objection handling), and the race to revenue-center automation.
7AI: When AI Agents Defend Against AI Attacks
The $130M Series A validates a thesis: only autonomous AI agents can fight AI-driven threats. Inside the Cybereason founders' bet on Agentic Security.
Cursor: How Forking VS Code Built a $29B Company
Anysphere reached $1B ARR in 24 months by making a controversial bet: fork VS Code to gain "root access" to the developer workflow. Inside the architecture that plugins can't replicate.
Abridge: The $5.3B Bet That Doctors Want Their Lives Back
For every 1 hour with patients, physicians spend 2 hours on documentation plus 1-2 hours of "pajama time" after hours. Abridge reached $5.3B by solving the burnout crisis with Epic-integrated AI that saves 2+ hours per day.
Harvey: The $8B Legal AI That BigLaw Actually Trusts
How Harvey became the category-defining legal AI by solving what ChatGPT couldn't: data privacy through the Vault, 0.2% hallucination rate through citation-backed generation, and workflow integration at 4,000-lawyer firms. The definitive case for vertical AI.
Anthropic: How Safety Became the Enterprise AI Standard
Anthropic captured 32-40% of enterprise AI in 18 months. Constitutional AI as GTM, Claude Code as developer wedge, multi-cloud for distribution. The $183B blueprint.
Databricks: The $100B Data Foundation Nobody Talks About
While everyone obsesses over OpenAI and Anthropic, Databricks quietly became the hidden infrastructure layer for every enterprise AI agent. From lakehouse to Unity Catalog to DBRX, here's why they own the data moat.
When RPA Meets AI: The $30B Automation Collision
The $20B+ RPA industry built on deterministic scripts is colliding with probabilistic AI agents. The winner will be whoever successfully orchestrates both.
The Agent Stack: A Complete Reference
A curated reading path through 30+ articles on building production AI agents. Organized by layer: Foundation, Architecture, Operations, Economics, Security, and Evaluation.
The Agent Thesis: What We Know After 100 Deployments
A synthesis of the patterns that separate agents that ship from agents that die in pilot purgatory. The throughlines across architecture, operations, economics, and security.
Devin: The Autonomous Engineer (Or Is It?)
Cognition AI's Devin: $10B valuation, IOI gold medalists, SWE-bench breakthrough—and the controversy. Why it's a force multiplier, not a replacement.
Vertical Agents Are Eating Horizontal Agents
Harvey ($8B), Cursor ($29B), Abridge ($5.3B): vertical agents are winning. The "do anything" agent was a transitional form: enterprises buy solutions, not intelligence.
The Asymmetric Bet: Game Theory for the AI Era
AI creates asymmetric payoffs that invert traditional competitive dynamics. Startups have everything to gain. Incumbents have everything to lose. The rational strategy depends entirely on what you're protecting.
The Two Pizza Agent Team: Skunkworks for Enterprise AI
The organizational playbook for AI adoption isn't about committees and roadmaps. It's about small, autonomous teams with something to prove. Here's why the Bezos model wins again.
Why 90% of AI Pilots Still Fail (And How to Beat the Odds)
Only 5-10% of enterprise AI initiatives escape pilot phase to deliver measurable ROI. The problem isn't the technology—it's data readiness, the performance illusion, and organizational deficits.
Solve Intelligence: The AI Operating System for Patent Law
Solve Intelligence exemplifies the vertical agent thesis—domain depth, proprietary fine-tuning, and workflow integration create moats that horizontal AI cannot replicate.
The Durable Agent: Why Infrastructure Beats Prompts
A 15-minute task that crashes at 99% wastes $4.50 in compute. Temporal eliminates the Restart Tax and turns debugging into DVR replay.
The Input Assurance Boundary: Treating Prompts Like SQL Injection
Prompt injection is not a bug. It is an architectural feature of LLMs. Security audits show 73% of systems are vulnerable. Safety is not a prompt. Safety is architecture.
The Graph Mandate: Why Chat-Based Agents Fail in Production
The "Chat Loop" is the "goto" statement of the AI era. 70-90% of enterprise AI projects stall in Pilot Purgatory. Graph architectures are the path to production.
Agent Memory: From Stateless to Stateful AI
LLMs are stateless by design. Agents require state. The memory architectures—context management, vector stores, knowledge graphs—that transform amnesiacs into collaborators.
MCP: The Protocol That Won (For Now)
MCP solved the N×M integration crisis and achieved escape velocity through strategic open-sourcing and the Linux Foundation play. The de facto standard for AI connectivity—though not without costs.
The MCP Tax: When Standards Cost You 99% of Your Token Budget
The design decisions that grant MCP its universality—verbose schemas, data through context—create a compounding tax on tokens, latency, and model intelligence. Anthropic's own fixes prove the original architecture is broken.
The Agent Attack Surface: Security Beyond Safety
The shift from chat to agency creates a new threat model. AI Security differs from AI Safety. Prompt injection is unsolved—defense requires architectural containment, not prevention.
RAG Is Oversold: The Gap Between Tutorial and Production
95% of RAG projects fail to reach production. The gap isn't infrastructure—it's retrieval accuracy, data processing, and reasoning. Naive RAG is obsolete; production requires rigorous engineering.
The HITL Firewall: How Human Oversight Doubles Your AI ROI
Full autonomy is a myth for high-stakes tasks. Smart thresholds with human review deliver 85% cost reduction at 98% accuracy. Here are the approval patterns that work.
The 500ms Threshold: Why Latency Kills Voice AI
Voice AI has a hard latency ceiling. Exceed 500ms round-trip and users abandon. This shapes every architectural decision from model selection to interrupt handling.
ElevenLabs: The Voice Infrastructure Play
ElevenLabs pivoted from creative TTS tool to real-time voice infrastructure. At $3.3B valuation, they bet on becoming the "Voice OS" of the enterprise.
Vercel AI SDK: The React Developer's AI Layer
Vercel AI SDK commoditizes LLM consumption for React/Next.js developers. Model agnosticism, streaming DX, and type safety—with the trade-offs you need to know.
The Probabilistic Stack: Engineering for Non-Determinism
LLMs break the fundamental assumption of software engineering: that identical inputs produce identical outputs. New patterns are required.
Voice: The Universal API for Human-Computer Interaction
Voice is not a feature—it's an interface paradigm shift. The trajectory from CLI to Voice, and why getting turn management right matters more than raw speed.
The CPCT Standard: Why Cost-Per-Token is a Vanity Metric
Cost-per-token is the new "hits per second", a vanity metric that obscures business health. The "cheap" model that fails 50% of the time costs 3.75x more than the premium alternative.
The Top 100 AI Agent Companies: A Strategic Directory
The definitive directory of 100 AI agent companies. Three tiers: Foundational platforms, Integration partners, and Vertical specialists for enterprise automation.
The Agent Ecosystem Map: A Buyer's Guide to Vendor Selection
The $7.6B agent market in three tiers: Foundational (Microsoft, Google), Orchestration (Kore.ai, Airia), and Vertical (Harvey, Devin). Vendor evaluation guide.
Agent Economics: The Unit Economics of Autonomous Work
Stop measuring cost per token. The metric that matters is Cost Per Completed Task. Here is the framework for measuring, optimizing, and governing the economics of AI agents.
The Self-Healing Agent: How AI Systems Learn to Fix Themselves
Static prompts in dynamic environments lead to performance decay. Here is the architecture for building agents that automatically analyze their failures and optimize themselves.
The Orchestration Decision: LangGraph vs AutoGen
Choosing the wrong agent framework costs months. LangGraph excels at production determinism. AutoGen excels at rapid prototyping. Here is when to use each, and why the answer is often both.
Agent Observability: Monitoring AI Systems in Production
Evaluation ends at deployment. Observability begins. Distributed tracing, guardrails, and the monitoring stack that keeps production agents reliable.
The Agent Operations Playbook: SRE for AI Systems
Traditional SRE fails with non-deterministic systems. Here are the SLAs, incident response patterns, and deployment strategies that work for production AI agents.
The Agent Safety Stack: Defense-in-Depth for Autonomous AI
Agents that take actions have different risk profiles than chatbots. Here is the defense-in-depth architecture: prompt injection defense, red teaming, kill switches, and guardrail benchmarks.
The Agent Scorecard: Translating Technical Metrics to Business ROI
Engineers track latency and tokens. Executives want ROI. Here is the framework for translating agent performance into board-ready business metrics.
Why Legal AI Breaks Every Rule About Agent Adoption
In every vertical, small companies deploy AI faster than enterprises. Legal is the exception. Content moats and liability costs invert the landscape.
The State of Legal AI: When Research Takes Minutes and Arguments Write Themselves
Legal AI evolved from search engines to autonomous research partners. CoCounsel, Harvey, and the new wave are rebuilding the profession.
Why Small Companies Win the AI Agent Race
Large enterprises have 3x-9x slower AI deployment cycles than SMBs. The culprit is not culture; it is structural friction that can be quantified and overcome.
The Hallucination Tax: Calculating the True Cost of AI Errors
Every AI hallucination has a cost—lost trust, wasted time, incorrect decisions. Here's how to calculate yours and the architecture that minimizes it.
Swarm Patterns: When Agents Learn to Collaborate
Single agents hit ceilings. Multi-agent swarms break through them. Here are the coordination patterns separating toy demos from production systems.
The 5 Agent Failure Modes (And How to Prevent Them)
Most AI agents fail silently in production. Here are the five failure modes killing your deployments—and the architecture patterns that prevent them.
The Prompt DNA Hypothesis: Evolving Agent Instructions
What if we treated prompts like genetic code—subject to mutation, selection, and evolution? The best agent prompts aren't written. They're bred.
The Autonomous Revolution: AI Agents Rewriting Work
The workforce is evolving—literally. AI agents are no longer experimental tools but genetically optimized systems driving 50%+ of enterprise operations autonomously.
How to Know If Your AI Agent Actually Works
Model benchmarks tell you nothing about agent performance. Trajectory analysis, the three evaluation pillars, and the metrics that actually matter.
LLM-as-Judge: The $5,000 Question for $10
When to use LLMs to evaluate LLMs—and when not to. The biases, the economics, the production patterns, and the decision framework for automated evaluation.
The $100 Task: How Production Teams Cut Agent Costs by 10x
Where tokens actually go in agent workflows, and the caching, routing, and architectural patterns that reduce costs by an order of magnitude.
Agent Billing: Why Crypto Finally Makes Sense
The hardest unsolved problem in agent economics. Agent billing gives blockchain its first legitimate enterprise use case: micropayments, escrow, and disputes.
Temporal: The Durable Execution Engine for AI Agents
Technical deep dive into Temporal for agent orchestration. Why Netflix runs 100K+ workflows/day on it, and how to build production agents with durable execution.
The Turn-Taking Problem: Why Voice AI Still Feels Robotic
The engineering behind making machines talk in conversation—beyond TTS quality to the temporal dynamics that make or break natural voice interaction.
Orchestration Showdown: Graphs vs Conversations vs Roles vs Raw Loops
LangGraph, AutoGen, CrewAI, or build your own? The architectural philosophies behind agent orchestration frameworks—and which mental model fits your problem.
The TCP/IP of Agents: How Machines Will Talk to Machines
We're at the protocol wars moment for agent communication. The standards we design now will shape whether agents remain isolated tools or become distributed intelligence.
The Context Crisis: What to Do When Your Agent Runs Out of Room
Beyond RAG—the physics, strategies, and production patterns for managing context when 200K tokens still isn't enough.