The Codebase That Felt Like 10 Developers Who Never Talked
A developer shipped a feature under deadline pressure. Heavy AI generation, light review. The result: 47 duplicate utility functions. Three different date formatting libraries. Authentication logic scattered across 12 files with subtly different implementations. A formatUserName function that existed in four variations, each handling edge cases differently.
The refactor took longer than building it correctly would have.
This isn't a failure of AI. It's a failure of process. The same chaos occurs when you give junior developers too much autonomy and too little structure. AI just produces chaos faster.
The Core Problem: AI writes code with complete conviction. It doesn't know what it doesn't know. It can't see the duplicate function three files over. It optimizes for the immediate prompt, not the codebase.
This playbook introduces The Verification Loop—a unified framework that prevents AI-generated chaos while preserving the speed benefits. It synthesizes hard-won patterns from Addy Osmani, Simon Willison, and Les Orchard into a system you can apply today.
When Vibe Coding Is Exactly Right
Before we discuss discipline, let's acknowledge: vibe coding has its place.
| Scenario | Context | Approach | Why It Works |
|---|---|---|---|
| Prototype | Throwaway prototype | Vibe code freely | Speed matters, correctness doesn't |
| Learning | Learning new language/framework | Vibe code, then study output | AI becomes a tutor |
| Scripts | One-off scripts | Vibe code if it runs | No maintenance burden |
| Exploration | Exploring solution space | Vibe code multiple approaches | Divergent thinking phase |
| Production | Production code | Verification Loop | Maintenance cost is real |
The question isn't "vibe coding or discipline?" It's "what are the consequences of getting this wrong?"
If the code will be read by others, maintained over time, or run in production—you need the Verification Loop. If you're exploring, learning, or building something disposable—vibe away.
The Verification Loop
Every practice in this playbook serves a single purpose: preventing the AI from drifting into chaos while you're not watching.
The loop has five phases: PLAN → PACK → CHUNK → VERIFY → COMMIT.
The Loop Is Recursive: After COMMIT, return to CHUNK for the next piece. After completing a feature, return to PLAN for the next feature. The loop runs at multiple timescales.
The rest of this playbook unpacks each phase.
PLAN: The 15-Minute Waterfall
The most common mistake is prompting "Build me a user auth system" and accepting whatever comes back. You get a solution—rarely the right solution.
Les Orchard calls the alternative "waterfall in 15 minutes." It's a rapid but structured planning phase:
Step 1: Brainstorm with the AI. Describe your goal. Ask the AI to "iteratively ask me questions until we've covered requirements and edge cases." This surfaces assumptions you didn't know you had. What happens when the token expires? What if the user has multiple sessions? Do we need password reset?
Step 2: Compile into spec.md. Document requirements, architecture decisions, data models, and testing strategy. This file becomes the source of truth. When you and the AI disagree later about what "correct" means, spec.md decides.
Step 3: Generate a task plan. Feed the spec to a reasoning-optimized model. Ask it to break the implementation into logical, bite-sized tasks. A good plan has 5-15 tasks, each completable in one AI session.
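To make the task plan concrete, here is a minimal sketch of what one might look like as data, with a helper that turns the next unfinished task into a CHUNK-sized prompt. The file layout, field names, and auth-flavored task titles are illustrative assumptions, not a prescribed format; a markdown checklist works just as well.

```python
# A minimal sketch of a generated task plan kept next to spec.md.
# Field names and task titles are illustrative assumptions only.

TASKS = [
    {"id": 1, "title": "Define user and session data models", "done": False},
    {"id": 2, "title": "Implement password hashing and verification", "done": False},
    {"id": 3, "title": "Add login endpoint returning a session token", "done": False},
    {"id": 4, "title": "Add token expiry and refresh handling", "done": False},
    {"id": 5, "title": "Write integration tests against spec.md edge cases", "done": False},
]

def next_prompt(tasks: list[dict]) -> str:
    """Build the next CHUNK-sized prompt from the first unfinished task."""
    for task in tasks:
        if not task["done"]:
            return (f"Let's implement Step {task['id']} from the plan: "
                    f"{task['title']}. Refer to spec.md for requirements.")
    return "All planned tasks are complete. Review spec.md for anything missed."

if __name__ == "__main__":
    print(next_prompt(TASKS))
```

One task flows into one prompt, which keeps the PLAN output aligned with the CHUNK phase below.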
The Cost of Skipping: Without a spec, the AI optimizes for each prompt in isolation. You end up with auth logic that works but doesn't integrate with your session model. Or a database schema that can't handle the edge cases you forgot to mention. The refactor costs more than planning would have.
This maps directly to the agent operations playbook: define success criteria before execution, not after. When both human and AI know exactly what they're building, wasted cycles disappear.
PACK: Context Is Everything
LLMs are only as good as the context you provide. The single highest-leverage skill in AI-assisted engineering is context packing—systematically feeding the AI everything it needs to perform well.
What to pack:
- Project structure: Let the AI see how your codebase is organized
- Coding standards: Style guides, lint rules, naming conventions
- Relevant files: The modules this change will touch
- Documentation: API docs for libraries you're using
- Anti-patterns: Explicit warnings about approaches to avoid
Modern tools automate context packaging:
| Tool | Function | Best For |
|---|---|---|
| gitingest | Dumps codebase to text file for LLM ingestion | Full project context |
| repo2txt | Converts repo structure to LLM-readable format | Architecture understanding |
| Context7 MCP | Live MCP server for documentation retrieval | API/library lookups |
| CLAUDE.md | Project rules file loaded automatically | Persistent project context |
The CLAUDE.md pattern deserves special attention. This file contains process rules and preferences that Claude loads at session start: coding style, lint rules, forbidden patterns, preferred approaches. It's like onboarding a new hire—give them the style guide upfront. See Claude Code Superuser for the full pattern.
The Principle: Don't make the AI operate on partial information. If a bug fix requires understanding four modules, show it those four modules. If you're using a niche library, paste in the docs. Context is currency.
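As a rough illustration of what context packing looks like without any tooling, here is a minimal Python sketch that gathers a CLAUDE.md rules file, a project tree, and the files a change will touch into one prompt block. The paths and selection logic are assumptions for the example; tools like gitingest do this far more thoroughly.

```python
"""Minimal context-packing sketch: rules file + project tree + relevant files.
The RELEVANT_FILES paths are hypothetical; real tooling automates selection."""
from pathlib import Path

RELEVANT_FILES = ["src/auth/session.py", "src/auth/tokens.py"]  # modules the change touches

def pack_context(root: str = ".") -> str:
    root_path = Path(root)
    sections = []

    # Project rules (CLAUDE.md or similar), if present.
    rules = root_path / "CLAUDE.md"
    if rules.exists():
        sections.append(f"## Project rules\n{rules.read_text()}")

    # Project structure: a flat listing of Python source files.
    tree = "\n".join(
        str(p.relative_to(root_path))
        for p in sorted(root_path.rglob("*.py"))
        if ".venv" not in p.parts
    )
    sections.append(f"## Project structure\n{tree}")

    # The relevant files themselves, in full.
    for rel in RELEVANT_FILES:
        f = root_path / rel
        if f.exists():
            sections.append(f"## {rel}\n{f.read_text()}")

    return "\n\n".join(sections)

if __name__ == "__main__":
    print(pack_context())
```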
This is the practical application of context engineering at the individual developer level.
CHUNK: The Anti-Chaos Rule
Never ask the AI for large, monolithic outputs. The failure mode is predictable: the model gets confused, produces a "jumbled mess" that's impossible to untangle, and you've wasted an hour of context budget.
The rule: One function, one bug, one feature per request.
The workflow:
- Prompt: "Let's implement Step 1 from the plan"
- AI generates code
- You review the output
- You run tests
- If it works: commit
- If it fails: debug with the AI, then commit
- Move to Step 2
Each iteration carries forward the context of what's been built. Each chunk is small enough that you can read every line. Errors have nowhere to hide.
The Symptoms of an Oversized Chunk: You asked for too much if: (1) the output is >200 lines, (2) you can't immediately understand what changed, (3) multiple unrelated files are modified, or (4) the AI starts contradicting its earlier output.
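A cheap way to catch symptoms (1) and (3) before review is to measure the diff. The sketch below assumes a git repository; the 200-line and 5-file thresholds are illustrative, not hard rules.

```python
"""Rough pre-review check for oversized AI chunks via `git diff --numstat`."""
import subprocess

MAX_LINES = 200   # illustrative threshold
MAX_FILES = 5     # illustrative threshold

def diff_stats() -> tuple[int, int]:
    """Return (files_changed, lines_changed) for staged + unstaged changes."""
    out = subprocess.run(
        ["git", "diff", "HEAD", "--numstat"],
        capture_output=True, text=True, check=True,
    ).stdout
    files, lines = 0, 0
    for row in out.splitlines():
        added, removed, _path = row.split("\t", 2)
        files += 1
        # Binary files report "-"; count them as zero-line changes.
        lines += (int(added) if added != "-" else 0) + (int(removed) if removed != "-" else 0)
    return files, lines

if __name__ == "__main__":
    files, lines = diff_stats()
    if lines > MAX_LINES or files > MAX_FILES:
        print(f"Oversized chunk: {lines} lines across {files} files. Split the request.")
    else:
        print(f"Chunk looks reviewable: {lines} lines across {files} files.")
```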
Claude Code Skills formalize this into modular, reusable workflows. Instead of asking for "a complete feature," you invoke a skill that knows how to execute one step correctly.
VERIFY: Trust Nothing
AI produces plausible-looking code with hidden bugs. The model writes with complete conviction—including when it's wrong. It doesn't raise its hand when uncertain. It just writes.
The cardinal rule: Treat every AI-generated snippet as if it came from an over-confident junior developer. Read it. Run it. Test it.
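In practice, "run it, test it" can be a one-command gate applied to every chunk before committing. This sketch assumes a Python project using ruff and pytest; swap in whatever linter and test runner your stack uses.

```python
"""Minimal verification gate: lint and test before accepting an AI chunk."""
import subprocess
import sys

CHECKS = [
    ["ruff", "check", "."],            # static checks catch the cheap mistakes
    ["python", "-m", "pytest", "-q"],  # tests catch the confident ones
]

def verify() -> bool:
    for cmd in CHECKS:
        result = subprocess.run(cmd)
        if result.returncode != 0:
            print(f"FAILED: {' '.join(cmd)} -- do not commit this chunk.")
            return False
    print("All checks passed. Safe to commit.")
    return True

if __name__ == "__main__":
    sys.exit(0 if verify() else 1)
```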
For debugging, tools like Chrome DevTools MCP bridge static analysis and live execution—giving the AI direct access to console logs, network traces, and DOM inspection. Bugs get diagnosed with actual runtime data, not guesswork.
The same observability principles that apply to production agents apply to your development loop. See Building Agent Evals for systematic verification patterns and LLM-as-Judge for using AI to grade AI.
COMMIT: Save Points, Not Ceremonies
When AI generates code quickly, it's easy for things to veer off course. Mitigate this with ultra-granular version control.
Commit after every successful chunk. Clear message. Move on.
The Save Point Analogy: Treat commits as save points in a game. If an LLM session goes sideways, you can roll back to the last stable commit without losing hours of work.
This isn't ceremonial—it's tactical:
- When context windows overflow, git history becomes your memory. Paste diffs to brief the AI on what changed.
- When bugs appear, git bisect (which LLMs are excellent at) finds where they were introduced.
- When experiments fail, you throw away the branch, not the day.
Advanced pattern: Use git worktrees to isolate AI experiments. Spin up a fresh worktree for a new feature, let the AI run in its sandbox, merge if successful, throw away if not.
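Both patterns are thin wrappers over ordinary git commands. The sketch below shows a save-point commit helper and a worktree sandbox; the branch and directory names are illustrative.

```python
"""Save points and sandboxed experiments as thin wrappers over git."""
import subprocess

def save_point(message: str) -> None:
    """Commit everything in the working tree as a recoverable save point."""
    subprocess.run(["git", "add", "-A"], check=True)
    subprocess.run(["git", "commit", "-m", message], check=True)

def start_experiment(branch: str, path: str) -> None:
    """Create an isolated worktree so the AI can run in a sandbox."""
    subprocess.run(["git", "worktree", "add", "-b", branch, path], check=True)

def discard_experiment(branch: str, path: str) -> None:
    """Throw away the sandbox: remove the worktree and delete its branch."""
    subprocess.run(["git", "worktree", "remove", path, "--force"], check=True)
    subprocess.run(["git", "branch", "-D", branch], check=True)

if __name__ == "__main__":
    start_experiment("ai/try-new-auth-flow", "../try-new-auth-flow")
    # ...let the AI work inside ../try-new-auth-flow, committing save points...
    # Merge the branch if the experiment worked; otherwise:
    discard_experiment("ai/try-new-auth-flow", "../try-new-auth-flow")
```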
Model Selection: Match Tool to Task
Different models have different strengths. Part of a production-grade workflow is choosing the right model for each phase—and switching when one gets stuck.
The principle: Use reasoning-optimized models for planning and spec generation. Use coding-optimized models for implementation. Use fast models for iteration and review.
The Model Musical Chairs Pattern: If one model gets stuck or gives mediocre outputs, copy the same prompt to another model. Different models have different blind spots. What Claude can't crack, Gemini might nail—and vice versa.
Model capabilities change rapidly. What matters is the pattern: match model strengths to task requirements, and don't hesitate to switch when you're not getting results.
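One way to operationalize the pattern is a small routing table with fallbacks. In the sketch below, call_model is a hypothetical stand-in for whatever provider client you use, and the model names and phase mapping are placeholder assumptions.

```python
"""Sketch of phase-to-model routing with fallback (the musical chairs pattern).
`call_model` is a hypothetical stand-in; model names are placeholders."""

PHASE_MODELS = {
    "plan":      ["reasoning-model-a", "reasoning-model-b"],
    "implement": ["coding-model-a", "coding-model-b"],
    "review":    ["fast-model-a", "coding-model-a"],
}

def call_model(model: str, prompt: str) -> str:
    """Hypothetical client call; replace with your provider's SDK."""
    raise NotImplementedError

def run_phase(phase: str, prompt: str) -> str:
    """Try each model for the phase in order; switch when one fails or stalls."""
    last_error = None
    for model in PHASE_MODELS[phase]:
        try:
            return call_model(model, prompt)
        except Exception as exc:  # a bad or empty answer counts as failure too
            last_error = exc
    raise RuntimeError(f"All models failed for phase '{phase}'") from last_error
```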
The Amplification Mechanism
"AI amplifies expertise" is a truism. Here's the specific mechanism:
What senior engineers do well:
- Decompose problems into solvable pieces
- Recognize when a solution doesn't fit the architecture
- Spot edge cases before they become bugs
- Know which corners can be cut and which can't
What AI does well:
- Generate boilerplate at speed
- Remember syntax and API details
- Produce multiple solution variants quickly
- Explain unfamiliar code
The Leverage Point: 10x gains come from seniors directing AI, not being replaced by it.
The leverage comes from combining these. A senior engineer directing AI moves faster than either alone. A junior engineer without the decomposition and review skills gets AI-generated chaos—faster.
This is why the two-pizza agent team model works: small teams of senior engineers with AI assistance outproduce large teams of juniors. The bottleneck isn't typing speed. It's judgment.
For Juniors: AI can accelerate your learning—but only if you stay in the loop. Ask the AI to explain its code. Debug its mistakes. Understand why something works, not just that it works. Use AI as a tutor, not an oracle.
The Decision Tree
Not every project needs the full Verification Loop. Here's when to apply what (a toy code summary follows the lists):
Apply Full Loop:
- Production code
- Shared codebases
- Features you'll maintain
- Anything with security implications
Apply PACK + CHUNK + VERIFY (skip formal PLAN):
- Bug fixes in familiar code
- Small features with clear scope
- Refactoring with tests
Apply CHUNK + VERIFY only:
- Exploratory coding you might throw away
- Learning new frameworks
- Proof of concepts
Vibe Code Freely:
- Throwaway scripts
- Personal experiments
- "Let me see if this is even possible"
The cost of discipline is time. The cost of chaos is more time. Choose based on consequences.
The Discipline Delta
The gap between hobbyist and professional AI-assisted engineering isn't the model you use. It's not the tokens you spend. It's the process around the AI.
The Verification Loop is that process:
- PLAN prevents building the wrong thing
- PACK prevents hallucinated solutions that don't fit
- CHUNK prevents chaos hiding in large outputs
- VERIFY catches confident mistakes
- COMMIT enables recovery when things go wrong
None of this is optional for production work. All of it is optional for throwaway code—which is why vibe coding feels magical in demos and falls apart in production.
The AI is an exceptional force multiplier. But the human engineer remains the architect. That's not a limitation—it's the architecture.
Start Here: Pick one phase of the loop you're currently skipping. Implement it for one week. Measure the difference in rework and debugging time.
See also: Claude Code Superuser for advanced Claude Code workflows, Claude Code Skills for modular skill architecture, and The Probabilistic Stack for engineering non-deterministic systems.
Related reading:
- The Claude Code Superuser Guide: From Developer to Agent Orchestrator. How to master Claude Code by shifting from writing code to orchestrating AI agents: parallel development, context mastery, and the workflows that unlock 10x productivity.
- The Architect's Guide to Engineering Claude Code Skills. A comprehensive manual for process engineering, context economics, and agent specialization: how to transform Claude Code from a generalist into a specialized agent through modular skills.
- The Probabilistic Stack: Engineering for Non-Determinism. LLMs break the fundamental assumption of software engineering that deterministic inputs produce deterministic outputs; new patterns are required.
- The Agent Operations Playbook: SRE for AI Systems. Traditional SRE fails with non-deterministic systems; these are the SLAs, incident response patterns, and deployment strategies that work for production AI agents.