The Codebase That Felt Like 10 Developers Who Never Talked
A developer shipped a feature under deadline pressure. Heavy AI generation, light review. The result: 47 duplicate utility functions. Three different date formatting libraries. Authentication logic scattered across 12 files with subtly different implementations. A formatUserName function that existed in four variations, each handling edge cases differently.
The refactor took longer than building it correctly would have.
This isn't a failure of AI. It's a failure of process. The same chaos occurs when you give junior developers too much autonomy and too little structure. AI just produces chaos faster.
The Core Problem: AI writes code with complete conviction. It doesn't know what it doesn't know. It can't see the duplicate function three files over. It optimizes for the immediate prompt, not the codebase.
This playbook introduces The Verification Loop—a unified framework that prevents AI-generated chaos while preserving the speed benefits. It synthesizes hard-won patterns from Addy Osmani, Simon Willison, and Les Orchard into a system you can apply today.
When Vibe Coding Is Exactly Right
Before we discuss discipline, let's acknowledge: vibe coding has its place.
| Scenario | Context | Approach | Why It Works |
|---|---|---|---|
| Prototype | Throwaway prototype | Vibe code freely | Speed matters, correctness doesn't |
| Learning | Learning new language/framework | Vibe code, then study output | AI becomes a tutor |
| Scripts | One-off scripts | Vibe code if it runs | No maintenance burden |
| Exploration | Exploring solution space | Vibe code multiple approaches | Divergent thinking phase |
| Production | Production code | Verification Loop | Maintenance cost is real |
The question isn't "vibe coding or discipline?" It's "what are the consequences of getting this wrong?"
If the code will be read by others, maintained over time, or run in production—you need the Verification Loop. If you're exploring, learning, or building something disposable—vibe away.
The Verification Loop
Every practice in this playbook serves a single purpose: preventing the AI from drifting into chaos while you're not watching.
The loop has five phases: PLAN → PACK → CHUNK → VERIFY → COMMIT.
The Loop Is Recursive: After COMMIT, return to CHUNK for the next piece. After completing a feature, return to PLAN for the next feature. The loop runs at multiple timescales.
The rest of this playbook unpacks each phase.
PLAN: The 15-Minute Waterfall
The most common mistake is prompting "Build me a user auth system" and accepting whatever comes back. You get a solution—rarely the right solution.
Les Orchard calls the alternative "waterfall in 15 minutes." It's a rapid but structured planning phase:
Step 1: Brainstorm with the AI. Describe your goal. Ask the AI to "iteratively ask me questions until we've covered requirements and edge cases." This surfaces assumptions you didn't know you had. What happens when the token expires? What if the user has multiple sessions? Do we need password reset?
Step 2: Compile into spec.md. Document requirements, architecture decisions, data models, and testing strategy. This file becomes the source of truth. When you and the AI disagree later about what "correct" means, spec.md decides.
Step 3: Generate a task plan. Feed the spec to a reasoning-optimized model. Ask it to break the implementation into logical, bite-sized tasks. A good plan has 5-15 tasks, each completable in one AI session.
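To make the task plan concrete, here is a minimal sketch of what one might look like as data, with a helper that turns the next unfinished task into a CHUNK-sized prompt. The file layout, field names, and auth-flavored task titles are illustrative assumptions, not a prescribed format; a markdown checklist works just as well.

```python
# A minimal sketch of a generated task plan kept next to spec.md.
# Field names and task titles are illustrative assumptions only.

TASKS = [
    {"id": 1, "title": "Define user and session data models", "done": False},
    {"id": 2, "title": "Implement password hashing and verification", "done": False},
    {"id": 3, "title": "Add login endpoint returning a session token", "done": False},
    {"id": 4, "title": "Add token expiry and refresh handling", "done": False},
    {"id": 5, "title": "Write integration tests against spec.md edge cases", "done": False},
]

def next_prompt(tasks: list[dict]) -> str:
    """Build the next CHUNK-sized prompt from the first unfinished task."""
    for task in tasks:
        if not task["done"]:
            return (f"Let's implement Step {task['id']} from the plan: "
                    f"{task['title']}. Refer to spec.md for requirements.")
    return "All planned tasks are complete. Review spec.md for anything missed."

if __name__ == "__main__":
    print(next_prompt(TASKS))
```

One task flows into one prompt, which keeps the PLAN output aligned with the CHUNK phase below.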
The Cost of Skipping: Without a spec, the AI optimizes for each prompt in isolation. You end up with auth logic that works but doesn't integrate with your session model. Or a database schema that can't handle the edge cases you forgot to mention. The refactor costs more than planning would have.
This maps directly to the agent operations playbook: define success criteria before execution, not after. When both human and AI know exactly what they're building, wasted cycles disappear.
PACK: Context Is Everything
LLMs are only as good as the context you provide. The single highest-leverage skill in AI-assisted engineering is context packing—systematically feeding the AI everything it needs to perform well.
What to pack:
- Project structure: Let the AI see how your codebase is organized
- Coding standards: Style guides, lint rules, naming conventions
- Relevant files: The modules this change will touch
- Documentation: API docs for libraries you're using
- Anti-patterns: Explicit warnings about approaches to avoid
Modern tools automate context packaging:
| Tool | Function | Best For |
|---|---|---|
| gitingest | Dumps codebase to text file for LLM ingestion | Full project context |
| repo2txt | Converts repo structure to LLM-readable format | Architecture understanding |
| Context7 MCP | Live MCP server for documentation retrieval | API/library lookups |
| CLAUDE.md | Project rules file loaded automatically | Persistent project context |
The CLAUDE.md pattern deserves special attention. This file contains process rules and preferences that Claude loads at session start: coding style, lint rules, forbidden patterns, preferred approaches. It's like onboarding a new hire—give them the style guide upfront. See Claude Code Superuser for the full pattern.
The Principle: Don't make the AI operate on partial information. If a bug fix requires understanding four modules, show it those four modules. If you're using a niche library, paste in the docs. Context is currency.
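As a rough illustration of what context packing looks like without any tooling, here is a minimal Python sketch that gathers a CLAUDE.md rules file, a project tree, and the files a change will touch into one prompt block. The paths and selection logic are assumptions for the example; tools like gitingest do this far more thoroughly.

```python
"""Minimal context-packing sketch: rules file + project tree + relevant files.
The RELEVANT_FILES paths are hypothetical; real tooling automates selection."""
from pathlib import Path

RELEVANT_FILES = ["src/auth/session.py", "src/auth/tokens.py"]  # modules the change touches

def pack_context(root: str = ".") -> str:
    root_path = Path(root)
    sections = []

    # Project rules (CLAUDE.md or similar), if present.
    rules = root_path / "CLAUDE.md"
    if rules.exists():
        sections.append(f"## Project rules\n{rules.read_text()}")

    # Project structure: a flat listing of Python source files.
    tree = "\n".join(
        str(p.relative_to(root_path))
        for p in sorted(root_path.rglob("*.py"))
        if ".venv" not in p.parts
    )
    sections.append(f"## Project structure\n{tree}")

    # The relevant files themselves, in full.
    for rel in RELEVANT_FILES:
        f = root_path / rel
        if f.exists():
            sections.append(f"## {rel}\n{f.read_text()}")

    return "\n\n".join(sections)

if __name__ == "__main__":
    print(pack_context())
```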
This is the practical application of context engineering at the individual developer level.
CHUNK: The Anti-Chaos Rule
Never ask the AI for large, monolithic outputs. The failure mode is predictable: the model gets confused, produces a "jumbled mess" that's impossible to untangle, and you've wasted an hour of context budget.
The rule: One function, one bug, one feature per request.
The workflow:
- Prompt: "Let's implement Step 1 from the plan"
- AI generates code
- You review the output
- You run tests
- If it works: commit
- If it fails: debug with the AI, then commit
- Move to Step 2
Each iteration carries forward the context of what's been built. Each chunk is small enough that you can read every line. Errors have nowhere to hide.
The Symptoms of an Oversized Chunk: You asked for too much if: (1) the output is >200 lines, (2) you can't immediately understand what changed, (3) multiple unrelated files are modified, or (4) the AI starts contradicting its earlier output.
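A cheap way to catch symptoms (1) and (3) before review is to measure the diff. The sketch below assumes a git repository; the 200-line and 5-file thresholds are illustrative, not hard rules.

```python
"""Rough pre-review check for oversized AI chunks via `git diff --numstat`."""
import subprocess

MAX_LINES = 200   # illustrative threshold
MAX_FILES = 5     # illustrative threshold

def diff_stats() -> tuple[int, int]:
    """Return (files_changed, lines_changed) for staged + unstaged changes."""
    out = subprocess.run(
        ["git", "diff", "HEAD", "--numstat"],
        capture_output=True, text=True, check=True,
    ).stdout
    files, lines = 0, 0
    for row in out.splitlines():
        added, removed, _path = row.split("\t", 2)
        files += 1
        # Binary files report "-"; count them as zero-line changes.
        lines += (int(added) if added != "-" else 0) + (int(removed) if removed != "-" else 0)
    return files, lines

if __name__ == "__main__":
    files, lines = diff_stats()
    if lines > MAX_LINES or files > MAX_FILES:
        print(f"Oversized chunk: {lines} lines across {files} files. Split the request.")
    else:
        print(f"Chunk looks reviewable: {lines} lines across {files} files.")
```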
Claude Code Skills formalize this into modular, reusable workflows. Instead of asking for "a complete feature," you invoke a skill that knows how to execute one step correctly.
VERIFY: Trust Nothing
AI produces plausible-looking code with hidden bugs. The model writes with complete conviction—including when it's wrong. It doesn't raise its hand when uncertain. It just writes.
The cardinal rule: Treat every AI-generated snippet as if it came from an over-confident junior developer. Read it. Run it. Test it.
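In practice, "run it, test it" can be a one-command gate applied to every chunk before committing. This sketch assumes a Python project using ruff and pytest; swap in whatever linter and test runner your stack uses.

```python
"""Minimal verification gate: lint and test before accepting an AI chunk."""
import subprocess
import sys

CHECKS = [
    ["ruff", "check", "."],            # static checks catch the cheap mistakes
    ["python", "-m", "pytest", "-q"],  # tests catch the confident ones
]

def verify() -> bool:
    for cmd in CHECKS:
        result = subprocess.run(cmd)
        if result.returncode != 0:
            print(f"FAILED: {' '.join(cmd)} -- do not commit this chunk.")
            return False
    print("All checks passed. Safe to commit.")
    return True

if __name__ == "__main__":
    sys.exit(0 if verify() else 1)
```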
For debugging, tools like Chrome DevTools MCP bridge static analysis and live execution—giving the AI direct access to console logs, network traces, and DOM inspection. Bugs get diagnosed with actual runtime data, not guesswork.
The same observability principles that apply to production agents apply to your development loop. See Building Agent Evals for systematic verification patterns and LLM-as-Judge for using AI to grade AI.
COMMIT: Save Points, Not Ceremonies
When AI generates code quickly, it's easy for things to veer off course. Mitigate this with ultra-granular version control.
Commit after every successful chunk. Clear message. Move on.
The Save Point Analogy: Treat commits as save points in a game. If an LLM session goes sideways, you can roll back to the last stable commit without losing hours of work.
This isn't ceremonial—it's tactical:
- When context windows overflow, git history becomes your memory. Paste diffs to brief the AI on what changed.
- When bugs appear, git bisect (which LLMs are excellent at) finds where they were introduced.
- When experiments fail, you throw away the branch, not the day.
Advanced pattern: Use git worktrees to isolate AI experiments. Spin up a fresh worktree for a new feature, let the AI run in its sandbox, merge if successful, throw away if not.
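Both patterns are thin wrappers over ordinary git commands. The sketch below shows a save-point commit helper and a worktree sandbox; the branch and directory names are illustrative.

```python
"""Save points and sandboxed experiments as thin wrappers over git."""
import subprocess

def save_point(message: str) -> None:
    """Commit everything in the working tree as a recoverable save point."""
    subprocess.run(["git", "add", "-A"], check=True)
    subprocess.run(["git", "commit", "-m", message], check=True)

def start_experiment(branch: str, path: str) -> None:
    """Create an isolated worktree so the AI can run in a sandbox."""
    subprocess.run(["git", "worktree", "add", "-b", branch, path], check=True)

def discard_experiment(branch: str, path: str) -> None:
    """Throw away the sandbox: remove the worktree and delete its branch."""
    subprocess.run(["git", "worktree", "remove", path, "--force"], check=True)
    subprocess.run(["git", "branch", "-D", branch], check=True)

if __name__ == "__main__":
    start_experiment("ai/try-new-auth-flow", "../try-new-auth-flow")
    # ...let the AI work inside ../try-new-auth-flow, committing save points...
    # Merge the branch if the experiment worked; otherwise:
    discard_experiment("ai/try-new-auth-flow", "../try-new-auth-flow")
```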
Model Selection: Match Tool to Task
Different models have different strengths. Part of a production-grade workflow is choosing the right model for each phase—and switching when one gets stuck.
The principle: Use reasoning-optimized models for planning and spec generation. Use coding-optimized models for implementation. Use fast models for iteration and review.
The Model Musical Chairs Pattern: If one model gets stuck or gives mediocre outputs, copy the same prompt to another model. Different models have different blind spots. What Claude can't crack, Gemini might nail—and vice versa.
Model capabilities change rapidly. What matters is the pattern: match model strengths to task requirements, and don't hesitate to switch when you're not getting results.
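One way to operationalize the pattern is a small routing table with fallbacks. In the sketch below, call_model is a hypothetical stand-in for whatever provider client you use, and the model names and phase mapping are placeholder assumptions.

```python
"""Sketch of phase-to-model routing with fallback (the musical chairs pattern).
`call_model` is a hypothetical stand-in; model names are placeholders."""

PHASE_MODELS = {
    "plan":      ["reasoning-model-a", "reasoning-model-b"],
    "implement": ["coding-model-a", "coding-model-b"],
    "review":    ["fast-model-a", "coding-model-a"],
}

def call_model(model: str, prompt: str) -> str:
    """Hypothetical client call; replace with your provider's SDK."""
    raise NotImplementedError

def run_phase(phase: str, prompt: str) -> str:
    """Try each model for the phase in order; switch when one fails or stalls."""
    last_error = None
    for model in PHASE_MODELS[phase]:
        try:
            return call_model(model, prompt)
        except Exception as exc:  # a bad or empty answer counts as failure too
            last_error = exc
    raise RuntimeError(f"All models failed for phase '{phase}'") from last_error
```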
The Amplification Mechanism
"AI amplifies expertise" is a truism. Here's the specific mechanism:
What senior engineers do well:
- Decompose problems into solvable pieces
- Recognize when a solution doesn't fit the architecture
- Spot edge cases before they become bugs
- Know which corners can be cut and which can't
What AI does well:
- Generate boilerplate at speed
- Remember syntax and API details
- Produce multiple solution variants quickly
- Explain unfamiliar code
The Leverage Point: 10x gains come from seniors directing AI, not being replaced by it.
The leverage comes from combining these. A senior engineer directing AI moves faster than either alone. A junior engineer without the decomposition and review skills gets AI-generated chaos—faster.
This is why the two-pizza agent team model works: small teams of senior engineers with AI assistance outproduce large teams of juniors. The bottleneck isn't typing speed. It's judgment.
For Juniors: AI can accelerate your learning—but only if you stay in the loop. Ask the AI to explain its code. Debug its mistakes. Understand why something works, not just that it works. Use AI as a tutor, not an oracle.
The Decision Tree
Not every project needs the full Verification Loop. Here's when to apply what (a toy code summary follows the lists):
Apply Full Loop:
- Production code
- Shared codebases
- Features you'll maintain
- Anything with security implications
Apply PACK + CHUNK + VERIFY (skip formal PLAN):
- Bug fixes in familiar code
- Small features with clear scope
- Refactoring with tests
Apply CHUNK + VERIFY only:
- Exploratory coding you might throw away
- Learning new frameworks
- Proof of concepts
Vibe Code Freely:
- Throwaway scripts
- Personal experiments
- "Let me see if this is even possible"
The cost of discipline is time. The cost of chaos is more time. Choose based on consequences.
The Discipline Delta
The gap between hobbyist and professional AI-assisted engineering isn't the model you use. It's not the tokens you spend. It's the process around the AI.
The Verification Loop is that process:
- PLAN prevents building the wrong thing
- PACK prevents hallucinated solutions that don't fit
- CHUNK prevents chaos hiding in large outputs
- VERIFY catches confident mistakes
- COMMIT enables recovery when things go wrong
None of this is optional for production work. All of it is optional for throwaway code—which is why vibe coding feels magical in demos and falls apart in production.
The AI is an exceptional force multiplier. But the human engineer remains the architect. That's not a limitation—it's the architecture.
Start Here: Pick one phase of the loop you're currently skipping. Implement it for one week. Measure the difference in rework and debugging time.
See also: Claude Code Superuser for advanced Claude Code workflows, Claude Code Skills for modular skill architecture, and The Probabilistic Stack for engineering non-deterministic systems.
Related reading:
- The Claude Code Superuser Guide: From Developer to Agent Orchestrator. How to master Claude Code by shifting from writing code to orchestrating AI agents: parallel development, context mastery, and the workflows that unlock 10x productivity.
- The Architect's Guide to Engineering Claude Code Skills. A comprehensive manual for process engineering, context economics, and agent specialization: how to transform Claude Code from a generalist into a specialized agent through modular skills.
- The Probabilistic Stack: Engineering for Non-Determinism. LLMs break the fundamental assumption of software engineering that deterministic inputs produce deterministic outputs; new patterns are required.
- The Agent Operations Playbook: SRE for AI Systems. Traditional SRE fails with non-deterministic systems; these are the SLAs, incident response patterns, and deployment strategies that work for production AI agents.