What is an AI Agent?
An AI agent is a program that calls an LLM in a loop, using tools to take actions until a task is complete. Unlike a chatbot (one question, one answer), an agent reasons about what to do next, executes actions, observes results, and repeats—autonomously working toward a goal.
Core Primitives
Loop + Tools + Memory + Termination
That's the entire abstraction. Everything else—LangChain, LangGraph, CrewAI, AutoGen—builds on these four primitives. In this tutorial, we build all four from scratch in ~80 lines of Python.
What We're Building
We're going to build a working AI agent with zero frameworks. No dependencies beyond the Anthropic SDK—just a while loop and an API call. By the end, you'll have an agent that reasons about what to do, acts via tools, remembers prior context, and knows when to stop.
The code uses Claude, but the pattern works identically with OpenAI, Gemini, or any model supporting tool calling. Swap out the API client and you're done.
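To make that concrete, here's roughly what the same single call looks like with the OpenAI SDK (a sketch for orientation only; the rest of this tutorial sticks with the Anthropic SDK):

from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o",  # any tool-calling model works here
    max_tokens=1024,
    messages=[{"role": "user", "content": "What's 1847 * 23?"}],
)
print(response.choices[0].message.content)

The shape is the same everywhere: a client, a list of messages, a response to inspect.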
Part 1: The Agent Loop
An agent is a loop that calls an LLM until some condition tells it to stop. Here's the skeleton:
import anthropic
client = anthropic.Anthropic()
def run_agent(user_message: str) -> str:
    messages = [{"role": "user", "content": user_message}]

    while True:
        response = client.messages.create(
            model="claude-sonnet-4-20250514",
            max_tokens=1024,
            messages=messages
        )

        # For now, just return the first response
        return response.content[0].text

This isn't an agent yet—it's a single LLM call wrapped in a function. But it establishes the skeleton: a loop, a message list, and an API call.
Every decision in the next 80 lines shapes what this agent can do. The loop structure is the foundation—get it wrong and you'll be fighting it forever. See The Agent Thesis for why architecture matters more than model choice.
The loop doesn't actually loop yet. Let's fix that by adding tools.
Part 2: Tools Let the Agent Act
Without tools, an LLM can only generate text. With tools, it can act on the world—call APIs, query databases, run calculations. This is what makes an agent an agent rather than a chatbot.
Let's add a calculator. First, the Python function:
def calculate(expression: str) -> str:
"""Evaluate a mathematical expression."""
try:
result = eval(expression) # Don't do this in production!
return str(result)
except Exception as e:
return f"Error: {e}"Now we tell the model this tool exists. The schema is the same across Claude, GPT, and Gemini—a name, a description, and a JSON schema for the inputs:
tools = [
    {
        "name": "calculate",
        "description": "Evaluate a mathematical expression. Use this for any math.",
        "input_schema": {
            "type": "object",
            "properties": {
                "expression": {
                    "type": "string",
                    "description": "The math expression to evaluate, e.g. '2 + 2' or '(10 * 5) / 2'"
                }
            },
            "required": ["expression"]
        }
    }
]

The model reads the description to decide when to call the tool. Vague descriptions cause Tool Amnesia—the agent forgets it has tools, or picks the wrong one. Write descriptions like you're onboarding a new teammate.
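As an illustration (these snippets aren't part of the agent we're building), compare a vague description with one the model can actually act on:

# Too vague: the model has no idea when this tool applies.
vague = {"name": "calculate", "description": "Does math stuff."}

# Specific: states what the tool does and when to reach for it.
specific = {
    "name": "calculate",
    "description": (
        "Evaluate an arithmetic expression like '(10 * 5) / 2'. "
        "Use this whenever the request requires numeric calculation."
    ),
}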
Now the agent loop needs to handle tool calls:
def run_agent(user_message: str) -> str:
    messages = [{"role": "user", "content": user_message}]

    while True:
        response = client.messages.create(
            model="claude-sonnet-4-20250514",
            max_tokens=1024,
            messages=messages,
            tools=tools
        )

        # Check if the model wants to use a tool
        if response.stop_reason == "tool_use":
            # Find the tool call in the response
            tool_use = next(
                block for block in response.content
                if block.type == "tool_use"
            )

            # Execute the tool
            if tool_use.name == "calculate":
                result = calculate(tool_use.input["expression"])
            else:
                result = f"Unknown tool: {tool_use.name}"

            # Add the assistant's response and tool result to messages
            messages.append({"role": "assistant", "content": response.content})
            messages.append({
                "role": "user",
                "content": [{
                    "type": "tool_result",
                    "tool_use_id": tool_use.id,
                    "content": result
                }]
            })

            # Continue the loop—let the model process the result
            continue

        # No tool call, return the text response
        return response.content[0].text

Run this with "What's 1847 * 23?" and trace what happens:
- Model receives the question, decides to call calculate
- Our code runs calculate("1847 * 23") → "42481"
- We append the result to messages and call the model again
- Model sees the result and generates: "1847 × 23 = 42,481"
The loop ran twice. Reason → act → observe → respond. That's the core pattern.
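One note before moving on: the calculate tool leans on eval, which the comment rightly flags as unsafe, since a model-supplied string goes straight into the Python interpreter. Here is a minimal sketch of a safer drop-in using the standard-library ast module with an explicit allowlist of operators (an illustration, not the version used in the rest of the tutorial):

import ast
import operator

# Allowlist of arithmetic operators the evaluator will accept.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Pow: operator.pow,
    ast.USub: operator.neg,
}

def safe_calculate(expression: str) -> str:
    """Evaluate a math expression without eval by walking the parsed AST."""
    def _eval(node: ast.AST) -> float:
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.operand))
        raise ValueError("Unsupported expression")

    try:
        tree = ast.parse(expression, mode="eval")
        return str(_eval(tree.body))
    except Exception as e:
        return f"Error: {e}"

The full listing in Part 5 keeps eval for brevity, but safe_calculate is a drop-in replacement for calculate.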
Part 3: Memory Is Just a List
Notice what we're already doing: appending to messages. That list is the agent's memory. Every turn—user, assistant, tool result—accumulates in the same array:
messages = [
    {"role": "user", "content": "What's 10 + 5?"},
    {"role": "assistant", "content": [...]},   # tool_use block
    {"role": "user", "content": [...]},        # tool_result
    {"role": "assistant", "content": "10 + 5 = 15"},
    {"role": "user", "content": "Now multiply that by 3"},
    # The model can reference "that" because it sees the history
]

This naive approach—just keep appending—works for short conversations. The model sees everything and can reference prior turns.
Context windows have limits. Claude caps at 200K tokens, GPT-4o at 128K. An agent running for an hour can accumulate 500+ tool calls. At some point the API rejects the request—or worse, the model silently loses information from the middle of context.
For our toy agent, we keep the naive approach. Production agents need smarter memory: sliding windows, summarization, or external retrieval. See Agent Memory Architecture for the patterns.
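To give a feel for the simplest of those options, here is a sketch of a sliding-window trim (a hypothetical helper, not part of the agent above): keep the original task, drop the oldest middle turns, keep the most recent ones.

def trim_history(messages: list[dict], max_messages: int = 40) -> list[dict]:
    """Naive sliding window over the message list.

    Caveat: a real implementation must keep tool_use / tool_result pairs
    together; an orphaned tool_result will be rejected by the API.
    """
    if len(messages) <= max_messages:
        return messages
    # Keep the first message (the original task) plus the most recent turns.
    return [messages[0]] + messages[-(max_messages - 1):]

You'd call messages = trim_history(messages) before each API call.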
Part 4: Explicit Termination
Our agent loops until the model stops calling tools. But what if the model gets stuck? What if it keeps calling tools forever?
This is Infinite Loop Syndrome—one of five ways agents die in production. The fix: explicit termination via a finish tool:
tools = [
    {
        "name": "calculate",
        "description": "Evaluate a mathematical expression.",
        "input_schema": {
            "type": "object",
            "properties": {
                "expression": {"type": "string", "description": "Math expression"}
            },
            "required": ["expression"]
        }
    },
    {
        "name": "finish",
        "description": "Call this when the task is complete and you have the final answer.",
        "input_schema": {
            "type": "object",
            "properties": {
                "answer": {"type": "string", "description": "The final answer to return"}
            },
            "required": ["answer"]
        }
    }
]

Now update the loop:
def run_agent(user_message: str, max_iterations: int = 10) -> str:
    messages = [{"role": "user", "content": user_message}]

    for i in range(max_iterations):
        response = client.messages.create(
            model="claude-sonnet-4-20250514",
            max_tokens=1024,
            messages=messages,
            tools=tools
        )

        if response.stop_reason == "tool_use":
            tool_use = next(
                block for block in response.content
                if block.type == "tool_use"
            )

            # Check for explicit termination
            if tool_use.name == "finish":
                return tool_use.input["answer"]

            # Execute other tools
            if tool_use.name == "calculate":
                result = calculate(tool_use.input["expression"])
            else:
                result = f"Unknown tool: {tool_use.name}"

            messages.append({"role": "assistant", "content": response.content})
            messages.append({
                "role": "user",
                "content": [{
                    "type": "tool_result",
                    "tool_use_id": tool_use.id,
                    "content": result
                }]
            })
            continue

        # No tool call—return text response
        return response.content[0].text

    return "Max iterations reached without completion"

Two safety mechanisms:
- max_iterations caps the loop (hard limit)
- finish tool provides explicit exit (clean termination)
Belt and suspenders. The agent knows when to stop.
Part 5: The Complete Agent
Here's the full implementation—~80 lines, zero frameworks:
import anthropic
from typing import Any

client = anthropic.Anthropic()

# --- Tools ---

def calculate(expression: str) -> str:
    """Evaluate a mathematical expression."""
    try:
        # WARNING: eval is dangerous. Use a proper math parser in production.
        result = eval(expression)
        return str(result)
    except Exception as e:
        return f"Error: {e}"

tools = [
    {
        "name": "calculate",
        "description": "Evaluate a mathematical expression. Use for any math.",
        "input_schema": {
            "type": "object",
            "properties": {
                "expression": {
                    "type": "string",
                    "description": "Math expression, e.g. '2 + 2' or '(10 * 5) / 2'"
                }
            },
            "required": ["expression"]
        }
    },
    {
        "name": "finish",
        "description": "Call when the task is complete with the final answer.",
        "input_schema": {
            "type": "object",
            "properties": {
                "answer": {"type": "string", "description": "Final answer"}
            },
            "required": ["answer"]
        }
    }
]

def execute_tool(name: str, inputs: dict[str, Any]) -> str:
    """Dispatch tool calls to their implementations."""
    if name == "calculate":
        return calculate(inputs["expression"])
    return f"Unknown tool: {name}"

# --- Agent Loop ---

def run_agent(user_message: str, max_iterations: int = 10) -> str:
    """Run the agent loop until completion or max iterations."""
    messages = [{"role": "user", "content": user_message}]

    for iteration in range(max_iterations):
        response = client.messages.create(
            model="claude-sonnet-4-20250514",
            max_tokens=1024,
            messages=messages,
            tools=tools
        )

        # Handle tool use
        if response.stop_reason == "tool_use":
            tool_use = next(
                b for b in response.content if b.type == "tool_use"
            )

            # Explicit termination
            if tool_use.name == "finish":
                return tool_use.input["answer"]

            # Execute and continue
            result = execute_tool(tool_use.name, tool_use.input)
            messages.append({"role": "assistant", "content": response.content})
            messages.append({
                "role": "user",
                "content": [{
                    "type": "tool_result",
                    "tool_use_id": tool_use.id,
                    "content": result
                }]
            })
            continue

        # No tool call—return text
        return response.content[0].text

    return "Max iterations reached"

# --- Run it ---

if __name__ == "__main__":
    result = run_agent("What is (1847 * 23) + (512 / 4)?")
    print(result)

Run it:
$ python agent.py
42609.0

The agent:

- Received the math question
- Called calculate for the first expression
- Called calculate for the second expression
- Called finish with the combined result
Three iterations, explicit termination, correct answer. That's an agent.
Part 6: Where the Toy Breaks Down
This agent teaches the core loop. Ship it to production as-is and you'll learn the hard way why frameworks exist.
What's missing:
- Error handling. What if the API call fails? What if the tool throws? Our agent crashes. Production agents need try/except, retries, and graceful degradation (see the sketch after this list).
- State persistence. If the process dies, the agent's memory dies with it. The Durable Agent covers how to persist state to disk so agents can survive crashes and restarts.
- Structured orchestration. Our while loop is linear. Real workflows have branches, parallel paths, and cycles. The Graph Mandate explains why chat-based loops fail and state machines win.
- Observability. When something goes wrong, how do you debug it? You need to log every decision, every tool call, every response. The Probabilistic Stack covers engineering for non-deterministic systems.
- Confidence thresholds. Should every tool call execute automatically? Or should some pause for human approval? The HITL Firewall covers when to insert humans in the loop.
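To make the first of these concrete, here's a sketch of retry handling around the API call (illustrative only; in real code, catch the SDK's specific exception classes rather than bare Exception):

import time

def call_model_with_retries(messages, tools, max_retries: int = 3):
    """Call the API with simple exponential backoff between attempts."""
    for attempt in range(max_retries):
        try:
            return client.messages.create(
                model="claude-sonnet-4-20250514",
                max_tokens=1024,
                messages=messages,
                tools=tools,
            )
        except Exception:  # e.g. rate limits, transient network errors
            if attempt == max_retries - 1:
                raise
            time.sleep(2 ** attempt)  # back off: 1s, 2s, 4s...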
When you need these capabilities, frameworks like LangGraph, Temporal, and Inngest exist. They handle the infrastructure so you can focus on the logic. But now you know what they're abstracting: a loop, a message list, tool dispatch, and termination conditions.
What We Built
An agent is four things:
- A loop that calls an LLM repeatedly
- Tools that let the LLM take actions
- Memory (just a list of messages)
- Termination (explicit "finish" or iteration cap)
That's the whole abstraction. Everything else—orchestration frameworks, memory systems, observability platforms—builds on these primitives.
Now run the code. Modify it. Add a tool that fetches weather data or queries a database. Break it and see what happens. The best way to understand agents is to build one yourself, watch it fail, and fix it.
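For example, a second tool slots into the Part 5 code with two changes: a schema entry and a dispatch branch. Here's a sketch with a stubbed get_weather (the stub returns canned data; swap in a real weather API call yourself):

from typing import Any  # already imported in the full agent

def get_weather(city: str) -> str:
    """Stub weather lookup. Replace the body with a real API call."""
    return f"Weather in {city}: 18°C, partly cloudy (stub data)"

tools.append({
    "name": "get_weather",
    "description": "Get the current weather for a city. Use for any weather question.",
    "input_schema": {
        "type": "object",
        "properties": {
            "city": {"type": "string", "description": "City name, e.g. 'Berlin'"}
        },
        "required": ["city"]
    }
})

def execute_tool(name: str, inputs: dict[str, Any]) -> str:
    """Dispatch tool calls to their implementations."""
    if name == "calculate":
        return calculate(inputs["expression"])
    if name == "get_weather":
        return get_weather(inputs["city"])
    return f"Unknown tool: {name}"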
When you're ready for production, frameworks like LangGraph and Temporal are waiting. But you'll approach them differently now—not as magic boxes, but as sophisticated implementations of the loop you just wrote.
The Agent Thesis: What We Know After 100 Deployments
A synthesis of the patterns that separate agents that ship from agents that die in pilot purgatory. The throughlines across architecture, operations, economics, and security.
The 5 Agent Failure Modes (And How to Prevent Them)
Most AI agents fail silently in production. Here are the five failure modes killing your deployments—and the architecture patterns that prevent them.
The Durable Agent: Why Infrastructure Beats Prompts
A 15-minute task that crashes at 99% wastes $4.50 in compute. Temporal eliminates the Restart Tax and turns debugging into DVR replay.
The Graph Mandate: Why Chat-Based Agents Fail in Production
The "Chat Loop" is the "goto" statement of the AI era. 70-90% of enterprise AI projects stall in Pilot Purgatory. Graph architectures are the path to production.