What is an AI Agent?
An AI agent is a program that calls an LLM in a loop, using tools to take actions until a task is complete. Unlike a chatbot (one question, one answer), an agent reasons about what to do next, executes actions, observes results, and repeats—autonomously working toward a goal.
Core Primitives
Loop + Tools + Memory + Termination
That's the entire abstraction. Everything else—LangChain, LangGraph, CrewAI, AutoGen—builds on these four primitives. In this tutorial, we build all four from scratch in ~80 lines of Python.
What We're Building
We're going to build a working AI agent with zero frameworks. No dependencies beyond the Anthropic SDK—just a while loop and an API call. By the end, you'll have an agent that reasons about what to do, acts via tools, remembers prior context, and knows when to stop.
The code uses Claude, but the pattern works identically with OpenAI, Gemini, or any model supporting tool calling. Swap out the API client and you're done.
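To make that concrete, here's roughly what the same single call looks like with the OpenAI SDK (a sketch for orientation only; the rest of this tutorial sticks with the Anthropic SDK):

from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o",  # any tool-calling model works here
    max_tokens=1024,
    messages=[{"role": "user", "content": "What's 1847 * 23?"}],
)
print(response.choices[0].message.content)

The shape is the same everywhere: a client, a list of messages, a response to inspect.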
Part 1: The Agent Loop
An agent is a loop that calls an LLM until some condition tells it to stop. Here's the skeleton:
import anthropic
client = anthropic.Anthropic()
def run_agent(user_message: str) -> str:
    messages = [{"role": "user", "content": user_message}]

    while True:
        response = client.messages.create(
            model="claude-sonnet-4-20250514",
            max_tokens=1024,
            messages=messages
        )

        # For now, just return the first response
        return response.content[0].text

This isn't an agent yet—it's a single LLM call wrapped in a function. But it establishes the skeleton: a loop, a message list, and an API call.
Every decision in the next 80 lines shapes what this agent can do. The loop structure is the foundation—get it wrong and you'll be fighting it forever. See The Agent Thesis for why architecture matters more than model choice.
The loop doesn't actually loop yet. Let's fix that by adding tools.
Part 2: Tools Let the Agent Act
Without tools, an LLM can only generate text. With tools, it can act on the world—call APIs, query databases, run calculations. This is what makes an agent an agent rather than a chatbot.
Let's add a calculator. First, the Python function:
def calculate(expression: str) -> str:
"""Evaluate a mathematical expression."""
try:
result = eval(expression) # Don't do this in production!
return str(result)
except Exception as e:
return f"Error: {e}"Now we tell the model this tool exists. The schema is the same across Claude, GPT, and Gemini—a name, a description, and a JSON schema for the inputs:
tools = [
    {
        "name": "calculate",
        "description": "Evaluate a mathematical expression. Use this for any math.",
        "input_schema": {
            "type": "object",
            "properties": {
                "expression": {
                    "type": "string",
                    "description": "The math expression to evaluate, e.g. '2 + 2' or '(10 * 5) / 2'"
                }
            },
            "required": ["expression"]
        }
    }
]

The model reads the description to decide when to call the tool. Vague descriptions cause Tool Amnesia—the agent forgets it has tools, or picks the wrong one. Write descriptions like you're onboarding a new teammate.
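As an illustration (these snippets aren't part of the agent we're building), compare a vague description with one the model can actually act on:

# Too vague: the model has no idea when this tool applies.
vague = {"name": "calculate", "description": "Does math stuff."}

# Specific: states what the tool does and when to reach for it.
specific = {
    "name": "calculate",
    "description": (
        "Evaluate an arithmetic expression like '(10 * 5) / 2'. "
        "Use this whenever the request requires numeric calculation."
    ),
}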
Now the agent loop needs to handle tool calls:
def run_agent(user_message: str) -> str:
    messages = [{"role": "user", "content": user_message}]

    while True:
        response = client.messages.create(
            model="claude-sonnet-4-20250514",
            max_tokens=1024,
            messages=messages,
            tools=tools
        )

        # Check if the model wants to use a tool
        if response.stop_reason == "tool_use":
            # Find the tool call in the response
            tool_use = next(
                block for block in response.content
                if block.type == "tool_use"
            )

            # Execute the tool
            if tool_use.name == "calculate":
                result = calculate(tool_use.input["expression"])
            else:
                result = f"Unknown tool: {tool_use.name}"

            # Add the assistant's response and tool result to messages
            messages.append({"role": "assistant", "content": response.content})
            messages.append({
                "role": "user",
                "content": [{
                    "type": "tool_result",
                    "tool_use_id": tool_use.id,
                    "content": result
                }]
            })

            # Continue the loop—let the model process the result
            continue

        # No tool call, return the text response
        return response.content[0].text

Run this with "What's 1847 * 23?" and trace what happens:
- Model receives the question, decides to call calculate
- Our code runs calculate("1847 * 23") → "42481"
- We append the result to messages and call the model again
- Model sees the result and generates: "1847 × 23 = 42,481"
The loop ran twice. Reason → act → observe → respond. That's the core pattern.
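One note before moving on: the calculate tool leans on eval, which the comment rightly flags as unsafe, since a model-supplied string goes straight into the Python interpreter. Here is a minimal sketch of a safer drop-in using the standard-library ast module with an explicit allowlist of operators (an illustration, not the version used in the rest of the tutorial):

import ast
import operator

# Allowlist of arithmetic operators the evaluator will accept.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Pow: operator.pow,
    ast.USub: operator.neg,
}

def safe_calculate(expression: str) -> str:
    """Evaluate a math expression without eval by walking the parsed AST."""
    def _eval(node: ast.AST) -> float:
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.operand))
        raise ValueError("Unsupported expression")

    try:
        tree = ast.parse(expression, mode="eval")
        return str(_eval(tree.body))
    except Exception as e:
        return f"Error: {e}"

The full listing in Part 5 keeps eval for brevity, but safe_calculate is a drop-in replacement for calculate.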
Part 3: Memory Is Just a List
Notice what we're already doing: appending to messages. That list is the agent's memory. Every turn—user, assistant, tool result—accumulates in the same array:
messages = [
    {"role": "user", "content": "What's 10 + 5?"},
    {"role": "assistant", "content": [...]},   # tool_use block
    {"role": "user", "content": [...]},        # tool_result
    {"role": "assistant", "content": "10 + 5 = 15"},
    {"role": "user", "content": "Now multiply that by 3"},
    # The model can reference "that" because it sees the history
]

This naive approach—just keep appending—works for short conversations. The model sees everything and can reference prior turns.
Context windows have limits. Claude caps at 200K tokens, GPT-4o at 128K. An agent running for an hour can accumulate 500+ tool calls. At some point the API rejects the request—or worse, the model silently loses information from the middle of context.
For our toy agent, we keep the naive approach. Production agents need smarter memory: sliding windows, summarization, or external retrieval. See Agent Memory Architecture for the patterns.
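To give a feel for the simplest of those options, here is a sketch of a sliding-window trim (a hypothetical helper, not part of the agent above): keep the original task, drop the oldest middle turns, keep the most recent ones.

def trim_history(messages: list[dict], max_messages: int = 40) -> list[dict]:
    """Naive sliding window over the message list.

    Caveat: a real implementation must keep tool_use / tool_result pairs
    together; an orphaned tool_result will be rejected by the API.
    """
    if len(messages) <= max_messages:
        return messages
    # Keep the first message (the original task) plus the most recent turns.
    return [messages[0]] + messages[-(max_messages - 1):]

You'd call messages = trim_history(messages) before each API call.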
Part 4: Explicit Termination
Our agent loops until the model stops calling tools. But what if the model gets stuck? What if it keeps calling tools forever?
This is Infinite Loop Syndrome—one of five ways agents die in production. The fix: explicit termination via a finish tool:
tools = [
    {
        "name": "calculate",
        "description": "Evaluate a mathematical expression.",
        "input_schema": {
            "type": "object",
            "properties": {
                "expression": {"type": "string", "description": "Math expression"}
            },
            "required": ["expression"]
        }
    },
    {
        "name": "finish",
        "description": "Call this when the task is complete and you have the final answer.",
        "input_schema": {
            "type": "object",
            "properties": {
                "answer": {"type": "string", "description": "The final answer to return"}
            },
            "required": ["answer"]
        }
    }
]

Now update the loop:
def run_agent(user_message: str, max_iterations: int = 10) -> str:
    messages = [{"role": "user", "content": user_message}]

    for i in range(max_iterations):
        response = client.messages.create(
            model="claude-sonnet-4-20250514",
            max_tokens=1024,
            messages=messages,
            tools=tools
        )

        if response.stop_reason == "tool_use":
            tool_use = next(
                block for block in response.content
                if block.type == "tool_use"
            )

            # Check for explicit termination
            if tool_use.name == "finish":
                return tool_use.input["answer"]

            # Execute other tools
            if tool_use.name == "calculate":
                result = calculate(tool_use.input["expression"])
            else:
                result = f"Unknown tool: {tool_use.name}"

            messages.append({"role": "assistant", "content": response.content})
            messages.append({
                "role": "user",
                "content": [{
                    "type": "tool_result",
                    "tool_use_id": tool_use.id,
                    "content": result
                }]
            })
            continue

        # No tool call—return text response
        return response.content[0].text

    return "Max iterations reached without completion"

Two safety mechanisms:
- max_iterations caps the loop (hard limit)
- finish tool provides explicit exit (clean termination)
Belt and suspenders. The agent knows when to stop.
Part 5: The Complete Agent
Here's the full implementation—~80 lines, zero frameworks:
import anthropic
from typing import Any

client = anthropic.Anthropic()

# --- Tools ---

def calculate(expression: str) -> str:
    """Evaluate a mathematical expression."""
    try:
        # WARNING: eval is dangerous. Use a proper math parser in production.
        result = eval(expression)
        return str(result)
    except Exception as e:
        return f"Error: {e}"

tools = [
    {
        "name": "calculate",
        "description": "Evaluate a mathematical expression. Use for any math.",
        "input_schema": {
            "type": "object",
            "properties": {
                "expression": {
                    "type": "string",
                    "description": "Math expression, e.g. '2 + 2' or '(10 * 5) / 2'"
                }
            },
            "required": ["expression"]
        }
    },
    {
        "name": "finish",
        "description": "Call when the task is complete with the final answer.",
        "input_schema": {
            "type": "object",
            "properties": {
                "answer": {"type": "string", "description": "Final answer"}
            },
            "required": ["answer"]
        }
    }
]

def execute_tool(name: str, inputs: dict[str, Any]) -> str:
    """Dispatch tool calls to their implementations."""
    if name == "calculate":
        return calculate(inputs["expression"])
    return f"Unknown tool: {name}"

# --- Agent Loop ---

def run_agent(user_message: str, max_iterations: int = 10) -> str:
    """Run the agent loop until completion or max iterations."""
    messages = [{"role": "user", "content": user_message}]

    for iteration in range(max_iterations):
        response = client.messages.create(
            model="claude-sonnet-4-20250514",
            max_tokens=1024,
            messages=messages,
            tools=tools
        )

        # Handle tool use
        if response.stop_reason == "tool_use":
            tool_use = next(
                b for b in response.content if b.type == "tool_use"
            )

            # Explicit termination
            if tool_use.name == "finish":
                return tool_use.input["answer"]

            # Execute and continue
            result = execute_tool(tool_use.name, tool_use.input)
            messages.append({"role": "assistant", "content": response.content})
            messages.append({
                "role": "user",
                "content": [{
                    "type": "tool_result",
                    "tool_use_id": tool_use.id,
                    "content": result
                }]
            })
            continue

        # No tool call—return text
        return response.content[0].text

    return "Max iterations reached"

# --- Run it ---

if __name__ == "__main__":
    result = run_agent("What is (1847 * 23) + (512 / 4)?")
    print(result)

Run it:
$ python agent.py
42609.0

The agent:

- Received the math question
- Called calculate for the first expression
- Called calculate for the second expression
- Called finish with the combined result
Three iterations, explicit termination, correct answer. That's an agent.
Part 6: Where the Toy Breaks Down
This agent teaches the core loop. Ship it to production as-is and you'll learn the hard way why frameworks exist.
What's missing:
- Error handling. What if the API call fails? What if the tool throws? Our agent crashes. Production agents need try/except, retries, and graceful degradation (see the sketch after this list).
- State persistence. If the process dies, the agent's memory dies with it. The Durable Agent covers how to persist state to disk so agents can survive crashes and restarts.
- Structured orchestration. Our while loop is linear. Real workflows have branches, parallel paths, and cycles. The Graph Mandate explains why chat-based loops fail and state machines win.
- Observability. When something goes wrong, how do you debug it? You need to log every decision, every tool call, every response. The Probabilistic Stack covers engineering for non-deterministic systems.
- Confidence thresholds. Should every tool call execute automatically? Or should some pause for human approval? The HITL Firewall covers when to insert humans in the loop.
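To make the first of these concrete, here's a sketch of retry handling around the API call (illustrative only; in real code, catch the SDK's specific exception classes rather than bare Exception):

import time

def call_model_with_retries(messages, tools, max_retries: int = 3):
    """Call the API with simple exponential backoff between attempts."""
    for attempt in range(max_retries):
        try:
            return client.messages.create(
                model="claude-sonnet-4-20250514",
                max_tokens=1024,
                messages=messages,
                tools=tools,
            )
        except Exception:  # e.g. rate limits, transient network errors
            if attempt == max_retries - 1:
                raise
            time.sleep(2 ** attempt)  # back off: 1s, 2s, 4s...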
When you need these capabilities, frameworks like LangGraph, Temporal, and Inngest exist. They handle the infrastructure so you can focus on the logic. But now you know what they're abstracting: a loop, a message list, tool dispatch, and termination conditions.
What We Built
An agent is four things:
- A loop that calls an LLM repeatedly
- Tools that let the LLM take actions
- Memory (just a list of messages)
- Termination (explicit "finish" or iteration cap)
That's the whole abstraction. Everything else—orchestration frameworks, memory systems, observability platforms—builds on these primitives.
Now run the code. Modify it. Add a tool that fetches weather data or queries a database. Break it and see what happens. The best way to understand agents is to build one yourself, watch it fail, and fix it.
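For example, a second tool slots into the Part 5 code with two changes: a schema entry and a dispatch branch. Here's a sketch with a stubbed get_weather (the stub returns canned data; swap in a real weather API call yourself):

from typing import Any  # already imported in the full agent

def get_weather(city: str) -> str:
    """Stub weather lookup. Replace the body with a real API call."""
    return f"Weather in {city}: 18°C, partly cloudy (stub data)"

tools.append({
    "name": "get_weather",
    "description": "Get the current weather for a city. Use for any weather question.",
    "input_schema": {
        "type": "object",
        "properties": {
            "city": {"type": "string", "description": "City name, e.g. 'Berlin'"}
        },
        "required": ["city"]
    }
})

def execute_tool(name: str, inputs: dict[str, Any]) -> str:
    """Dispatch tool calls to their implementations."""
    if name == "calculate":
        return calculate(inputs["expression"])
    if name == "get_weather":
        return get_weather(inputs["city"])
    return f"Unknown tool: {name}"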
When you're ready for production, frameworks like LangGraph and Temporal are waiting. But you'll approach them differently now—not as magic boxes, but as sophisticated implementations of the loop you just wrote.
The Agent Thesis: What We Know After 100 Deployments
A synthesis of the patterns that separate agents that ship from agents that die in pilot purgatory. The throughlines across architecture, operations, economics, and security.
The 5 Agent Failure Modes (And How to Prevent Them)
Most AI agents fail silently in production. Here are the five failure modes killing your deployments—and the architecture patterns that prevent them.
The Durable Agent: Why Infrastructure Beats Prompts
A 15-minute task that crashes at 99% wastes $4.50 in compute. Temporal eliminates the Restart Tax and turns debugging into DVR replay.
The Graph Mandate: Why Chat-Based Agents Fail in Production
The "Chat Loop" is the "goto" statement of the AI era. 70-90% of enterprise AI projects stall in Pilot Purgatory. Graph architectures are the path to production.