“Review the portfolio — what’s the current sector allocation vs. target?”
“Find tax-loss harvesting opportunities in Technology”
“Harvest the INTC losses. What should I buy to replace the exposure?”
Watch how the agent chooses different tools depending on the question. It’s not running a script — it’s reasoning about what information it needs.
What Tools Would an Agent Need to Do This?
Five Tools, Three Types
```mermaid
%%{init: {'theme': 'default', 'themeVariables': {'fontSize': '24px'}}}%%
flowchart LR
    U[User] <-->|conversation| A[Claude Agent]
    A -->|"get_holdings\naccount_id"| M[Mock Data]
    A -->|get_target_allocation| M
    A -->|get_analyst_recommendations| M
    A -->|run_sql| DB[("MotherDuck\nSEP + tickers")]
    A -->|run_python| P["Python\nRuntime"]
```
Data Lookup
get_holdings
get_target_allocation
get_analyst_recommendations
Return structured data (JSON)
Database Query
run_sql
Agent writes its own SQL
Executes against a live database
Returns query results
Computation
run_python
Agent writes Python code
Computes analytics on retrieved data
Persistent namespace across calls
Why an Agent, Not a Script?
A Script
Fixed sequence of steps
Same analysis every time
Can’t handle follow-up questions
New analysis = new code
An Agent
Chooses tools based on context
Different path for each question
Conversational — remembers context
New analysis = just ask
The agent uses the same five tools to answer hundreds of different questions. You write the tools once; the agent figures out how to combine them.
Providing Tools to an Agent
Tool Type 1: Data Lookup
The simplest tools return data from a known source. The agent provides parameters; you return results.
```python
{
    "name": "get_portfolio",
    "description": (
        "Retrieve a client's portfolio with all data pre-computed. "
        "Returns total_portfolio_value, sector_summary with current "
        "and target weights, and holdings with prices, analyst "
        "ratings, market values, and unrealized gains/losses."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "account_id": {
                "type": "string",
                "description": "Client account identifier",
            }
        },
        "required": ["account_id"],
    },
}
```
Each tool has a name, description, and input_schema. The description tells the agent when and why to use the tool.
Implementing a Data Lookup Tool
Behind the scenes, the tool enriches raw data and returns pre-computed results:
```python
def get_portfolio(account_id: str) -> str:
    enriched = []
    for h in HOLDINGS:
        price = STOCK_DATA[h["ticker"]]["price"]
        total_shares = sum(lot["shares"] for lot in h["lots"])
        total_cost = sum(lot["shares"] * lot["cost_basis"] for lot in h["lots"])
        market_value = total_shares * price
        enriched.append({
            "ticker": h["ticker"],
            "sector": h["sector"],
            "analyst_rating": ANALYST_RATINGS[h["ticker"]],
            "current_price": price,
            "market_value": market_value,
            "unrealized_gl": market_value - total_cost,
            "lots": h["lots"],
        })
    # Pre-compute sector weights so the agent never does math
    total_mv = sum(p["market_value"] for p in enriched)
    sector_summary = ...  # current vs. target weights, diffs
    return json.dumps({
        "total_portfolio_value": total_mv,
        "sector_summary": sector_summary,
        "holdings": enriched,
    })
```
The tool does all the arithmetic — weights, gains/losses, differences. The agent just reads and presents values. In production, this would query a real portfolio management system.
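The elided sector_summary computation could look roughly like the following sketch. TARGET_ALLOCATION and the sample holdings here are hypothetical stand-ins, not the demo's actual data:

```python
# Illustrative sketch of the "current vs. target weights" step.
# TARGET_ALLOCATION and the holdings below are hypothetical sample data.
TARGET_ALLOCATION = {"Technology": 0.40, "Healthcare": 0.60}

enriched = [
    {"ticker": "INTC", "sector": "Technology", "market_value": 30_000.0},
    {"ticker": "JNJ", "sector": "Healthcare", "market_value": 70_000.0},
]

total_mv = sum(p["market_value"] for p in enriched)

# Aggregate market value per sector
by_sector: dict[str, float] = {}
for p in enriched:
    by_sector[p["sector"]] = by_sector.get(p["sector"], 0.0) + p["market_value"]

# Current weight, target weight, and the difference the agent will present
sector_summary = {
    sector: {
        "current_weight": mv / total_mv,
        "target_weight": TARGET_ALLOCATION.get(sector, 0.0),
        "diff": mv / total_mv - TARGET_ALLOCATION.get(sector, 0.0),
    }
    for sector, mv in by_sector.items()
}
print(sector_summary["Technology"])  # current 0.30 vs. target 0.40
```

Because the tool returns these numbers pre-computed, the agent can answer "what's current vs. target?" by reading values rather than doing arithmetic.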
Tool Type 2: Database Query
The most powerful pattern: let the agent write its own queries.
```python
{
    "name": "run_sql",
    "description": (
        "Execute a SQL query against the MotherDuck database.\n"
        "Available tables:\n"
        "  - sep: daily stock prices (ticker, date, close, closeadj, ...)\n"
        "  - tickers: stock metadata (ticker, name, sector, industry, ...)\n\n"
        "Use closeadj for return calculations (adjusted for splits/dividends)."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "sql": {
                "type": "string",
                "description": "The SQL query to execute",
            }
        },
        "required": ["sql"],
    },
}
```
The description tells the agent what tables and columns exist. The agent composes SQL on the fly — correlations, drawdowns, returns — whatever the analysis requires.
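For instance, to set up a correlation analysis the agent might compose a query pulling adjusted closes for two tickers. The sketch below illustrates the shape of such a query using an in-memory SQLite table standing in for MotherDuck's sep table; the rows are made-up sample data:

```python
import sqlite3

# Stand-in for MotherDuck: a tiny in-memory "sep" table with sample rows.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sep (ticker TEXT, date TEXT, closeadj REAL)")
con.executemany(
    "INSERT INTO sep VALUES (?, ?, ?)",
    [("INTC", "2024-01-02", 47.1), ("INTC", "2024-01-03", 46.5),
     ("AMD", "2024-01-02", 138.6), ("AMD", "2024-01-03", 140.2)],
)

# The kind of SQL the agent writes on the fly: aliased table, closeadj
# column, results ready for a return/correlation computation downstream.
sql = """
SELECT s.ticker, s.date, s.closeadj
FROM sep s
WHERE s.ticker IN ('INTC', 'AMD')
ORDER BY s.ticker, s.date
"""
rows = con.execute(sql).fetchall()
print(rows[0])  # ('AMD', '2024-01-02', 138.6)
```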
Tool Type 3: Computation
The third tool type, run_python, executes code the agent writes. Its shared _python_namespace lets the agent store a DataFrame in one call and use it in the next — e.g., fetch prices with run_sql, then compute a correlation matrix.
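A minimal sketch of how such a computation tool could be implemented, assuming the run_python_code helper name used by the tool router; a production version would sandbox execution (e.g., inside the container) rather than call exec on the host:

```python
import contextlib
import io

# Persistent namespace shared across run_python calls, so variables
# defined in one call (e.g., a prices DataFrame) survive into the next.
_python_namespace: dict = {}

def run_python_code(code: str) -> str:
    """Execute agent-written code; return captured stdout or the error."""
    buf = io.StringIO()
    try:
        with contextlib.redirect_stdout(buf):
            exec(code, _python_namespace)
    except Exception as exc:
        return f"Error: {exc}"
    return buf.getvalue() or "(no output)"

# First call stores data; the second call can still see it.
run_python_code("prices = [47.1, 46.5, 45.9]")
print(run_python_code("print(sum(prices) / len(prices))"))
```

Returning the error message as the tool result (rather than raising) lets the agent read the traceback and fix its own code on the next call.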
Contextual Tools
Some tools are only useful after other tools have been called:
```python
{
    "name": "get_analyst_recommendations",
    "description": (
        "Get stocks rated 'Strong Buy' for a given sector. "
        "Use this to find replacement candidates when tax-loss "
        "harvesting creates a need to replenish sector exposure."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "sector": {
                "type": "string",
                "description": "The sector (e.g., 'Technology')",
            }
        },
        "required": ["sector"],
    },
}
```
The agent only calls this after identifying a harvest opportunity — it doesn’t blindly fetch recommendations for every sector. The description guides the agent on when to use the tool.
The Tool Router
All tools are connected through a single dispatch function:
```python
def execute_tool(name: str, inputs: dict) -> str:
    """Route a tool call to the appropriate handler."""
    if name == "get_holdings":
        return json.dumps(HOLDINGS)
    elif name == "get_target_allocation":
        return json.dumps(TARGET_ALLOCATION)
    elif name == "get_analyst_recommendations":
        sector = inputs["sector"]
        recs = STRONG_BUY.get(sector, [])
        return json.dumps({"sector": sector, "strong_buy": recs})
    elif name == "run_sql":
        return run_sql_query(inputs["sql"])
    elif name == "run_python":
        return run_python_code(inputs["code"])
    # Fail loudly on an unknown tool name instead of silently returning None
    raise ValueError(f"Unknown tool: {name}")
```
This is your code — you control what each tool does. The agent only sees the tool’s name, description, and the result you return.
The Agent Loop
The Core Pattern
```python
import anthropic

client = anthropic.Anthropic()
messages = []

# User says something
messages.append({"role": "user", "content": user_input})

# Agent loop: keep going until it stops calling tools
while True:
    response = client.messages.create(
        model="claude-sonnet-4-6",
        system=SYSTEM_PROMPT,
        tools=tools,
        messages=messages,
        max_tokens=4096,
    )
    if response.stop_reason == "tool_use":
        # Agent wants to call tools — execute and feed results back
        ...
    else:
        # Agent is done — display its response
        break
```
The key insight: the loop keeps running until the agent has all the information it needs. It might call 1 tool or 5.
Handling Tool Calls
```python
if response.stop_reason == "tool_use":
    # Save the assistant's message (contains tool_use blocks)
    messages.append({"role": "assistant", "content": response.content})

    # Execute each tool and collect results
    tool_results = []
    for block in response.content:
        if block.type == "tool_use":
            result = execute_tool(block.name, block.input)
            tool_results.append({
                "type": "tool_result",
                "tool_use_id": block.id,
                "content": result,
            })

    # Send results back to the agent
    messages.append({"role": "user", "content": tool_results})
```
The tool_use_id links each result to the corresponding call. The agent can make multiple tool calls in one turn.
The System Prompt
```python
SYSTEM_PROMPT = """\
You are a portfolio analyst assistant. You help review a client
portfolio, identify tax-loss harvesting opportunities, and
recommend replacement securities.

When harvesting losses, sell the lots with the largest unrealized
loss first (highest cost basis relative to current price) using
specific lot identification for maximum tax benefit.

When recommending a replacement, compute return correlations
between the stock being sold and each Strong Buy candidate,
then recommend the most highly correlated candidate.

When writing SQL, use closeadj for return calculations, alias
tables (e.g., SELECT s.close FROM sep s), and cast dates
(s.date::DATE) for comparisons.
"""
```
The system prompt gives the agent its strategy — domain expertise that guides tool selection. The tools define what the agent can do; the system prompt defines what it should do.
You provide three things: a system prompt (domain expertise), tool definitions (capabilities), and tool implementations (what actually runs). The agent loop connects them.
The Claude Agent SDK
From Hand-Rolled to SDK
We built the agent loop manually: parse tool_use blocks, route to handlers, feed results back. The Claude Agent SDK (claude-agent-sdk) handles all of this for you.
```shell
pip install claude-agent-sdk
```
Raw Messages API (what we built)
You write the while True loop
You parse tool calls and route them
You construct tool_result messages
You manage the message list
Agent SDK
SDK runs the loop via async for
Tools defined with @tool decorator
SDK executes tools automatically
SDK manages sessions and context
Defining Tools with the SDK
The @tool decorator replaces raw JSON Schema:
```python
import json
from typing import Any

from claude_agent_sdk import tool

@tool(
    name="get_holdings",
    description="Retrieve all portfolio holdings with tax lot detail.",
    input_schema={
        "type": "object",
        "properties": {
            "account_id": {
                "type": "string",
                "description": "Client account identifier",
            }
        },
        "required": ["account_id"],
    },
)
async def get_holdings(args: dict[str, Any]) -> dict[str, Any]:
    account_id = args["account_id"]
    return {"content": [{"type": "text", "text": json.dumps(HOLDINGS)}]}

@tool(
    name="run_sql",
    description="Execute a SQL query against the MotherDuck database.",
    input_schema={
        "type": "object",
        "properties": {"sql": {"type": "string"}},
        "required": ["sql"],
    },
)
async def run_sql(args: dict[str, Any]) -> dict[str, Any]:
    result = run_sql_query(args["sql"])
    return {"content": [{"type": "text", "text": result}]}
```
Same tools, same logic — but the decorator registers the tool with the SDK. No manual routing needed.
create_sdk_mcp_server wraps your @tool functions into an in-process MCP server. The SDK uses the same Model Context Protocol that Claude Desktop and Claude Code use.
The Loop Disappears
```python
import asyncio

from claude_agent_sdk import AssistantMessage, ResultMessage, query

async def main():
    async for message in query(
        prompt="Review the portfolio and find tax-loss candidates",
        options=options,
    ):
        if isinstance(message, AssistantMessage):
            for block in message.content:
                if hasattr(block, "name"):
                    print(f" [calling {block.name}...]")
        elif isinstance(message, ResultMessage):
            print(message.result)
            print(f" Cost: ${message.total_cost_usd:.4f}")

asyncio.run(main())
```
No while True. No tool_use_id. No message list management. The SDK handles the entire agentic loop — you just iterate over the messages it yields.
Hand-Rolled vs. SDK
| | Hand-Rolled | Agent SDK |
| --- | --- | --- |
| Agent loop | ~25 lines of while True | async for message in query(...) |
| Tool routing | if/elif dispatch function | Automatic via @tool decorator |
| Tool definitions | Raw JSON Schema dicts | @tool decorator |
| Error handling | Manual | Built-in retries, context compaction |
| Multi-turn | Manually append to messages list | SDK manages session state |
| Learning value | See every moving part | Production-ready abstraction |
Start with the hand-rolled version to understand the mechanics. Use the SDK when you want to ship something.
Deploying the Agent
What Is Docker?
A Docker container packages your application, its dependencies, and its configuration into a single, portable unit.
Without Docker
“It works on my machine”
Install Python, pip, duckdb, anthropic…
Manage conflicting library versions
Secrets scattered across config files
With Docker
Identical environment everywhere
One command to build, one to run
Dependencies frozen in the image
Secrets injected at runtime
A container is like a lightweight virtual machine. It runs the same way on a developer’s laptop, a test server, or a cloud cluster.
Containerizing the Portfolio Agent
```dockerfile
FROM python:3.12-slim

WORKDIR /app
COPY portfolio_agent.py .
RUN pip install anthropic duckdb python-dotenv pandas numpy

# Secrets are passed as environment variables at runtime,
# never baked into the image.
# ANTHROPIC_API_KEY and MOTHERDUCK_TOKEN set at deploy time.

CMD ["python", "portfolio_agent.py"]
```
The Dockerfile is a recipe. docker build creates the image; docker run starts the container. In production, companies deploy containers via orchestrators like Kubernetes or AWS ECS, which handle scaling, restarts, and secret management. The container also provides an isolation boundary for tools like run_python — code executes inside the container, not on the host.
Railway: Docker Without the Hassle
Railway handles the Dockerization for you. Push your code and Railway builds the container, deploys it, and manages secrets — no Dockerfile required.
What You Do
Connect your GitHub repo
Set environment variables (API keys)
Push code
What Railway Does
Detects Python, builds the container
Injects secrets at runtime
Deploys with HTTPS endpoint
Auto-redeploys on each push
Railway is one of several platforms (Render, Fly.io, Koyeb) that abstract away Docker and Kubernetes. You get the benefits of containerization without writing Dockerfiles or managing infrastructure.
Key Takeaways
Start with the demo — see what an agent can do before building one
Identify the tools — what capabilities does the agent need?
Three tool types — data lookup, database query, computation
Descriptions matter — the agent decides which tools to use based on descriptions
The agentic loop — keep feeding tool results until the agent is done
Deploy with Docker — containerize for production; Railway makes it easy
The hand-rolled version is ~200 lines of logic. The SDK version is shorter, but the concepts are identical: tools, system prompt, and an agentic loop.
Example: Presentation Examiner
What We Want to Build
An AI-powered presentation exam: students upload slides, present them by voice, answer revised follow-up questions, and receive automated grades.
Student Experience
Upload a PDF slide deck
Present slides to an AI examiner (voice)
Answer 3 revised exam questions
Receive scores and written feedback
Instructor Experience
Create assignments with prompts and rubrics
Upload reference materials and solution keys
Enroll students via single-use magic links (no passwords)
Review grades and export results
The AI replaces the live examiner. It listens to the presentation, revises its questions based on what was said, and grades three dimensions: slides, presentation, and Q&A. Scales to hundreds of students with immediate, consistent feedback.
How the Exam Flows
A FastAPI web application orchestrates a five-phase pipeline:
The voice phases (2 and 4) connect back to the app via webhooks — when ElevenLabs signals a session ended, the app fetches the transcript, advances the phase, and kicks off the next step as a background task. This is not an agentic loop — it’s an orchestrated pipeline where each phase always triggers the next. Match the architecture to the problem: use an agent loop when the path depends on the question; use a pipeline when the steps are known in advance.
Refining with Automated Testing
How do you know the grading is fair and consistent? Build a synthetic agent that generates slide decks, presentations, and Q&A transcripts at controlled quality levels, runs them through the full pipeline, and checks that grades track intended quality.
Do excellent submissions score higher than poor ones?
Are grades consistent across similar quality levels?
Does the full pipeline complete without errors?
Does the feedback match the grade?
Change a prompt or grading rubric, re-run the synthetic agent, and see whether results improved or regressed. This is test-driven refinement — the same idea as automated regression testing, applied to AI behavior.
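The core check can be sketched in a few lines, with a stubbed grade_submission standing in for the real exam pipeline (all names and score values here are hypothetical):

```python
# Regression check for AI grading: do grades track intended quality?
# grade_submission is a stub; the real version runs the full pipeline
# on a synthetic submission generated at the given quality level.
def grade_submission(quality: str) -> float:
    stub_scores = {"poor": 55.0, "average": 75.0, "excellent": 92.0}
    return stub_scores[quality]

def check_monotonic() -> bool:
    """Excellent synthetic submissions must outscore poorer ones."""
    scores = [grade_submission(q) for q in ("poor", "average", "excellent")]
    # Require increasing order plus a minimum spread, guarding against
    # a grader that collapses every submission to roughly one score.
    return scores == sorted(scores) and scores[-1] - scores[0] > 10

assert check_monotonic(), "grading regression: scores no longer track quality"
print("synthetic grading check passed")
```

Run this after every prompt or rubric change, just as you would run a unit test suite after a code change.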