Module 15: More on Building Agents

MGMT 675: Generative AI for Finance

Kerry Back, Rice University

Live Demo: Portfolio Analysis Agent

web-production-9f0f9.up.railway.app

Let’s try some questions:

  • “Review the portfolio — what’s the current sector allocation vs. target?”
  • “Find tax-loss harvesting opportunities in Technology”
  • “Harvest the INTC losses. What should I buy to replace the exposure?”

Watch how the agent chooses different tools depending on the question. It’s not running a script — it’s reasoning about what information it needs.

What Tools Would an Agent Need to Do This?

Five Tools, Three Types

%%{init: {'theme': 'default', 'themeVariables': {'fontSize': '24px'}}}%%
flowchart LR
    U[User] <-->|conversation| A[Claude Agent]
    A -->|get_holdings\naccount_id| M[Mock Data]
    A -->|get_target_allocation| M
    A -->|get_analyst_recommendations| M
    A -->|run_sql| DB[(MotherDuck\nSEP + tickers)]
    A -->|run_python| P[Python\nRuntime]

Data Lookup

  • get_holdings
  • get_target_allocation
  • get_analyst_recommendations
  • Return structured data (JSON)

Database Query

  • run_sql
  • Agent writes its own SQL
  • Executes against a live database
  • Returns query results

Computation

  • run_python
  • Agent writes Python code
  • Computes analytics on retrieved data
  • Persistent namespace across calls

Why an Agent, Not a Script?

A Script

  • Fixed sequence of steps
  • Same analysis every time
  • Can’t handle follow-up questions
  • New analysis = new code

An Agent

  • Chooses tools based on context
  • Different path for each question
  • Conversational — remembers context
  • New analysis = just ask

The agent uses the same five tools to answer hundreds of different questions. You write the tools once; the agent figures out how to combine them.

Providing Tools to an Agent

Tool Type 1: Data Lookup

The simplest tools return data from a known source. The agent provides parameters; you return results.

{
    "name": "get_portfolio",
    "description": (
        "Retrieve a client's portfolio with all data pre-computed. "
        "Returns total_portfolio_value, sector_summary with current "
        "and target weights, and holdings with prices, analyst "
        "ratings, market values, and unrealized gains/losses."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "account_id": {
                "type": "string",
                "description": "Client account identifier",
            }
        },
        "required": ["account_id"],
    },
}

Each tool has a name, description, and input_schema. The description tells the agent when and why to use the tool.

Implementing a Data Lookup Tool

Behind the scenes, the tool enriches raw data and returns pre-computed results:

import json

def get_portfolio(account_id: str) -> str:
    enriched = []
    for h in HOLDINGS:
        price = STOCK_DATA[h["ticker"]]["price"]
        total_shares = sum(lot["shares"] for lot in h["lots"])
        total_cost = sum(lot["shares"] * lot["cost_basis"]
                         for lot in h["lots"])
        market_value = total_shares * price
        enriched.append({
            "ticker": h["ticker"],
            "sector": h["sector"],
            "analyst_rating": ANALYST_RATINGS[h["ticker"]],
            "current_price": price,
            "market_value": market_value,
            "unrealized_gl": market_value - total_cost,
            "lots": h["lots"],
        })
    # Pre-compute sector weights so the agent never does math
    total_mv = sum(p["market_value"] for p in enriched)
    sector_summary = ...  # current vs. target weights, diffs
    return json.dumps({
        "total_portfolio_value": total_mv,
        "sector_summary": sector_summary,
        "holdings": enriched,
    })

The tool does all the arithmetic — weights, gains/losses, differences. The agent just reads and presents values. In production, this would query a real portfolio management system.
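The arithmetic is easy to check in isolation. A minimal sketch with hypothetical tax lots (tickers, prices, and portfolio value below are made up for illustration):

```python
# Hypothetical tax lots for one holding: 100 shares bought at $50,
# 50 shares bought at $80, current price $55
lots = [
    {"shares": 100, "cost_basis": 50.0},
    {"shares": 50, "cost_basis": 80.0},
]
price = 55.0

total_shares = sum(lot["shares"] for lot in lots)                    # 150
total_cost = sum(lot["shares"] * lot["cost_basis"] for lot in lots)  # 9000.0
market_value = total_shares * price                                  # 8250.0
unrealized_gl = market_value - total_cost                            # -750.0 (a loss)

# Sector weight relative to a hypothetical $50,000 portfolio
total_portfolio_value = 50_000.0
weight = market_value / total_portfolio_value

print(total_shares, total_cost, market_value, unrealized_gl, weight)
```

A negative `unrealized_gl` like this one is exactly what the tax-loss harvesting questions in the demo are looking for.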

Tool Type 2: Database Query

The most powerful pattern: let the agent write its own queries.

{
    "name": "run_sql",
    "description": (
        "Execute a SQL query against the MotherDuck database.\n"
        "Available tables:\n"
        "  - sep: daily stock prices (ticker, date, close, closeadj, ...)\n"
        "  - tickers: stock metadata (ticker, name, sector, industry, ...)\n\n"
        "Use closeadj for return calculations (adjusted for splits/dividends)."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "sql": {
                "type": "string",
                "description": "The SQL query to execute",
            }
        },
        "required": ["sql"],
    },
}

The description tells the agent what tables and columns exist. The agent composes SQL on the fly — correlations, drawdowns, returns — whatever the analysis requires.

Implementing a Database Query Tool

import duckdb
import json
import os
from dotenv import load_dotenv

load_dotenv("path/to/.env")

def run_sql_query(sql: str) -> str:
    """Execute SQL against MotherDuck, return JSON."""
    token = os.getenv("MOTHERDUCK_TOKEN")
    conn = duckdb.connect(f"md:ndl?motherduck_token={token}")
    try:
        df = conn.execute(sql).fetchdf()
        return df.to_json(orient="records", date_format="iso")
    except Exception as e:
        return json.dumps({"error": str(e)})
    finally:
        conn.close()

The agent writes SQL; your function executes it and returns a JSON string. The agent never has direct database access — you control what runs and how.
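The same contract (SQL string in, JSON string out, errors returned as JSON) can be tested without a MotherDuck token. Here is a sketch using Python's built-in sqlite3 as a stand-in database; the table name and numbers are made up:

```python
import json
import sqlite3

def run_sql_query_local(sql: str, conn) -> str:
    """Same contract as run_sql_query: return rows (or an error) as JSON."""
    try:
        cur = conn.execute(sql)
        cols = [c[0] for c in cur.description]
        rows = [dict(zip(cols, r)) for r in cur.fetchall()]
        return json.dumps(rows)
    except Exception as e:
        return json.dumps({"error": str(e)})

# Tiny in-memory stand-in for the sep table (illustrative numbers)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sep (ticker TEXT, date TEXT, closeadj REAL)")
conn.executemany("INSERT INTO sep VALUES (?, ?, ?)",
                 [("AAPL", "2024-01-02", 184.0),
                  ("AAPL", "2024-01-03", 182.0)])

print(run_sql_query_local("SELECT ticker, closeadj FROM sep ORDER BY date", conn))
print(run_sql_query_local("SELECT nope FROM sep", conn))  # error comes back as JSON
```

Returning errors as JSON rather than raising matters: the agent reads the error message and often fixes its own SQL on the next call.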

Tool Type 3: Computation

Let the agent write and execute code on data it has already retrieved:

{
    "name": "run_python",
    "description": (
        "Execute Python code and return printed output. "
        "pandas, numpy, and json are available. "
        "The namespace persists across calls."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "code": {
                "type": "string",
                "description": "Python code to execute.",
            }
        },
        "required": ["code"],
    },
}

This enables multi-tool chains: run_sql fetches prices → run_python computes correlations. The agent decides what to compute based on the question.

Implementing a Computation Tool

import io, json, sys

_python_namespace = {}  # persists across calls

def run_python_code(code: str) -> str:
    """Execute Python code, return printed output."""
    import numpy as np, pandas as pd
    _python_namespace.update({"pd": pd, "np": np, "json": json})

    old_stdout = sys.stdout
    sys.stdout = buf = io.StringIO()
    try:
        exec(code, _python_namespace)
        output = buf.getvalue()
        return output if output else "(no output)"
    except Exception as e:
        return f"Error: {e}"
    finally:
        sys.stdout = old_stdout

The shared _python_namespace lets the agent store a DataFrame in one call and use it in the next — e.g., fetch prices, then compute a correlation matrix.
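The persistence is easy to verify: two separate calls share state, just as two successive tool calls from the agent would. A condensed, self-contained version of the same pattern (variable names are arbitrary):

```python
import io
import sys

_ns = {}  # shared namespace, as in run_python_code above

def run(code: str) -> str:
    """Execute code in the shared namespace, capturing printed output."""
    old_stdout = sys.stdout
    sys.stdout = buf = io.StringIO()
    try:
        exec(code, _ns)
        return buf.getvalue() or "(no output)"
    finally:
        sys.stdout = old_stdout

# First "tool call" stores a value; the second call uses it
print(run("x = [1.0, 2.0, 3.0]"))      # (no output)
print(run("print(sum(x) / len(x))"))   # 2.0
```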

Contextual Tools

Some tools are only useful after other tools have been called:

{
    "name": "get_analyst_recommendations",
    "description": (
        "Get stocks rated 'Strong Buy' for a given sector. "
        "Use this to find replacement candidates when tax-loss "
        "harvesting creates a need to replenish sector exposure."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "sector": {
                "type": "string",
                "description": "The sector (e.g., 'Technology')",
            }
        },
        "required": ["sector"],
    },
}

The agent only calls this after identifying a harvest opportunity — it doesn’t blindly fetch recommendations for every sector. The description guides the agent on when to use the tool.

The Tool Router

All tools are connected through a single dispatch function:

def execute_tool(name: str, inputs: dict) -> str:
    """Route a tool call to the appropriate handler."""
    if name == "get_holdings":
        return json.dumps(HOLDINGS)
    elif name == "get_target_allocation":
        return json.dumps(TARGET_ALLOCATION)
    elif name == "get_analyst_recommendations":
        sector = inputs["sector"]
        recs = STRONG_BUY.get(sector, [])
        return json.dumps({"sector": sector, "strong_buy": recs})
    elif name == "run_sql":
        return run_sql_query(inputs["sql"])
    elif name == "run_python":
        return run_python_code(inputs["code"])
    else:
        return json.dumps({"error": f"Unknown tool: {name}"})

This is your code — you control what each tool does. The agent only sees the tool’s name, description, and the result you return.
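An equivalent, slightly tidier pattern is a dispatch dict mapping tool names to handlers. A sketch with stand-in handlers and made-up data (HOLDINGS and STRONG_BUY here are illustrative, not the module's real data):

```python
import json

# Stand-in data for illustration
HOLDINGS = [{"ticker": "INTC", "sector": "Technology"}]
STRONG_BUY = {"Technology": ["AVGO", "AMD"]}

TOOL_HANDLERS = {
    "get_holdings": lambda inputs: json.dumps(HOLDINGS),
    "get_analyst_recommendations": lambda inputs: json.dumps(
        {"sector": inputs["sector"],
         "strong_buy": STRONG_BUY.get(inputs["sector"], [])}),
}

def execute_tool(name: str, inputs: dict) -> str:
    handler = TOOL_HANDLERS.get(name)
    if handler is None:
        # Return the error as a tool result so the agent can recover
        return json.dumps({"error": f"Unknown tool: {name}"})
    return handler(inputs)

print(execute_tool("get_analyst_recommendations", {"sector": "Technology"}))
```

Adding a tool becomes a one-line dict entry rather than another elif branch.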

The Agent Loop

The Core Pattern

import anthropic

client = anthropic.Anthropic()
messages = []

# User says something
messages.append({"role": "user", "content": user_input})

# Agent loop: keep going until it stops calling tools
while True:
    response = client.messages.create(
        model="claude-sonnet-4-6",
        system=SYSTEM_PROMPT,
        tools=tools,
        messages=messages,
        max_tokens=4096,
    )

    if response.stop_reason == "tool_use":
        # Agent wants to call tools — execute and feed results back
        ...
    else:
        # Agent is done — display its response
        break

The key insight: the loop keeps running until the agent has all the information it needs. It might call 1 tool or 5.

Handling Tool Calls

if response.stop_reason == "tool_use":
    # Save the assistant's message (contains tool_use blocks)
    messages.append({"role": "assistant", "content": response.content})

    # Execute each tool and collect results
    tool_results = []
    for block in response.content:
        if block.type == "tool_use":
            result = execute_tool(block.name, block.input)
            tool_results.append({
                "type": "tool_result",
                "tool_use_id": block.id,
                "content": result,
            })

    # Send results back to the agent
    messages.append({"role": "user", "content": tool_results})

The tool_use_id links each result to the corresponding call. The agent can make multiple tool calls in one turn.
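The whole loop can be exercised without an API key by stubbing the client, which is a useful way to test the plumbing. Everything below (the stub class, the fake tool call, the ids) is made up for illustration; only the loop structure mirrors the real code:

```python
import json
from types import SimpleNamespace

def execute_tool(name, inputs):
    return json.dumps({"ok": True, "tool": name})  # stub handler

class StubClient:
    """Returns one tool_use turn, then a final text turn."""
    def __init__(self):
        self.calls = 0
    def create(self, messages):
        self.calls += 1
        if self.calls == 1:
            block = SimpleNamespace(type="tool_use", id="toolu_1",
                                    name="get_holdings", input={})
            return SimpleNamespace(stop_reason="tool_use", content=[block])
        return SimpleNamespace(stop_reason="end_turn",
                               content=[SimpleNamespace(type="text", text="Done.")])

client = StubClient()
messages = [{"role": "user", "content": "Review the portfolio"}]

while True:
    response = client.create(messages=messages)
    if response.stop_reason == "tool_use":
        messages.append({"role": "assistant", "content": response.content})
        tool_results = [{"type": "tool_result", "tool_use_id": b.id,
                         "content": execute_tool(b.name, b.input)}
                        for b in response.content if b.type == "tool_use"]
        messages.append({"role": "user", "content": tool_results})
    else:
        break

print(len(messages))  # user, assistant (tool_use), user (tool_result)
```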

The System Prompt

SYSTEM_PROMPT = """\
You are a portfolio analyst assistant. You help review a client
portfolio, identify tax-loss harvesting opportunities, and
recommend replacement securities.

When harvesting losses, sell the lots with the largest unrealized
loss first (highest cost basis relative to current price) using
specific lot identification for maximum tax benefit.

When recommending a replacement, compute return correlations
between the stock being sold and each Strong Buy candidate,
then recommend the most highly correlated candidate.

When writing SQL, use closeadj for return calculations, alias
tables (e.g., SELECT s.close FROM sep s), and cast date
(s.date::DATE) for comparisons.
"""

The system prompt gives the agent its strategy — domain expertise that guides tool selection. The tools define what the agent can do; the system prompt defines what it should do.
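The replacement rule in the prompt ("recommend the most highly correlated candidate") is the kind of computation the agent would write inside run_python. A sketch with made-up daily return series:

```python
import pandas as pd

# Made-up daily returns for the stock being sold and two candidates
returns = pd.DataFrame({
    "INTC": [0.010, -0.020, 0.015, -0.005, 0.007],
    "AVGO": [0.012, -0.018, 0.014, -0.004, 0.006],  # moves closely with INTC
    "AMD":  [-0.010, 0.020, -0.010, 0.010, -0.020], # moves opposite
})

corr = returns.corr()["INTC"].drop("INTC")  # correlation of each candidate with INTC
best = corr.idxmax()                        # most highly correlated candidate
print(best, round(corr[best], 3))
```

With real data, the return series would come from a run_sql call on closeadj, which is why the two tools chain naturally.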

How the Pieces Fit Together

%%{init: {'theme': 'default', 'themeVariables': {'fontSize': '24px'}}}%%
flowchart TB
    SP["System Prompt\n(strategy & expertise)"] --> Agent
    TD["Tool Definitions\n(JSON Schema)"] --> Agent
    Agent -->|"tool_use"| Router["Tool Router\n(your code)"]
    Router -->|"tool_result"| Agent
    Agent -->|"stop_reason ≠ tool_use"| Response["Final Response\nto User"]

You provide three things: a system prompt (domain expertise), tool definitions (capabilities), and tool implementations (what actually runs). The agent loop connects them.

The Claude Agent SDK

From Hand-Rolled to SDK

We built the agent loop manually: parse tool_use blocks, route to handlers, feed results back. The Claude Agent SDK (claude-agent-sdk) handles all of this for you.

pip install claude-agent-sdk

Raw Messages API (what we built)

  • You write the while True loop
  • You parse tool calls and route them
  • You construct tool_result messages
  • You manage the message list

Agent SDK

  • SDK runs the loop via async for
  • Tools defined with @tool decorator
  • SDK executes tools automatically
  • SDK manages sessions and context

Defining Tools with the SDK

The @tool decorator replaces raw JSON Schema:

from claude_agent_sdk import tool
from typing import Any

@tool(
    name="get_holdings",
    description="Retrieve all portfolio holdings with tax lot detail.",
    input_schema={
        "type": "object",
        "properties": {
            "account_id": {"type": "string",
                           "description": "Client account identifier"}
        },
        "required": ["account_id"],
    }
)
async def get_holdings(args: dict[str, Any]) -> dict[str, Any]:
    account_id = args["account_id"]  # single mock account, so unused here
    return {"content": [{"type": "text", "text": json.dumps(HOLDINGS)}]}

@tool(
    name="run_sql",
    description="Execute a SQL query against the MotherDuck database.",
    input_schema={
        "type": "object",
        "properties": {"sql": {"type": "string"}},
        "required": ["sql"],
    }
)
async def run_sql(args: dict[str, Any]) -> dict[str, Any]:
    result = run_sql_query(args["sql"])
    return {"content": [{"type": "text", "text": result}]}

Same tools, same logic — but the decorator registers the tool with the SDK. No manual routing needed.

Running the Agent

from claude_agent_sdk import (
    query, ClaudeAgentOptions, create_sdk_mcp_server,
    AssistantMessage, ResultMessage
)

# Package tools into an MCP server
server = create_sdk_mcp_server(
    name="portfolio", version="1.0.0",
    tools=[get_holdings, get_target_allocation,
           get_analyst_recommendations, run_sql, run_python]
)

options = ClaudeAgentOptions(
    mcp_servers={"portfolio": server},
    system_prompt=SYSTEM_PROMPT,
    model="claude-sonnet-4-6",
    permission_mode="bypassPermissions",
    max_turns=20,
)

create_sdk_mcp_server wraps your @tool functions into an in-process MCP server. The SDK uses the same Model Context Protocol that Claude Desktop and Claude Code use.

The Loop Disappears

import asyncio

async def main():
    async for message in query(
        prompt="Review the portfolio and find tax-loss candidates",
        options=options,
    ):
        if isinstance(message, AssistantMessage):
            for block in message.content:
                if hasattr(block, "name"):
                    print(f"  [calling {block.name}...]")

        elif isinstance(message, ResultMessage):
            print(message.result)
            print(f"  Cost: ${message.total_cost_usd:.4f}")

asyncio.run(main())

No while True. No tool_use_id. No message list management. The SDK handles the entire agentic loop — you just iterate over the messages it yields.

Hand-Rolled vs. SDK

|                  | Hand-Rolled                      | Agent SDK                            |
|------------------|----------------------------------|--------------------------------------|
| Agent loop       | ~25 lines of while True          | async for message in query(...)      |
| Tool routing     | if/elif dispatch function        | Automatic via @tool decorator        |
| Tool definitions | Raw JSON Schema dicts            | @tool decorator                      |
| Error handling   | Manual                           | Built-in retries, context compaction |
| Multi-turn       | Manually append to messages list | SDK manages session state            |
| Learning value   | See every moving part            | Production-ready abstraction         |

Start with the hand-rolled version to understand the mechanics. Use the SDK when you want to ship something.

Deploying the Agent

What Is Docker?

A Docker container packages your application, its dependencies, and its configuration into a single, portable unit.

Without Docker

  • “It works on my machine”
  • Install Python, pip, duckdb, anthropic…
  • Manage conflicting library versions
  • Secrets scattered across config files

With Docker

  • Identical environment everywhere
  • One command to build, one to run
  • Dependencies frozen in the image
  • Secrets injected at runtime

A container is like a lightweight virtual machine. It runs the same way on a developer’s laptop, a test server, or a cloud cluster.

Containerizing the Portfolio Agent

FROM python:3.12-slim

WORKDIR /app
COPY portfolio_agent.py .

RUN pip install anthropic duckdb python-dotenv pandas numpy

# Secrets are passed as environment variables at runtime,
# never baked into the image
# ANTHROPIC_API_KEY and MOTHERDUCK_TOKEN set at deploy time

CMD ["python", "portfolio_agent.py"]

The Dockerfile is a recipe. docker build creates the image; docker run starts the container. In production, companies deploy containers via orchestrators like Kubernetes or AWS ECS, which handle scaling, restarts, and secret management. The container also provides an isolation boundary for tools like run_python — code executes inside the container, not on the host.

Railway: Docker Without the Hassle

Railway handles the Dockerization for you. Push your code and Railway builds the container, deploys it, and manages secrets — no Dockerfile required.

What You Do

  • Connect your GitHub repo
  • Set environment variables (API keys)
  • Push code

What Railway Does

  • Detects Python, builds the container
  • Injects secrets at runtime
  • Deploys with HTTPS endpoint
  • Auto-redeploys on each push

Railway is one of several platforms (Render, Fly.io, Koyeb) that abstract away Docker and Kubernetes. You get the benefits of containerization without writing Dockerfiles or managing infrastructure.

Key Takeaways

  1. Start with the demo — see what an agent can do before building one
  2. Identify the tools — what capabilities does the agent need?
  3. Three tool types — data lookup, database query, computation
  4. Descriptions matter — the agent decides which tools to use based on descriptions
  5. The agentic loop — keep feeding tool results until the agent is done
  6. Deploy with Docker — containerize for production; Railway makes it easy

The hand-rolled version is ~200 lines of logic. The SDK version is shorter, but the concepts are identical: tools, system prompt, and an agentic loop.

Example: Presentation Examiner

What We Want to Build

An AI-powered presentation exam: students upload slides, present them by voice, answer revised follow-up questions, and receive automated grades.

Student Experience

  • Upload a PDF slide deck
  • Present slides to an AI examiner (voice)
  • Answer 3 revised exam questions
  • Receive scores and written feedback

Instructor Experience

  • Create assignments with prompts and rubrics
  • Upload reference materials and solution keys
  • Enroll students via single-use magic links (no passwords)
  • Review grades and export results

The AI replaces the live examiner. It listens to the presentation, revises its questions based on what was said, and grades three dimensions: slides, presentation, and Q&A. It scales to hundreds of students with immediate, consistent feedback.

How the Exam Flows

A FastAPI web application orchestrates a five-phase pipeline:

| Phase           | What Happens                                         | API Used      |
|-----------------|------------------------------------------------------|---------------|
| 1. Analysis     | Extract slide text, generate 3 questions             | Claude Sonnet |
| 2. Presentation | Student presents; AI listens silently                | ElevenLabs    |
| 3. Revision     | Revise questions based on what student said          | Claude Sonnet |
| 4. Q&A          | AI asks revised questions; student answers           | ElevenLabs    |
| 5. Grading      | Score slides + presentation + Q&A, generate feedback | GPT-4.1       |

The voice phases (2 and 4) connect back to the app via webhooks — when ElevenLabs signals a session ended, the app fetches the transcript, advances the phase, and kicks off the next step as a background task. This is not an agentic loop — it’s an orchestrated pipeline where each phase always triggers the next. Match the architecture to the problem: use an agent loop when the path depends on the question; use a pipeline when the steps are known in advance.
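The contrast with the agent loop is worth seeing in code: a pipeline is just a fixed list of phase functions applied in order, with no branching. A sketch with stub phases (the phase names come from the table above; the logic is made up):

```python
# Each phase takes the exam state and returns the updated state
def analysis(state):      return {**state, "questions": ["Q1", "Q2", "Q3"]}
def presentation(state):  return {**state, "presentation_transcript": "..."}
def revision(state):      return {**state, "questions": [q + " (revised)"
                                                         for q in state["questions"]]}
def qa(state):            return {**state, "qa_transcript": "..."}
def grading(state):       return {**state, "grade": 3.5}

PIPELINE = [analysis, presentation, revision, qa, grading]

state = {"slides": "deck.pdf"}
for phase in PIPELINE:
    state = phase(state)  # each phase always triggers the next -- no tool choice

print(state["grade"], state["questions"][0])
```

In the real app the phase boundaries are webhook callbacks and background tasks rather than a for loop, but the control flow is the same: known steps, fixed order.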

Refining with Automated Testing

How do you know the grading is fair and consistent? Build a synthetic agent that generates slide decks, presentations, and Q&A transcripts at controlled quality levels, runs them through the full pipeline, and checks that grades track intended quality.

Illustrative results (10 scenarios × 4 quality levels):

Average scores by intended quality level:
  excellent : 3.75/4.0  (n=10)
  good      : 3.20/4.0  (n=10)
  mediocre  : 2.45/4.0  (n=10)
  poor      : 1.80/4.0  (n=10)

What It Generates

  • 10 finance scenarios (DCF, LBO, M&A, etc.)
  • Slide decks at 4 quality levels
  • Presentation transcripts (confident → confused)
  • Q&A answers (insightful → evasive)

What It Validates

  • Do excellent submissions score higher than poor ones?
  • Are grades consistent across similar quality levels?
  • Does the full pipeline complete without errors?
  • Does the feedback match the grade?

Change a prompt or grading rubric, re-run the synthetic agent, and see whether results improved or regressed. This is test-driven refinement — the same idea as automated regression testing, applied to AI behavior.
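The core validation reduces to an ordering check over the averaged scores. A sketch using the illustrative numbers above:

```python
# Average scores by intended quality level (illustrative numbers from above)
avg_scores = {"excellent": 3.75, "good": 3.20, "mediocre": 2.45, "poor": 1.80}

ordered = ["excellent", "good", "mediocre", "poor"]
monotonic = all(avg_scores[a] > avg_scores[b]
                for a, b in zip(ordered, ordered[1:]))

print("grades track intended quality:", monotonic)
```

If a prompt change breaks this monotonicity (say, mediocre submissions start outscoring good ones), the regression shows up immediately in the next synthetic run.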

Presentation Examiner

presentation.rice-business.org