An API lets your code communicate with an LLM service over the internet.
What You Need
An API key (authentication)
The anthropic Python package
A model name (e.g., claude-sonnet-4-20250514)
Getting Your Key
Go to console.anthropic.com
Create an API key under Settings
Add credit ($5 is plenty for the course)
Your Claude Pro subscription covers Claude.ai and Claude Code. The API is a separate product with pay-per-use pricing.
The Model Landscape: OpenRouter
OpenRouter is a marketplace that routes API calls to hundreds of models from dozens of providers — a single place to see what exists and compare pricing.
One API, many models — same code, swap the model name to use GPT-4o, Gemini, Llama, Mistral, Grok, and more
Pricing transparency — cost per million input/output tokens shown for every model
Good for exploration — browse capabilities and benchmarks before committing to a provider
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    system="You are a finance tutor...",
    messages=[
        {"role": "user", "content": "What is a P/E ratio?"},
        {"role": "assistant", "content": "A P/E ratio is..."},
        {"role": "user", "content": "How do I interpret it?"},
    ],
)
Each API call is independent — the LLM has no memory between calls
You must send the entire conversation history each time
The system prompt defines the agent’s behavior and available tools
The agent is part LLM intelligence, part traditional programming.
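Because the API is stateless, client code accumulates the history itself and resends it on every turn. A minimal sketch of that pattern, with a stand-in call_llm function in place of the real API:

```python
# Stand-in for the real API call; it only sees what you send it.
def call_llm(messages):
    return {"role": "assistant", "content": f"(reply to {len(messages)} messages)"}

messages = []
for user_text in ["What is a P/E ratio?", "How do I interpret it?"]:
    messages.append({"role": "user", "content": user_text})
    reply = call_llm(messages)  # the full history travels with every call
    messages.append(reply)      # remember the assistant's turn too

print(len(messages))  # 4: two turns, each a user message plus a reply
```

Drop a message from the list and the model simply never saw it — that is the whole "memory" mechanism.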
Claude Code as an Agent Harness
What a Harness Provides
The pseudocode shows the logic. A harness is the infrastructure that actually runs that loop — and controls what the agent is allowed to do.
Runtime Infrastructure
Maintains conversation state across turns
Executes tool calls and captures results
Handles errors so the agent can recover
Enforces iteration and cost limits
Control and Safety
Permission system: which tools are allowed
Human-in-the-loop approval gates
Hooks: custom checks before/after actions
Audit log of every action taken
Claude Code is a complete harness — you supply the task; it handles the loop, tools, and guardrails.
The Real Agent Loop
A Production-Grade Loop
iteration = 0
while iteration < max_turns:
    response = call_llm(messages, system_prompt, tools)
    messages.append(response)  # Maintain full history

    if response.stop_reason == "end_turn":
        return response.text  # Task complete

    for tool_call in response.tool_uses:
        if needs_approval(tool_call):  # Human-in-the-loop
            if not ask_user(tool_call):
                break  # User blocked the action
        try:
            result = execute(tool_call)
        except Exception as e:
            result = f"Error: {e}"  # Agent sees error and recovers
        messages.append(tool_result(result))

    iteration += 1

raise MaxIterationsError("Did not complete")
Claude Code’s Four Harness Layers
CLAUDE.md
Injected as the system prompt. Defines the agent’s persona, project context, and behavioral rules before any user message is sent.
Permissions
Controls which tools and file paths the agent can access. Separate allow/deny lists for bash commands, file writes, and network calls.
Hooks
Shell commands that fire automatically at agent lifecycle events: before a tool runs, after it completes, or when the agent stops.
Iteration Limits
--max-turns caps how many steps the agent takes. Prevents runaway loops and controls API spend.
Guardrails and Constraints
Why Guardrails?
Without constraints, an autonomous agent can cause real damage — even with good intentions.
What Can Go Wrong
“Clean up the repo” → deletes files you need
A loop bug triggers hundreds of API calls
Write access to the database → unintended edits
A misread instruction → wrong files overwritten
The Principle
Grant the agent the minimum access it needs to complete the task — and no more.
Read-only unless writes are required
Specific directories, not the whole filesystem
No network unless the task requires it
Permission Modes in Claude Code
Claude Code pauses and asks before taking consequential actions — unless you explicitly allow them.
Default: Human in the Loop
Agent stops before running bash commands, writing files, or calling the network
You approve or deny each action
Safe for unfamiliar tasks or new codebases
Allow Lists
Approve a specific command once — or always
Approve an entire tool (e.g., all file reads)
Approve a path prefix (e.g., ./reports/)
Stored in settings.json
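An allow/deny configuration in settings.json might look roughly like this. The Tool(pattern) entries follow Claude Code's permission-rule style, but treat the exact patterns as illustrative and check the current documentation for your version:

```json
{
  "permissions": {
    "allow": [
      "Read(./reports/**)",
      "Bash(git status)"
    ],
    "deny": [
      "Bash(rm -rf *)"
    ]
  }
}
```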
Human approval is the default. You choose how much autonomy to grant, task by task.
CLAUDE.md as a Behavioral Guardrail
CLAUDE.md is not just documentation — it is the agent’s system prompt. Use it to enforce rules.
Examples of Constraint Instructions
- Never modify files outside the src/ directory
- Always ask before running git push or git commit
- Use read-only database connections unless told otherwise
- If a task would delete more than 3 files, pause and confirm
The system prompt is the first line of defense. Clear written rules reduce the chance of unintended actions.
Hooks: Programmatic Checks
Hooks run shell commands automatically at agent lifecycle events.
Event Types
PreToolUse — runs before a tool fires; can block the action
PostToolUse — runs after a tool completes; can log or validate
Stop — runs when the agent finishes
Example Uses
Block any bash command containing rm -rf
Log every file write to an audit trail
Send an alert when the agent finishes
Run tests automatically after every code edit
Hooks enforce rules programmatically — the check runs whether or not the agent remembers the instruction.
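A hook is just a script. A minimal Python sketch of the rm -rf blocker: the should_block helper and the "command" field name are illustrative assumptions, and in a real PreToolUse hook you would read the tool-call JSON from stdin and exit nonzero to block the action (the hooks docs specify exit code 2 for blocking):

```python
import re

# Pattern of destructive bash commands this hook refuses to allow.
BLOCKED = re.compile(r"rm\s+-rf")

def should_block(tool_input: dict) -> bool:
    """Return True if the bash command matches a destructive pattern."""
    return bool(BLOCKED.search(tool_input.get("command", "")))

# In a real hook script, roughly:
#   data = json.load(sys.stdin)
#   if should_block(data.get("tool_input", {})):
#       sys.exit(2)  # nonzero exit blocks the tool call
print(should_block({"command": "rm -rf build/"}))  # True
```

The check fires on every tool call, whether or not the agent remembers the rule in CLAUDE.md.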
Layered Guardrails
No single guardrail is sufficient. Defense in depth combines all layers.
CLAUDE.md — behavioral rules in the system prompt
Permissions — tool and path access control
Hooks — programmatic checks at runtime
Sandboxing — isolated execution environment
Human approval — pause before high-stakes actions
Iteration limits — prevent runaway loops and cost overruns
Each layer catches what the previous one misses. Together they make autonomous agents safe to deploy.
The Dashboard Trap
Dashboards Answer Yesterday’s Questions
Organizations spend millions building dashboards. When a new question arises, the cycle restarts: requirements → design → build → deploy.
The Typical Dashboard Lifecycle
Business user requests a report
Team builds queries, charts, and deploys
User asks a follow-up — back to step 1
The Cost
Time: Weeks to months per dashboard
Money: BI licenses, engineering hours, maintenance
Rigidity: Fixed views of fixed data
Gartner: only 20% of analytic insights deliver business outcomes.
The Fundamental Problem
Dashboards answer pre-defined questions. But the most valuable analysis comes from ad-hoc questions that arise in the moment.
“What happened to margins in the Southeast last quarter?”
“Show me our top 10 customers by growth rate, excluding one-time orders”
“Compare Q3 headcount vs. budget by department, and flag anyone over 110%”
These are simple questions. Getting answers shouldn’t require a development cycle.
Natural Language as the Query Interface
The Shift: From Dashboards to Conversations
Traditional Dashboard
Click filters and select dates to query
Answers take minutes to weeks
Follow-ups require a new dashboard request
Natural Language AI
Ask in plain English
Answers in seconds
Follow-ups are the next sentence
The dashboard was a workaround for the fact that databases don’t speak English. Now they do.
What This Looks Like in Practice
The Conversation
“Show me monthly revenue by product line for 2025”
AI: writes SQL, produces grouped bar chart
“Break out Enterprise by region”
AI: refines query, updates chart
What the User Needed to Know
What questions to ask
Whether the answers make sense
Nothing else
Behind the Scenes
4 different SQL queries written
3 visualizations produced
Derived metrics calculated
The Database Agent Pattern
The most powerful dashboard replacement: an AI agent connected to your database.
How to Build It
Connect database via MCP or file upload
Give AI the schema: table names, columns, relationships
Describe the business context
Start asking questions
What the Agent Can Do
Write and execute SQL queries
Compute derived metrics (growth rates, ratios)
Generate charts and export to Excel or PowerPoint
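A tiny end-to-end sketch of the "write and execute SQL" step, using an in-memory SQLite table. In practice the agent generates the query from the schema and the user's question; here both the data and the query are hard-coded for illustration:

```python
import sqlite3

# Toy warehouse: one revenue table the agent would see in the schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE revenue (month TEXT, product TEXT, amount REAL)")
conn.executemany("INSERT INTO revenue VALUES (?, ?, ?)", [
    ("2025-01", "Enterprise", 120.0),
    ("2025-01", "SMB", 80.0),
    ("2025-02", "Enterprise", 135.0),
])

# The agent's answer to "show me monthly revenue by product line":
rows = conn.execute(
    "SELECT month, product, SUM(amount) FROM revenue"
    " GROUP BY month, product"
).fetchall()
print(rows)
```

From here the agent feeds rows into a chart or a derived-metric calculation.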
Replacing Dashboards: FP&A and Treasury
FP&A
“Budget vs. actual for Q3, decompose the variance into volume, price, and cost drivers”
“Rolling 12-month revenue forecast with confidence bands”
Treasury & Risk
“Cash position over the next 90 days using AR/AP forecasts”
“VaR by desk for the last 30 days, flag breaches”
Each of these would take a BI team days to build. With a database agent, they take seconds.
Replacing Dashboards: Portfolio and Executive
Portfolio Management
“Sector allocation vs. benchmark with active weights”
“Performance attribution — allocation vs. selection”
Executive Reporting
“One-page executive summary with KPIs and trends”
“Board deck from this quarter’s financials — 5 slides max”
The AI doesn’t replace the analyst’s judgment — it replaces the mechanical work of pulling data and building charts.
The Weekly Ops Review — Before and After
Before: The Dashboard Era
Data team pulls exports and builds slides (6+ hrs)
Manager reviews and revises (3 hrs)
VP asks a question — “We’ll get back to you next week”
Total: 9+ hours per week
After: Natural Language AI
AI agent generates ops review (3 min)
Manager reviews and iterates in chat
VP asks a question — AI answers in 10 seconds
Total: 15 minutes + live Q&A
The Reporting Pipeline
Five Steps from Query to Report
%%{init: {'theme': 'base', 'themeVariables': {'fontSize': '28px'}, 'flowchart': {'nodeSpacing': 100, 'rankSpacing': 140, 'padding': 28, 'useMaxWidth': true}}}%%
flowchart LR
A["<b>Database<br/>Query</b>"] --> B["<b>Transform<br/>& Compute</b>"]
B --> C["<b>Charts &<br/>Tables</b>"]
C --> D["<b>AI<br/>Narrative</b>"]
D --> E["<b>PowerPoint<br/>Output</b>"]
style A fill:#eff6ff,stroke:#3b82f6,stroke-width:2px,color:#0f172a,font-size:28px,padding:24px
style B fill:#eff6ff,stroke:#3b82f6,stroke-width:2px,color:#0f172a,font-size:28px,padding:24px
style C fill:#eff6ff,stroke:#3b82f6,stroke-width:2px,color:#0f172a,font-size:28px,padding:24px
style D fill:#eff6ff,stroke:#3b82f6,stroke-width:2px,color:#0f172a,font-size:28px,padding:24px
style E fill:#eff6ff,stroke:#3b82f6,stroke-width:2px,color:#0f172a,font-size:28px,padding:24px
From raw data to polished deck — no copy-paste, no formatting, no manual writing.
Building It with Your Coding Agent
You do not need to write this by hand. Give your coding agent a single prompt:
“Build an app that connects to a database. When the user selects a report and clicks Generate, run the SQL query, create a chart, write a three-sentence executive summary, and assemble a three-slide PowerPoint (title, chart, findings). Show the chart and summary on screen with a download button for the PowerPoint file.”
Your job is not to write the code. Your job is to test the result, refine the prompt, and iterate until the output meets your standards.
Building the Pipeline with Streamlit
Streamlit turns a Python script into a web app in minutes. Combined with an AI pipeline, it becomes a self-service reporting tool.
User Interface
Dropdown: select time period, department, or metric
Button: “Generate Report”
Download: auto-generated .pptx file
Behind the Scenes
Query database for selected parameters
Compute metrics and build charts
Send chart data to LLM for narrative
Assemble PowerPoint with python-pptx
The user clicks one button and gets a polished deck. No analyst needed.
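The behind-the-scenes steps can be sketched as plain Python with stub functions. Everything here (function names, the sample rows) is illustrative; each stub would be replaced by a real SQL query, a charting call, or an LLM call, and the chart and python-pptx steps are omitted for brevity:

```python
def query_database(period):
    # Stub: run SQL against the warehouse for the selected period
    return [("Jan", 120), ("Feb", 135), ("Mar", 128)]

def compute_metrics(rows):
    total = sum(v for _, v in rows)
    return {"total": total, "average": total / len(rows)}

def build_narrative(metrics):
    # Stub: in production this is an LLM call with the metrics as context
    return (f"Revenue totaled {metrics['total']} "
            f"with a monthly average of {metrics['average']:.0f}.")

def generate_report(period):
    rows = query_database(period)
    metrics = compute_metrics(rows)
    return build_narrative(metrics)

print(generate_report("Q1"))
```

Wrapping this in Streamlit is then a few lines: a dropdown sets `period`, a button calls `generate_report`, and the result renders on screen.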
Example: Monthly Ops Review Pipeline
What the Pipeline Produces
Title slide, KPI summary table, and trend charts
Variance waterfall (budget vs. actual)
AI-generated executive summary
Time Comparison
Manual: 6–8 hours per month (data pull, Excel, copy-paste into PowerPoint, write narrative)
Pipeline: 30 seconds per run
ROI: First month pays for the build time
Automated PowerPoint with python-pptx
python-pptx is a Python library for creating and editing PowerPoint files programmatically.
What It Can Do
Create slides from templates
Insert charts, tables, and images
Apply corporate formatting (fonts, colors, logos)
Populate placeholders with live data
The AI Advantage
AI writes the python-pptx code for you
Tell Claude: “Create a 5-slide deck from this data with a waterfall chart on slide 3”
AI handles layout, formatting, and data binding
You describe the deck; AI builds the automation. The pipeline runs unattended after that.
Sandboxed Execution
Development vs. Production
Development (Your Laptop)
Agent runs in your environment
Full access to your files
Fine for prototyping
Production
Agent runs in a container (Docker)
Isolated, disposable environment
Read-only database access
A container is a disposable, isolated computing environment. A bug in AI-generated code cannot affect your other files, your database, or your network.
The Sandbox Pattern
%%{init: {'theme': 'base', 'themeVariables': {'fontSize': '28px'}, 'flowchart': {'nodeSpacing': 80, 'rankSpacing': 120, 'padding': 24, 'useMaxWidth': true}}}%%
flowchart LR
U["<b>User</b>"] --> A["<b>Agent</b>"]
A --> S["<b>Sandboxed<br>Code</b>"]
S <--> DB["<b>Database<br>(read-only)</b>"]
S --> R["<b>Report</b>"]
R --> U
style U fill:#eff6ff,stroke:#3b82f6,stroke-width:2px,color:#0f172a,font-size:28px,padding:24px
style A fill:#eff6ff,stroke:#3b82f6,stroke-width:2px,color:#0f172a,font-size:28px,padding:24px
style S fill:#fef3c7,stroke:#f59e0b,stroke-width:2px,color:#0f172a,font-size:28px,padding:24px
style DB fill:#eff6ff,stroke:#3b82f6,stroke-width:2px,color:#0f172a,font-size:28px,padding:24px
style R fill:#eff6ff,stroke:#3b82f6,stroke-width:2px,color:#0f172a,font-size:28px,padding:24px
Agent plans and orchestrates; code runs in a sandboxed container
Container queries the database through a read-only connection
Finished report flows back to the user
From Prototype to Production
The reporting app on your laptop is a prototype. Moving it to production adds infrastructure, not intelligence.
Authentication: SSO so only authorized users can generate reports
Logging: every query and LLM call recorded for compliance
Read-only credentials: the agent cannot modify data
The pipeline you prototyped is the same artifact that powers the production tool. IT wraps it in infrastructure; you provide the domain knowledge and the prompt.
Variance Analysis
What is Variance Analysis?
Variance analysis compares budgeted figures to actual results and decomposes the differences into actionable drivers. It is the core analytical task in FP&A.
Scenario: You are an FP&A analyst. Q1 actuals just closed. The CEO wants to know why operating income missed budget by $300K.
Revenue: Budget 100K units at $50 = $5M; Actual 95K units at $51 = $4.845M — ($155K) miss
COGS: Budget $30/unit = $3M; Actual $32/unit = $3.04M — ($40K) miss
SG&A: Budget $1.45M; Actual $1.555M — ($105K) miss
Operating Income: Budget $550K; Actual $250K — ($300K) total miss
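The decomposition the agent performs can be checked by hand. A short calculation using the figures above and the standard volume/price/rate variance formulas (all amounts in dollars, negative = unfavorable):

```python
budget_units, budget_price, budget_cost = 100_000, 50, 30
actual_units, actual_price, actual_cost = 95_000, 51, 32

# Revenue: volume effect priced at budget, price effect at actual volume
rev_volume = (actual_units - budget_units) * budget_price   # -250,000
rev_price = (actual_price - budget_price) * actual_units    # +95,000

# COGS: selling fewer units saves cost (favorable), higher unit cost hurts
cogs_volume = -(actual_units - budget_units) * budget_cost  # +150,000
cogs_rate = -(actual_cost - budget_cost) * actual_units     # -190,000

sga_variance = 1_450_000 - 1_555_000                        # -105,000

total = rev_volume + rev_price + cogs_volume + cogs_rate + sga_variance
print(total)  # -300000: matches the operating income miss
```

The five signed terms are exactly the bars of the waterfall chart the agent produces.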
Variance Analysis with an Agent
With a database agent connected to your financials, one prompt replaces 2–4 hours of FP&A work:
Prompt
“Here is our Q1 budget vs. actuals spreadsheet. Decompose the $300K operating income miss into volume, price, cost, and discretionary spending drivers. Produce a waterfall chart and a summary memo for the CFO.”
What the Agent Does
Plans: identifies the variance decomposition formulas needed
Decomposes revenue into volume + price; COGS into volume + rate
Generates waterfall chart and writes CFO-ready memo in Word format
Finance Application: M&A Due Diligence
M&A Due Diligence with an Agent
Give an agent a goal: “Evaluate this acquisition target.” The agent:
Ingests data from multiple formats (Excel, PDF, Word, CSV)
Applies evaluation criteria and computes risk metrics
Produces a summary memo with flagged risks
This is a single prompt to Claude Code. The agent plans and executes all three steps autonomously, reading multiple file formats and combining the results.
Orchestration
The Orchestration Layer
The agent’s control logic can route different tasks to different models and prompts.
Different Prompts
SQL generation → database schema prompt
Python analysis → data science prompt
Each task gets specialized instructions
Different Models
Simple classification → fast, cheap model
Complex reasoning → powerful model
Cost and speed optimization
Sub-agents: dispatch specialized workers for parallel tasks. This is what Cowork does with its parallel VMs.
Exercises
Exercise 1: Multi-Step Agent Workflow
Give Claude Code a single compound instruction that requires at least three steps.
Example: “Fetch Apple’s quarterly revenue from the Rice Data Portal. Create a bar chart of the last 8 quarters. Write a one-paragraph executive summary. Save the chart and summary to a reports folder.”
Observe the agent loop — how many distinct tool calls does the agent make?
Does it check its own work?
If the output is not right, refine your prompt and try again
Exercise 2: Workflow Decomposition
Consider the task: “Prepare the quarterly business review for the CEO.”
List the 5–7 steps a human analyst would take to complete this task
For each step, identify what tool the agent would use and what data it would need
Mark which steps need human approval before the agent continues
Write a single prompt that describes the full workflow for an agent
Exercise 3: M&A Due Diligence
Download the due diligence data pack (Excel + PDF + Word + CSV)
Ask Claude Code to evaluate the acquisition target end-to-end
Submit: the summary report + screenshots of intermediate steps
Watch how Claude plans its approach, reads each file, and combines the results into a coherent analysis.
Exercise 4: Streamlit App with the Anthropic API
Get an API key from console.anthropic.com
Ask Claude Code to build a Streamlit app that:
Takes a company ticker from the user
Sends a prompt to the Anthropic API asking for an investment summary
Displays the AI-generated summary on screen
Extend: add a system prompt with specific analysis criteria (valuation, growth, risks)