LAB MANUAL · CCA-F EXAM PREP · v2026.04

Pass Anthropic's Claude Architect exam by building every concept yourself.

Every CCA-F domain as runnable code — with a deliberate failure to trigger and a checklist you can only tick after you've seen it on your own terminal.

Free & open source · ~6h 30m hands-on · One API key · Under $4 in Sonnet 4.6 spend · Python 3.11 + Node 20
Finish 8/8 and you've seen every exam behaviour run, fail, and recover on your own machine — not just read about it.
08 tutorials · 6h 30m hands-on time · 05 exam domains · $2–4 API spend · Sonnet 4.6

A hands-on CCA-F prep workbook. Independent, not affiliated with or endorsed by Anthropic. All code runs on a clean laptop with an ANTHROPIC_API_KEY exported.

Last verified against anthropic >= 0.40 and Claude Sonnet 4.6 · 2026-04-14 · report an issue
00 / Before you start

Two minutes of setup, then you're doing exam-relevant work.

Qualify the fit, install the toolchain, then dive in. Everything on one laptop — no cluster, no infra account, no SaaS signup beyond Anthropic.

For you if…

  • You're preparing for Anthropic's CCA-F certification and want muscle memory, not just notes.
  • You're comfortable reading Python and using a terminal.
  • You learn better by breaking code and watching the failure than by reading specs.

Skip this if…

  • You don't have (or can't get) an Anthropic API key.
  • You want passive reading without running code — the tutorials are verification-gated on observed behaviour.

One-time install (~2 minutes) — each tutorial creates its own .venv, so Python packages get installed per tutorial. Only these are truly global:

# Ubuntu / Debian (skip on macOS):
sudo apt update && sudo apt install -y python3 python3-venv python3-pip
# All platforms:
sudo npm i -g @anthropic-ai/claude-code
export ANTHROPIC_API_KEY="sk-ant-…"

Never set up a Python virtualenv before? Walk through the venv setup section below — it covers the activate / install / deactivate cycle every tutorial assumes.

00 / Virtual environments

Setting up the virtualenv workflow every tutorial assumes.

Every tutorial installs its Python packages into a throwaway virtualenv under .venv. That keeps each tutorial's SDK version isolated, avoids polluting your system Python, and sidesteps PEP 668 errors on Ubuntu 23.04+ and current Homebrew. If you've never set one up before, this is the whole dance — do it once on Tutorial 01, then repeat the short version for every later tutorial.

1. Install the venv package (Ubuntu / Debian only)

macOS and most other Linux distros ship the venv module with Python. Ubuntu and Debian split it into a separate apt package that has to be installed once:

bash · one-time
sudo apt update && sudo apt install -y python3 python3-venv python3-pip

2. Create and activate a venv inside the tutorial directory

From the project directory (e.g. agentic-loop for Tutorial 01), create the .venv folder and activate it. After activation, your shell prompt gains a (.venv) prefix — that's the signal the venv is live:

bash · macOS / Linux
python3 -m venv .venv
source .venv/bin/activate

Windows PowerShell: activate with .\.venv\Scripts\Activate.ps1 instead. If PowerShell blocks it on "execution policy", run Set-ExecutionPolicy -Scope Process -ExecutionPolicy Bypass in that shell first.

3. Install the tutorial's Python packages

With the venv active, pip install writes into .venv/ — not your system Python, no sudo needed. Each tutorial's setup block lists exactly what it needs:

bash
pip install "anthropic>=0.40"

4. Verify you're actually inside the venv

Both python3 and pip should resolve to paths inside .venv. If which points to /usr/bin/, activation silently didn't run — re-source the activate script and re-install:

bash
which python3 pip
# → /path/to/tutorial/.venv/bin/python3
# → /path/to/tutorial/.venv/bin/pip
python3 -c "import anthropic; print(anthropic.__version__)"

5. Deactivate and re-enter in a fresh shell

Activation lives only in the current shell. Close the terminal and it's gone — but the .venv folder persists on disk. The next time you come back to a tutorial, cd into it and re-run the activate line; you don't need to re-create the venv or re-install packages. Explicit deactivate drops an active shell back to system Python without closing it.

bash
deactivate                    # leaves the venv; .venv/ stays on disk
# — later, in a new shell —
cd ~/agentic-loop
source .venv/bin/activate     # re-enter, no re-install needed

Hit an error? ensurepip is not available, externally-managed-environment, and ModuleNotFoundError on an already-installed package all have dedicated entries in troubleshooting — every symptom has a fix and a short reason.

00 / Frequently asked

Before you commit six hours.

The questions people ask about this workbook — answered up front, so you don't get three tutorials in before wondering.

Is this the official Anthropic certification prep?

No. This is an independent, community-made study resource. It is not affiliated with or endorsed by Anthropic. The CCA-F exam, domain definitions, and certification itself are Anthropic's.

How long does the full workbook take?

About 6 hours 30 minutes of focused time across all eight tutorials — most people spread that across one or two weeks. Each tutorial posts its own time budget in the chip row at the top, and progress saves automatically to localStorage.

How much will the API calls cost?

Approximately $2–4 on Claude Sonnet 4.6 with the iteration caps shown. Every loop has a termination limit to prevent runaway spend — the point of Tutorial 01 is precisely to watch what happens when you remove it. Set a billing alert before you start.

What do I need before I start?

Python 3.11+, Node 20 LTS (for Claude Code), an exported ANTHROPIC_API_KEY, and comfort with a terminal. Everything you need is in the pre-start install block above — no Docker, no cloud accounts, no cluster, everything runs locally.

Will this workbook alone make me pass the exam?

Honestly? Probably not on its own. Exam prep benefits from broader theoretical context alongside hands-on work. This workbook's job is the muscle memory half — every CCA-F domain translated into runnable code with a deliberate failure to trigger, so when you see a scenario on the exam you've already watched the behaviour on your own terminal. Pair it with your preferred theory resource and practice questions, and reserve the final week for revision rather than more tutorials.

What if the code breaks or a new SDK version ships?

Open an issue on the GitHub repo. The freshness stamp at the top of the page shows when the code was last verified against a live SDK and model version.

01/08
D1 · Agentic · 45 min · Python · Anthropic SDK

The Agentic Loop — think / act / observe with the Anthropic SDK

Concept covered. An agent is a Claude instance wrapped in a loop that calls tools, observes results, and decides the next action until a stop condition fires. Every loop needs a termination condition.

Source: Domain 1 · The Agentic Loop mental model

Concept diagram · think / act / observe cycle
Agentic loop: a user prompt feeds the THINK node (one messages.create() call, which returns a stop_reason). When stop_reason is tool_use, ACT runs the tool (run_tool(name, args), on your side) and OBSERVE appends the tool_result block to messages, returning control to THINK. When stop_reason is end_turn, THINK exits to the final answer. MAX_ITERATIONS is the runaway guard around the whole cycle.

Each turn is one client.messages.create() call. The stop_reason is the gate — tool_use sends you back around the loop, end_turn exits. The outer MAX_ITERATIONS bound is your last-resort seatbelt against runaway spend.

Prerequisites
  • Python 3.11+
  • anthropic Python SDK ≥ 0.40
  • An Anthropic API key exported as ANTHROPIC_API_KEY
  • Nothing else — runs locally, no cluster needed
Learning objectives
  • Implement a think → act → observe → think loop from scratch
  • Observe stop_reason flip from tool_use to end_turn
  • Break the loop by removing the termination cap and watch it run away
  • Verify each turn's tool_use block and tool_result round-trip
Before you start · API usage

Record your spend baseline

Open console.anthropic.com/usage, filter to today, and note the current dollar amount (or screenshot the chart). You'll check the delta at the end of this tutorial so you know exactly what this one exercise cost.

Setup

≈ 2 min

Ubuntu / Debian users: the stdlib venv module needs a one-time package install — run this once, then skip it for every later tutorial:

bash · Ubuntu one-time
sudo apt update && sudo apt install -y python3 python3-venv python3-pip

Now create the project directory and a fresh virtualenv. On macOS and most Linux distros, python3 is the canonical binary — python on its own is not reliable across systems.

bash
mkdir agentic-loop && cd agentic-loop
python3 -m venv .venv && source .venv/bin/activate
pip install "anthropic>=0.40"

Create loop.py:

The Claude API / agentic-loop lines are the CCA-F exam content; the rest of the listing is application plumbing you'd swap out for your own logic.

python · loop.py
import anthropic, json, os

client = anthropic.Anthropic()
MODEL = "claude-sonnet-4-6"
MAX_ITERATIONS = 5

tools = [{
    "name": "get_weather",
    "description": "Return the current temperature in Celsius for a given city.",
    "input_schema": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}]

def run_tool(name, args):
    if name == "get_weather":
        fake = {"Paris": 14, "Tokyo": 22, "Cape Town": 27}
        return {"temp_c": fake.get(args["city"], 20)}
    raise ValueError(f"unknown tool {name}")

messages = [{"role": "user",
             "content": "Compare the weather in Paris and Tokyo right now."}]

for turn in range(MAX_ITERATIONS):
    resp = client.messages.create(
        model=MODEL, max_tokens=1024, tools=tools, messages=messages
    )
    print(f"--- turn {turn} · stop_reason={resp.stop_reason} ---")
    messages.append({"role": "assistant", "content": resp.content})

    if resp.stop_reason == "end_turn":
        print("FINAL:", resp.content[-1].text)
        break

    tool_results = []
    for block in resp.content:
        if block.type == "tool_use":
            result = run_tool(block.name, block.input)
            print(f"  → {block.name}({block.input}) = {result}")
            tool_results.append({
                "type": "tool_result",
                "tool_use_id": block.id,
                "content": json.dumps(result),
            })
    messages.append({"role": "user", "content": tool_results})
else:
    print("!! loop exhausted without end_turn — safety cap hit")

Code trace — what happens when you run it

≈ 5 min read

Before you run the script, walk through it in your head. The user asks "Compare the weather in Paris and Tokyo right now." A single get_weather tool is available. The model cannot answer without calling it — twice, once per city — and this is how the three-turn conversation unfolds.

Before the loop

  • anthropic.Anthropic() reads ANTHROPIC_API_KEY from env and constructs an HTTP client. No network call yet.
  • MAX_ITERATIONS = 5 is the termination cap — the single most important line in the file. Without it, for turn in range(MAX_ITERATIONS) becomes while True: and your bill grows without bound. Tutorial 01's build-and-break exercise deletes exactly this line on purpose.
  • tools = [...] is the catalogue the model sees. Three fields matter: name (identifier the model emits), description (free text it reads to decide when to call), and input_schema (JSON Schema enforced server-side on the argument payload).
  • run_tool(...) is a pure-Python dispatcher. The model never executes this — you do. The model only asks you to.
  • messages starts with a single user turn. Everything below gets appended to it; state lives in your Python process, not on Anthropic's side.
  1. Turn 0 — "Think". Model requests the Paris tool call.

    client.messages.create(...) is HTTP call #1. The model reads the user prompt plus the tool catalogue, decides it cannot answer directly, and returns:

    response · CCA-F
    stop_reason = "tool_use"
    content     = [ ToolUseBlock(id='toolu_01A…', name='get_weather',
                                 input={'city': 'Paris'}) ]

    The loop appends the assistant turn, then — because stop_reason is not end_turn — falls through into the tool-dispatch block. "Act" and "observe" happen in the same Python iteration: run_tool runs, a tool_result block is built, and it's appended as a new user message.

    Two subtle things. tool_use_id must match the id the model gave you — that's the correlation key. And content has to be a string (or a list of content blocks) — passing a raw dict here is the BadRequestError readers hit first. Both shapes are sketched just after this list.
  2. Turn 1 — model now knows Paris, still needs Tokyo.

    Same create() call, but messages now carries the Paris tool_result. The model returns another ToolUseBlock(city='Tokyo'). Loop dispatches it, appends the Tokyo result, moves on.

    Parallel tool use. Sonnet sometimes returns both tool-use blocks in turn 0 as an optimisation. The for block in resp.content loop handles that identically — it just dispatches both before the next create() call, and turn 1 never happens.
  3. Turn 2 — "End". Model has both temperatures, composes the answer.

    HTTP call #3. The conversation history now contains both tool results. No tool is needed. The model returns:

    response · CCA-F
    stop_reason = "end_turn"
    content     = [ TextBlock(text='Paris is currently 14 °C while Tokyo
                                   is 22 °C, so Tokyo is 8 °C warmer…') ]

    Now the if resp.stop_reason == "end_turn": branch fires. resp.content[-1].text grabs the prose answer (the last block, because earlier blocks might be thinking text), prints it, and break exits the loop entirely — meaning the for/else clause is not executed. else on a for only fires when the loop exhausts without break.

    Why it matters: end_turn is the healthy exit. Iteration-cap exit (the else branch) means something went wrong.
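
A quick sketch of the correct and incorrect tool_result shapes from step 1 (the id is illustrative; json is already imported in loop.py):

python · tool_result shape
# Correct: content is a JSON string (a list of content blocks also works)
ok = {
    "type": "tool_result",
    "tool_use_id": "toolu_01A…",            # must echo the model's ToolUseBlock.id
    "content": json.dumps({"temp_c": 14}),
}

# Wrong: a raw dict is rejected with BadRequestError
bad = {
    "type": "tool_result",
    "tool_use_id": "toolu_01A…",
    "content": {"temp_c": 14},
}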

The final messages ledger

When the loop breaks, messages is a 6-element record of the whole conversation:

messages state · CCA-F
[0] user       "Compare the weather in Paris and Tokyo right now."
[1] assistant  [ToolUseBlock(name='get_weather', input={'city': 'Paris'})]
[2] user       [tool_result(Paris → {'temp_c': 14})]
[3] assistant  [ToolUseBlock(name='get_weather', input={'city': 'Tokyo'})]
[4] user       [tool_result(Tokyo → {'temp_c': 22})]
[5] assistant  [TextBlock("Paris is 14 °C while Tokyo is 22 °C…")]

Two things worth internalising from this shape:

  • The model has no memory between calls. Every create() re-sends the full messages list. That's why the loop's job is really just to accrete history.
  • Tool results ride inside user messages, not a third role. From the model's perspective, "you told me something" and "the tool told me something" are both inputs from the user role.
One-line mental model

The agentic loop is a three-state machine — think (the create() call), act (your run_tool), observe (appending a tool_result back into messages) — that terminates on stop_reason == "end_turn", an iteration cap, or an error. Every line in loop.py exists to implement one of those three states.

Walkthrough

≈ 15 min
  1. Run the loop once.
    bash
    python loop.py

    Expected: two turns with stop_reason=tool_use (one per city), then a turn with stop_reason=end_turn and a prose answer.

    Why it matters: You can see the Think → Act → Observe cycle explicitly — the API never hides it.
  2. Inspect the first content block each turn.

    Inside the for turn loop, right after the turn-header print, add one line. The full file now looks like this (the new line is highlighted):

    python · loop.py
    import anthropic, json, os
    
    client = anthropic.Anthropic()
    MODEL = "claude-sonnet-4-6"
    MAX_ITERATIONS = 5
    
    tools = [{
        "name": "get_weather",
        "description": "Return the current temperature in Celsius for a given city.",
        "input_schema": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    }]
    
    def run_tool(name, args):
        if name == "get_weather":
            fake = {"Paris": 14, "Tokyo": 22, "Cape Town": 27}
            return {"temp_c": fake.get(args["city"], 20)}
        raise ValueError(f"unknown tool {name}")
    
    messages = [{"role": "user",
                 "content": "Compare the weather in Paris and Tokyo right now."}]
    
    for turn in range(MAX_ITERATIONS):
        resp = client.messages.create(
            model=MODEL, max_tokens=1024, tools=tools, messages=messages
        )
        print(f"--- turn {turn} · stop_reason={resp.stop_reason} ---")
        print(resp.content[0])   # ← added: inspect first content block
        messages.append({"role": "assistant", "content": resp.content})
    
        if resp.stop_reason == "end_turn":
            print("FINAL:", resp.content[-1].text)
            break
    
        tool_results = []
        for block in resp.content:
            if block.type == "tool_use":
                result = run_tool(block.name, block.input)
                print(f"  → {block.name}({block.input}) = {result}")
                tool_results.append({
                    "type": "tool_result",
                    "tool_use_id": block.id,
                    "content": json.dumps(result),
                })
        messages.append({"role": "user", "content": tool_results})
    else:
        print("!! loop exhausted without end_turn — safety cap hit")

    Re-run the script. On turns where stop_reason=tool_use you'll see a structured block — ToolUseBlock(id='toolu_…', input={'city': 'Paris'}, name='get_weather', type='tool_use') — and on the final end_turn turn you'll see a TextBlock(text='…', type='text') instead.

    Why it matters: Tool use is a structured response block, not parsed text. The block type differs by turn, and your loop depends on that difference — this is the contract you're coding against.
  3. Trace the message list after run completes.

    Append one line to the bottom of loop.py, after the for/else block. Full file (both instrumentation lines from steps 2 and 3 are highlighted):

    python · loop.py
    import anthropic, json, os
    
    client = anthropic.Anthropic()
    MODEL = "claude-sonnet-4-6"
    MAX_ITERATIONS = 5
    
    tools = [{
        "name": "get_weather",
        "description": "Return the current temperature in Celsius for a given city.",
        "input_schema": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    }]
    
    def run_tool(name, args):
        if name == "get_weather":
            fake = {"Paris": 14, "Tokyo": 22, "Cape Town": 27}
            return {"temp_c": fake.get(args["city"], 20)}
        raise ValueError(f"unknown tool {name}")
    
    messages = [{"role": "user",
                 "content": "Compare the weather in Paris and Tokyo right now."}]
    
    for turn in range(MAX_ITERATIONS):
        resp = client.messages.create(
            model=MODEL, max_tokens=1024, tools=tools, messages=messages
        )
        print(f"--- turn {turn} · stop_reason={resp.stop_reason} ---")
        print(resp.content[0])   # ← added in step 2: inspect first content block
        messages.append({"role": "assistant", "content": resp.content})
    
        if resp.stop_reason == "end_turn":
            print("FINAL:", resp.content[-1].text)
            break
    
        tool_results = []
        for block in resp.content:
            if block.type == "tool_use":
                result = run_tool(block.name, block.input)
                print(f"  → {block.name}({block.input}) = {result}")
                tool_results.append({
                    "type": "tool_result",
                    "tool_use_id": block.id,
                    "content": json.dumps(result),
                })
        messages.append({"role": "user", "content": tool_results})
    else:
        print("!! loop exhausted without end_turn — safety cap hit")
    print(json.dumps([{'role': m['role']} for m in messages], indent=2))   # ← added in step 3: trace message roles

    Expected: user → assistant → user → assistant → user → assistant with tool_use / tool_result alternating inside the user messages.

    Why it matters: The loop builds the history — the model has no memory between create() calls except what you pass.

Build-and-break exercise

≈ 10 min
INTENTIONAL FAILURE

Starve the loop of iterations

Two small edits to loop.py: lower the iteration cap near the top, and widen the prompt to three cities so the model needs more than one tool-use turn. Full file with both edits highlighted — save over your existing loop.py to run the exercise:

python · loop.py · build-and-break variant
import anthropic, json, os

client = anthropic.Anthropic()
MODEL = "claude-sonnet-4-6"
MAX_ITERATIONS = 1          # ← was 5 — deliberately too low

tools = [{
    "name": "get_weather",
    "description": "Return the current temperature in Celsius for a given city.",
    "input_schema": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}]

def run_tool(name, args):
    if name == "get_weather":
        fake = {"Paris": 14, "Tokyo": 22, "Cape Town": 27}
        return {"temp_c": fake.get(args["city"], 20)}
    raise ValueError(f"unknown tool {name}")

messages = [{"role": "user",
             "content": "Compare weather in Paris, Tokyo, and Cape Town."}]   # ← was two cities

for turn in range(MAX_ITERATIONS):
    resp = client.messages.create(
        model=MODEL, max_tokens=1024, tools=tools, messages=messages
    )
    print(f"--- turn {turn} · stop_reason={resp.stop_reason} ---")
    messages.append({"role": "assistant", "content": resp.content})

    if resp.stop_reason == "end_turn":
        print("FINAL:", resp.content[-1].text)
        break

    tool_results = []
    for block in resp.content:
        if block.type == "tool_use":
            result = run_tool(block.name, block.input)
            print(f"  → {block.name}({block.input}) = {result}")
            tool_results.append({
                "type": "tool_result",
                "tool_use_id": block.id,
                "content": json.dumps(result),
            })
    messages.append({"role": "user", "content": tool_results})
else:
    print("!! loop exhausted without end_turn — safety cap hit")

Expected failure: the loop exits via the else branch with !! loop exhausted without end_turn. The model wanted another turn; you cut it off.

Confirm: the printed stop_reason on the final turn is tool_use, not end_turn.

Revert: restore MAX_ITERATIONS = 5.

Verification checklist

  • I can name the three states of the agentic loop without looking at the diagram.
  • I can point to the line of code that is the termination condition.
  • I've seen stop_reason flip from tool_use to end_turn.
  • I can explain why the model has no memory between create() calls.
  • I've watched the loop exit because of the cap, not the model.

Cleanup

bash
# Tutorials 02 and 04 reuse this venv; skip the rm until you're done with them
deactivate && cd .. && rm -rf agentic-loop

Further exploration

  • Add a stop_sequences=["TASK_COMPLETE"] parameter and watch how stop_reason changes.
  • Wire in streaming (client.messages.stream(...)) and see tool-use blocks arrive incrementally.
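
Both bullets above, as a minimal sketch reusing loop.py's client, tools, and messages (a starting point, not a drop-in):

python · stop sequences & streaming sketch
# stop_sequences: if the model emits the sequence, stop_reason == "stop_sequence"
resp = client.messages.create(
    model=MODEL, max_tokens=1024, tools=tools, messages=messages,
    stop_sequences=["TASK_COMPLETE"],
)
print(resp.stop_reason)

# streaming: text deltas arrive incrementally; the assembled message
# (including any tool-use blocks) is available once the stream ends
with client.messages.stream(
    model=MODEL, max_tokens=1024, tools=tools, messages=messages
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)
    final = stream.get_final_message()   # inspect final.content for tool_use blocks
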
After you finish · API usage

Check your spend delta

Refresh console.anthropic.com/usage. The difference from your baseline is what this tutorial actually cost on Sonnet 4.6 — add it to a running tally so you can compare against the ~$2–4 total budget for the whole workbook.

Optional in-code tally. Drop this line after each client.messages.create(…) call to see per-request token counts in your terminal:

python
print(f"[usage] in={resp.usage.input_tokens} out={resp.usage.output_tokens}")
02/08
D1 · Agentic · 60 min · Python · Anthropic SDK

Hub-and-Spoke Orchestration with Isolated Sub-agent Context

Concept covered. One orchestrator delegates independent slices of work to sub-agents running in their own context windows. Sub-agents return summaries, not full transcripts. No worker-to-worker edges.

Source: Domain 1 · Hub-and-Spoke model · Scenario 03 (Multi-Agent Research)

Concept diagram · orchestrator + isolated sub-agent contexts
Hub-and-spoke: a central orchestrator (plan, routing, stitching; it holds the full history) dispatches task strings to three sub-agents (news-searcher, pdf-summarizer, table-extractor), each in its own isolated context window, and receives summaries back, not full transcripts. There are no worker-to-worker edges between sub-agents.

The orchestrator sees everything; each sub-agent sees only the slice it was given. Sub-agents return summaries, not transcripts — that's how the hub's context stays bounded. Direct edges between workers break the pattern and reintroduce the context-bloat problem hub-and-spoke was built to solve.

Prerequisites
  • Tutorial 01 complete (you understand the single-agent loop)
  • Same Python environment, anthropic SDK
  • Anthropic API key
Learning objectives
  • Implement spawn_subagent(task_brief) → summary
  • Verify parent context does NOT contain sub-agent transcripts
  • Trigger the leaky-context bug by forwarding a child's full history
  • Measure token counts on parent vs child to see the isolation boundary
Before you start · API usage

Record your spend baseline

Open console.anthropic.com/usage, filter to today, and note the current dollar amount (or screenshot the chart). You'll check the delta at the end of this tutorial so you know exactly what this one exercise cost.

Setup

Reuse the .venv from Tutorial 01. Create orchestrator.py:

python · orchestrator.py
import anthropic
client = anthropic.Anthropic()
MODEL = "claude-sonnet-4-6"

def spawn_subagent(task_brief: str, subagent_name: str) -> str:
    """Run an isolated sub-agent; return a summary string only."""
    print(f"  [spawn] {subagent_name}: {task_brief[:60]}…")
    resp = client.messages.create(
        model=MODEL, max_tokens=512,
        system=(f"You are a {subagent_name} sub-agent. Do the task briefly and "
                "return ONE paragraph of findings. Do not narrate your process."),
        messages=[{"role": "user", "content": task_brief}],
    )
    summary = resp.content[0].text
    print(f"  [done]  {subagent_name}: {len(summary)} chars, "
          f"{resp.usage.output_tokens} tokens")
    return summary

def orchestrate(user_goal: str) -> str:
    plan = [
        ("researcher", f"Find 3 key facts relevant to: {user_goal}"),
        ("critic",     f"List 2 likely counter-arguments to: {user_goal}"),
        ("synthesizer",f"Suggest a 1-sentence thesis for: {user_goal}"),
    ]
    findings = {name: spawn_subagent(brief, name) for name, brief in plan}

    resp = client.messages.create(
        model=MODEL, max_tokens=512,
        system="You are an orchestrator. Combine sub-agent findings into a brief.",
        messages=[{"role": "user",
                   "content": f"Goal: {user_goal}\n\n"
                              + "\n\n".join(f"[{k}]\n{v}" for k,v in findings.items())}],
    )
    print(f"[orchestrator] {resp.usage.input_tokens} in, "
          f"{resp.usage.output_tokens} out")
    return resp.content[0].text

if __name__ == "__main__":
    print(orchestrate("Should a startup adopt Kubernetes on day one?"))

Code trace — what happens when you run it

≈ 4 min read

orchestrate() fires four independent create() calls — three sub-agent spawns, then one synthesis at the hub. None of the sub-agents share a conversation with each other or with the orchestrator. That isolation is the entire pattern.

Before any API call

  • The plan list is a static 3-tuple of (role, task_brief). Sub-agent identity lives in the system prompt, not in the history — which is why each child starts fresh.
  • findings = {name: spawn_subagent(brief, name) for name, brief in plan} runs the three spawns sequentially. To run them in parallel, swap in the async client and asyncio.gather (see Further exploration).
  1. Call 1 — Researcher spawn.

    spawn_subagent("Find 3 key facts…", "researcher") makes a fresh create(). The child's system is "You are a researcher sub-agent. Do the task briefly and return ONE paragraph…" — that line is the summary contract. messages contains only the task brief. Response is a one-paragraph TextBlock; resp.content[0].text is pulled out and returned as a plain string.

    What doesn't happen: the researcher never sees the orchestrator's plan or history. It only sees the pre-framed brief, "Find 3 key facts relevant to: Should a startup adopt Kubernetes…". The goal text rides inside that brief, but nothing else from the hub's context does.
  2. Call 2 — Critic spawn. Call 3 — Synthesizer spawn.

    Identical shape, different subagent_name and different task brief. Each gets its own empty messages list and its own system prompt. The critic has no idea what the researcher said; the synthesizer has no idea what either said. Each returns one paragraph (or one sentence, in the synthesizer's case).

    Why that isolation matters: if the critic had seen the researcher's findings, its counter-arguments would be biased toward those specific facts. Fresh context = genuinely independent critique.
  3. Call 4 — Orchestrator synthesis at the hub.

    The last create() uses a different system prompt ("You are an orchestrator…") and a constructed user message that concatenates all three summaries — not transcripts. The orchestrator never sees how the critic reasoned, only its one-paragraph conclusion.

    Why it matters: this is where the token savings show up. Print resp.usage.input_tokens and you'll see ~500–700 for the orchestrator — three summaries. If you passed full transcripts, it would be 3× to 10× larger. At N=10 sub-agents, that difference is the whole tutorial.
One-line mental model

Hub-and-spoke = N isolated conversations plus one synthesis conversation. Workers never talk to each other; the hub only ever receives summaries. Every deviation from that shape — leaking transcripts, adding worker-to-worker edges — is an anti-pattern from Domain 1.

Walkthrough

  1. Run the orchestrator.
    bash
    python orchestrator.py

    Expected: three [spawn]/[done] pairs, then an [orchestrator] token report.

    Why it matters: Each sub-agent starts with only its own system prompt and task brief. The parent never sees their internal reasoning.
  2. Compare token budgets.

    Extend the existing [done] line inside spawn_subagent so it also reports input_tokens. Full file with the change highlighted:

    python · orchestrator.py
    import anthropic
    client = anthropic.Anthropic()
    MODEL = "claude-sonnet-4-6"
    
    def spawn_subagent(task_brief: str, subagent_name: str) -> str:
        """Run an isolated sub-agent; return a summary string only."""
        print(f"  [spawn] {subagent_name}: {task_brief[:60]}…")
        resp = client.messages.create(
            model=MODEL, max_tokens=512,
            system=(f"You are a {subagent_name} sub-agent. Do the task briefly and "
                    "return ONE paragraph of findings. Do not narrate your process."),
            messages=[{"role": "user", "content": task_brief}],
        )
        summary = resp.content[0].text
        print(f"  [done]  {subagent_name}: {len(summary)} chars, "
              f"in={resp.usage.input_tokens} out={resp.usage.output_tokens} tokens")   # ← added in=
        return summary
    
    def orchestrate(user_goal: str) -> str:
        plan = [
            ("researcher", f"Find 3 key facts relevant to: {user_goal}"),
            ("critic",     f"List 2 likely counter-arguments to: {user_goal}"),
            ("synthesizer",f"Suggest a 1-sentence thesis for: {user_goal}"),
        ]
        findings = {name: spawn_subagent(brief, name) for name, brief in plan}
    
        resp = client.messages.create(
            model=MODEL, max_tokens=512,
            system="You are an orchestrator. Combine sub-agent findings into a brief.",
            messages=[{"role": "user",
                       "content": f"Goal: {user_goal}\n\n"
                                  + "\n\n".join(f"[{k}]\n{v}" for k,v in findings.items())}],
        )
        print(f"[orchestrator] {resp.usage.input_tokens} in, "
              f"{resp.usage.output_tokens} out")
        return resp.content[0].text
    
    if __name__ == "__main__":
        print(orchestrate("Should a startup adopt Kubernetes on day one?"))

    Re-run and watch the numbers: each sub-agent uses 50–150 input tokens; the orchestrator's [orchestrator] line shows ~600 (three summaries).

    Why it matters: Without the hub-and-spoke boundary, passing full transcripts up would 3–10× the parent's token bill.
  3. Observe the summary contract.

    The sub-agent's system prompt says "return ONE paragraph… do not narrate." Sub-agents return artifacts, not history.

    Why it matters: This is the summary-not-state rule from the source, enforced at the prompt layer.

Build-and-break exercise

LEAKY CONTEXT BUG

Forward the full child transcript up to the parent

Two edits to orchestrator.py: add a leaky variant of spawn_subagent, then swap the orchestrator's findings line to use it and flatten transcripts into the parent's history. Full file with both edits highlighted — save over your existing orchestrator.py:

python · orchestrator.py · build-and-break variant
import anthropic
client = anthropic.Anthropic()
MODEL = "claude-sonnet-4-6"

def spawn_subagent(task_brief: str, subagent_name: str) -> str:
    """Run an isolated sub-agent; return a summary string only."""
    print(f"  [spawn] {subagent_name}: {task_brief[:60]}…")
    resp = client.messages.create(
        model=MODEL, max_tokens=512,
        system=(f"You are a {subagent_name} sub-agent. Do the task briefly and "
                "return ONE paragraph of findings. Do not narrate your process."),
        messages=[{"role": "user", "content": task_brief}],
    )
    summary = resp.content[0].text
    print(f"  [done]  {subagent_name}: {len(summary)} chars, "
          f"{resp.usage.output_tokens} tokens")
    return summary

def spawn_subagent_LEAKY(task_brief, subagent_name):                              # ← new leaky function
    resp = client.messages.create(
        model=MODEL, max_tokens=512,
        system=f"You are a {subagent_name} sub-agent. Do the task briefly.",
        messages=[{"role": "user", "content": task_brief}],
    )
    return {"messages": [{"role": "user", "content": task_brief},
                         {"role": "assistant", "content": resp.content[0].text}]}

def orchestrate(user_goal: str) -> str:
    plan = [
        ("researcher", f"Find 3 key facts relevant to: {user_goal}"),
        ("critic",     f"List 2 likely counter-arguments to: {user_goal}"),
        ("synthesizer",f"Suggest a 1-sentence thesis for: {user_goal}"),
    ]
    # findings = {name: spawn_subagent(brief, name) for name, brief in plan}      # ← commented out
    findings = [spawn_subagent_LEAKY(brief, name) for name, brief in plan]        # ← new (leaky)
    flattened = sum((f["messages"] for f in findings), [])                        # ← new

    resp = client.messages.create(
        model=MODEL, max_tokens=512,
        system="You are an orchestrator. Combine sub-agent findings into a brief.",
        messages=flattened,                                                       # ← was the findings dict
    )
    print(f"[orchestrator] {resp.usage.input_tokens} in, "
          f"{resp.usage.output_tokens} out")
    return resp.content[0].text

if __name__ == "__main__":
    print(orchestrate("Should a startup adopt Kubernetes on day one?"))

Expected failure: the orchestrator's input_tokens roughly triples. With N=10 sub-agents, the parent explodes its budget and starts losing information in the middle (Domain 5's failure mode).

Confirm: print resp.usage.input_tokens in both versions and compare.

Revert: restore the original spawn_subagent.

Verification checklist

  • I can explain why sub-agents return summaries, not transcripts.
  • I've seen the parent's token count stay small in the isolated version.
  • I can identify a mesh anti-pattern vs a hub-and-spoke.
  • I've run the same orchestration sequentially vs in parallel (see Further exploration).
  • I know the first question before splitting: is the work actually independent?

Cleanup

bash
rm orchestrator.py

Further exploration

  • Wrap the three spawn calls in asyncio.gather using the async client — parallelize for wall-clock savings.
  • Add a max_subagents guard and a circuit-breaker for failing sub-agents.
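
A hedged sketch covering both bullets; the names (spawn_async, fan_out, MAX_SUBAGENTS) are assumptions for illustration, not part of orchestrator.py:

python · async fan-out sketch
import asyncio
import anthropic

aclient = anthropic.AsyncAnthropic()
MODEL = "claude-sonnet-4-6"
MAX_SUBAGENTS = 5                  # guard: refuse plans that fan out too wide

async def spawn_async(task_brief: str, subagent_name: str) -> str:
    resp = await aclient.messages.create(
        model=MODEL, max_tokens=512,
        system=f"You are a {subagent_name} sub-agent. Return ONE paragraph.",
        messages=[{"role": "user", "content": task_brief}],
    )
    return resp.content[0].text

async def fan_out(plan):
    if len(plan) > MAX_SUBAGENTS:
        raise ValueError(f"plan wants {len(plan)} sub-agents, cap is {MAX_SUBAGENTS}")
    # return_exceptions=True acts as a crude circuit-breaker: one failing
    # sub-agent doesn't sink the whole batch
    results = await asyncio.gather(
        *(spawn_async(brief, name) for name, brief in plan),
        return_exceptions=True,
    )
    return {name: r for (name, _), r in zip(plan, results)
            if not isinstance(r, Exception)}
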
After you finish · API usage

Check your spend delta

Refresh console.anthropic.com/usage. The difference from your baseline is what this tutorial actually cost on Sonnet 4.6 — add it to a running tally so you can compare against the ~$2–4 total budget for the whole workbook.

Optional in-code tally. Drop this line after each client.messages.create(…) call to see per-request token counts in your terminal:

python
print(f"[usage] in={resp.usage.input_tokens} out={resp.usage.output_tokens}")
03/08
D2 · Tools / MCP · 75 min · Python · MCP SDK · Claude Code

Build an MCP Server Exposing Tools, Resources, and Prompts

Concept covered. MCP exposes three primitive types from one server. Tools are model-invoked actions. Resources are app-controlled data. Prompts are user-triggered templates. Picking the wrong primitive is the single most common Domain 2 exam trap.

Source: Domain 2 · MCP Triangle · Flashcards (Tool / Resource / Prompt Template)

Concept diagram · host · protocol · server with the three primitives
MCP architecture: a host (an LLM application such as Claude Code or Claude Desktop, with an MCP client inside) speaks JSON-RPC 2.0 over stdio, HTTP, or SSE to an MCP server exposing three primitives. TOOLS are model-controlled callable functions (e.g. search, write_file, sql) with side-effects and actions. RESOURCES are app-controlled readable data (e.g. file://logs, repo://readme), pulled by URI. PROMPTS are user-controlled reusable templates (e.g. /summarize, /triage), invoked by slash command.

Three primitives, three control surfaces. Tools are model-controlled — the LLM decides when to invoke. Resources are app-controlled — the host pulls them by URI. Prompts are user-controlled — surfaced as slash commands. Mixing up who controls what is Domain 2's most common exam trap.

Prerequisites
  • Python 3.11+
  • mcp package (official Model Context Protocol SDK): pip install mcp
  • Claude Code CLI installed (sudo npm i -g @anthropic-ai/claude-code)
  • Basic familiarity with JSON Schema
Learning objectives
  • Stand up an MCP server that exposes all three primitives
  • Wire it into Claude Code as a stdio MCP server
  • Invoke each primitive and see which initiator triggers it (model / app / user)
  • Deliberately mis-categorise a feature to see the model pick the wrong primitive
Before you start · API usage

Record your spend baseline

Open console.anthropic.com/usage, filter to today, and note the current dollar amount (or screenshot the chart). You'll check the delta at the end of this tutorial so you know exactly what this one exercise cost.

Setup

bash
mkdir mcp-triangle && cd mcp-triangle
python3 -m venv .venv && source .venv/bin/activate
pip install "mcp>=1.0"

Create server.py:

python · server.py
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("triangle-demo")

# --- TOOL: model decides to call ------------------------------------------
@mcp.tool()
def lookup_order(order_id: str) -> dict:
    """Fetch current status of a customer order. Call only when the user
    references an order ID or asks 'where is my order'. Do not call for
    general product questions."""
    fake = {"A-100": {"status": "shipped"}, "A-200": {"status": "processing"}}
    return fake.get(order_id, {"status": "not_found"})

# --- RESOURCE: app preloads, model reads -----------------------------------
@mcp.resource("docs://shipping-policy")
def shipping_policy() -> str:
    """Our shipping policy document. Read when explaining delivery times."""
    return "Orders ship in 2 business days. International adds 5–7 days."

# --- PROMPT: user types /refund ------------------------------------------
@mcp.prompt("refund-flow")
def refund_flow(order_id: str) -> str:
    """Walk the user through a refund for a specific order."""
    return (f"Start a refund flow for order {order_id}. Verify eligibility, "
            "confirm with the user, then call lookup_order to get status.")

if __name__ == "__main__":
    mcp.run()   # defaults to stdio transport

Wire it into Claude Code. Create .mcp.json in the project root:

json · .mcp.json
{
  "mcpServers": {
    "triangle": {
      "command": "python",
      "args": ["server.py"]
    }
  }
}

Code trace — the three initiator paths

≈ 5 min read

Unlike earlier tutorials, there is no single linear flow here. server.py registers three primitives that are invoked by three different initiators. Each one follows a completely different path through the system — that's the whole lesson.

Before any user interaction

  • FastMCP("triangle-demo") constructs the server object. The three decorators (@mcp.tool(), @mcp.resource(...), @mcp.prompt(...)) fire at import time — they introspect the Python functions, pull the docstrings out as descriptions, and register each primitive with the server's capability table.
  • mcp.run() starts the stdio event loop. The server now blocks on stdin waiting for a JSON-RPC frame from Claude Code.
  • When Claude Code launches in this directory, it reads .mcp.json, spawns python server.py as a subprocess, and handshakes with it over that subprocess's stdin/stdout. After the handshake, Claude Code knows the server offers exactly: 1 tool, 1 resource, 1 prompt — visible when you type /mcp.
  1. TOOL — the model initiates.

    You ask "What's the status of order A-100?". Claude's next response has stop_reason=tool_use and a ToolUseBlock(name='lookup_order', input={'order_id': 'A-100'}). Claude Code forwards that to the MCP server over stdio as a tools/call JSON-RPC frame. The server routes to the decorated Python function, executes it, gets {'status': 'shipped'}, sends that back as a tools/call result. Claude Code wraps the result as a tool_result content block and appends it to the conversation — the standard agentic loop picks up from there.

    Key point: the model chose to call the tool. The description ("Call only when the user references an order ID…") is the routing signal. Claude Code could not call the tool on its own — it only executes what the model requests.
  2. RESOURCE — the application initiates.

    You ask "Explain our shipping timeline.". Claude Code's harness decides that the docs://shipping-policy resource is relevant to the task and issues a resources/read JSON-RPC call to the server. The shipping_policy() function returns its string, Claude Code inlines it into the prompt context (as a content block attached to the user message), and the model just reads. The model never emits a tool-use block; it never decided to call anything.

    Key point: resources are attached, not called. That's why "mis-categorising a Resource as a Tool" — the build-and-break exercise — produces wrong-time calls and wasted tokens: you've shifted the initiator from the app to the model.
  3. PROMPT — the user initiates.

    You type /triangle:refund-flow. Claude Code intercepts the slash before it reaches the model, looks up the refund-flow prompt on the triangle server via prompts/get, receives the template string "Start a refund flow for order <id>…", expands any arguments, and submits the result to Claude as if you had typed it. From the model's perspective this is just a user message — it then proceeds normally, likely calling lookup_order inside the flow.

    Key point: prompts are user-authored shortcuts. They don't change model behaviour — they change what the user has to type. The model's routing logic is unchanged.
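
For concreteness, the tool path's tools/call frame looks roughly like this on the wire; the method and params fields follow the MCP spec, while the id is illustrative:

json · tools/call frame (illustrative)
{
  "jsonrpc": "2.0",
  "id": 7,
  "method": "tools/call",
  "params": {
    "name": "lookup_order",
    "arguments": { "order_id": "A-100" }
  }
}
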
One-line mental model

Tool = model decides. Resource = app decides. Prompt = user decides. Every MCP design choice reduces to: who should be in control of firing this? The exam trap (and this tutorial's build-and-break) is putting a thing under the wrong initiator.

Walkthrough

  1. Verify the server boots.
    bash
    python server.py

    It should start and sit waiting for stdio input — kill with Ctrl-C. (Don't redirect stdin from /dev/null: the server reads EOF and exits immediately.)

    Why it matters: Stdio is the transport Claude Code uses for local MCP servers.
  2. Launch Claude Code at this directory.
    bash
    claude

    At the prompt, type /mcp. Expected: triangle appears with 1 tool, 1 resource, 1 prompt.

    Why it matters: The CLI auto-loads .mcp.json and introspects the server.
  3. Trigger the TOOL (model-initiated).

    Ask Claude: What's the status of order A-100? Expected: Claude calls lookup_order and answers "shipped."

    Why it matters: The model chose to call the tool because the description told it when to.
  4. Trigger the RESOURCE (app-controlled).

    Ask: Explain our shipping timeline. In a well-behaved harness, the resource is attached; Claude reads it to answer.

    Why it matters: Resources aren't "chosen" by the model — the app decides to include them.
  5. Trigger the PROMPT (user-initiated).

    Type /triangle:refund-flow, then answer the order ID question.

    Why it matters: Prompts are slash-command templates — the human decides when to fire them.

Build-and-break exercise

MIS-CATEGORISE A PRIMITIVE

Convert the shipping-policy Resource into a Tool

Rewrite the shipping policy as a Tool called get_shipping_policy() with a minimal description: "Returns shipping policy text.".
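
In server.py the swap looks like this; replace the @mcp.resource block with:

python · server.py · mis-categorised variant
@mcp.tool()
def get_shipping_policy() -> str:
    """Returns shipping policy text."""   # deliberately minimal: no triggers, no anti-cases
    return "Orders ship in 2 business days. International adds 5–7 days."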

Then ask Claude a non-shipping question: What's our return policy?

Expected failure modes (one or more):

  • Claude calls get_shipping_policy pointlessly (because "policy" matched loosely).
  • It doesn't call the resource-that-should-be-there at all.
  • Token use balloons because the full document is injected into each tool result.

Confirm: watch the Claude Code transcript — you'll see the spurious get_shipping_policy call. A resource would have been quietly attached; a tool demands a reason to call.

Revert: restore the @mcp.resource decorator.

Verification checklist

  • I can state "who initiates?" for each of the three primitives without looking.
  • I've seen a tool mis-used because its description was too loose.
  • I can point to my .mcp.json and explain stdio vs HTTP/SSE transport.
  • My MCP server exposes one of each primitive and all register in /mcp.
  • I've watched Claude choose (or not) to call a tool based on its description.

Cleanup

bash
deactivate && cd .. && rm -rf mcp-triangle

Further exploration

  • Convert the server to HTTP transport and connect from a remote host. In current mcp SDK releases the port is a server setting rather than a run() argument; see the sketch after this list.
  • Add a second tool with an anti-example in the description ("Do NOT call when…").
  • Explore the resources/list MCP method directly with mcp-cli or curl.
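
A sketch of the HTTP-transport bullet; constructor kwargs and transport names vary across mcp SDK releases, so treat this as a shape to check against your installed version:

python · server.py · HTTP transport sketch
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("triangle-demo", host="0.0.0.0", port=3333)   # port is a server setting

# …same @mcp.tool / @mcp.resource / @mcp.prompt registrations as before…

if __name__ == "__main__":
    mcp.run(transport="streamable-http")   # HTTP instead of the stdio default
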
After you finish · API usage

Check your spend delta

Refresh console.anthropic.com/usage. This tutorial doesn't call the Anthropic API directly from Python — Claude Code makes the requests on your behalf — so the Console is your only live signal for spend. Note the delta and add it to the running tally.

04/08
D2 · Tools / MCP · 30 min · Python · Anthropic SDK

Tool Description Discipline — Break It, Then Fix It

Concept covered. The tool description is the only signal the model has for when to call a tool. Vague descriptions cause wrong-time calls; great descriptions state parameters, units, side effects, and anti-examples ("when NOT to call").

Source: Domain 2 · Common Traps · Quiz Q7

Concept diagram · vague vs. disciplined descriptions · same prompt, different pick
Same user prompt, "Find what to wear in Paris today.", run against two catalogues. Before (vague): search is described as "Search." and get_weather as "Weather.", with no trigger-words, no scope, and no non-examples, so the model cannot tell that get_weather handles the "wear" semantics; it picks search(), burning a round-trip on unrelated results. After (disciplined): search reads "Web search for news & docs … NOT for weather, prices, or local data" and get_weather reads "Temp °C + conditions for a city … triggers: weather, temperature, what to wear"; the model picks get_weather(): one call, right semantics, fast exit.

A tool description is a contract with the router. The model picks by semantic match, not by function name. Include trigger-words, scope boundaries, and explicit non-examples — e.g. "NOT for weather" — or an adjacent tool will swallow the call.

Prerequisites
  • Tutorial 01 complete
  • Python, anthropic SDK, API key
Learning objectives
  • Run the same prompt against vague vs. precise tool descriptions
  • Count incorrect tool invocations per description
  • Observe how "when NOT to call" guidance reduces false positives
  • Recognise the anti-pattern of renaming a tool to "refresh attention"
Before you start · API usage

Record your spend baseline

Open console.anthropic.com/usage, filter to today, and note the current dollar amount (or screenshot the chart). You'll check the delta at the end of this tutorial so you know exactly what this one exercise cost.

Setup

Reuse Tutorial 01's environment. Create describe.py:

python · describe.py
import anthropic, json
client = anthropic.Anthropic()
MODEL = "claude-sonnet-4-6"

VAGUE = [{
    "name": "get_data",
    "description": "Gets data.",
    "input_schema": {"type": "object", "properties": {}},
}]

PRECISE = [{
    "name": "get_latest_order_status",
    "description": (
        "Fetch the current shipping status of a CUSTOMER ORDER from the orders "
        "database. Use ONLY when the user asks about a specific order by ID or "
        "says 'where is my package'. DO NOT use for returns, refunds, account "
        "info, or general product questions. Returns: {order_id, status, eta}."
    ),
    "input_schema": {
        "type": "object",
        "properties": {"order_id": {"type": "string",
                                    "description": "Order ID, e.g. 'A-100'"}},
        "required": ["order_id"],
    },
}]

test_prompts = [
    "Where is my order A-100?",           # should call
    "What's your return policy?",         # should NOT call
    "Tell me about the Widget Pro specs", # should NOT call
    "Is my package A-200 on its way?",    # should call
]

def run(tools, prompts, label):
    print(f"\n=== {label} ===")
    for p in prompts:
        resp = client.messages.create(
            model=MODEL, max_tokens=256, tools=tools,
            messages=[{"role": "user", "content": p}],
        )
        called = [b.name for b in resp.content if b.type == "tool_use"]
        print(f"  prompt: {p!r:50s} → tool_calls={called}")

run(VAGUE,   test_prompts, "VAGUE tool description")
run(PRECISE, test_prompts, "PRECISE tool description")

Code trace — what the comparison measures

≈ 3 min read

This script is a two-batch experiment. It fires the same four prompts against two tool catalogues — one deliberately vague, one deliberately precise — and records which prompt triggers a tool call. Eight API calls total. The single variable is the tool's description.

Before either batch

  • The test_prompts list is labelled in the source comments: two should call (order-ID lookups), two should not (policy + product questions). That label is the ground truth you'll measure accuracy against.
  • VAGUE has a single sentence description ("Gets data.") and an empty input_schema. PRECISE names the domain, the trigger phrases, the anti-cases, the parameter with its own description, and the return shape — five elements, each of which is a signal the model can route on.
  • run(tools, prompts, label) is the harness: one create() per prompt, with a fresh single-turn messages list each time — no conversation carry-over. The log line extracts [b.name for b in resp.content if b.type == "tool_use"], so an empty list means "model decided not to call".
  1. Batch 1 — VAGUE. Four calls, four unreliable outcomes.

    For each prompt, the model sees a tool named get_data with no signal about when to use it. Two failure modes appear:

    • Over-calling. The model guesses "the user is asking something, I have a tool, I should try it" and calls get_data on the policy and product prompts too.
    • Under-calling. The model decides the tool is unlikely to help and answers from general knowledge — including the order-status prompts it should have delegated.

    Either way, the tool_calls= column is inconsistent run-to-run. You'll see different outputs on successive runs of the same prompts — the vague description has no anchor.

    Why it matters: unreliable routing is the real cost of vague descriptions — not that the tool "won't fire", but that you can't predict when it will.
  2. Batch 2 — PRECISE. Four calls, four predictable outcomes.

    Same four prompts, but now the description contains "Use ONLY when the user asks about a specific order by ID" and "DO NOT use for returns, refunds, account info, or general product questions.". The model parses these as explicit routing rules:

    • "Where is my order A-100?" — matches the trigger → calls get_latest_order_status(order_id='A-100').
    • "What's your return policy?" — matches the DO NOT list → no call. Model answers from its own knowledge.
    • "Tell me about the Widget Pro specs" — general product question, matches DO NOT → no call.
    • "Is my package A-200 on its way?" — matches the "where is my package" trigger phrase verbatim → calls with order_id='A-200'.

    The result table is now stable across runs — the description is doing the routing work.

    Why it matters: the description is effectively your tool's API contract with the model. Vagueness there is the same bug as a vague interface in code.
One-line mental model

Write tool descriptions the way you'd write instructions for a new hire. Five elements: domain, parameters with descriptions, when to use, when NOT to use, return shape. Skip any of them and the model fills the gap with a guess.

Walkthrough

  1. Run both versions side by side.
    bash
    python describe.py

    Expected: the vague get_data is either called on everything or nothing — both wrong. The precise tool fires only on the two order-status prompts.

    Why it matters: Every spurious tool call is latency and cost — and a context-pollution risk.
  2. Count false positives.

    Tally: how often did each description call the tool when it shouldn't have? (A sketch that automates this appears after the walkthrough list.)

    Why it matters: This is the measurement that separates "works" from "works reliably."
  3. Dissect the precise description.

    Re-read it. It includes: (i) domain ("customer order"), (ii) parameters with descriptions, (iii) an explicit "use ONLY when," (iv) an explicit "DO NOT use for," (v) a return shape.

    Why it matters: This is the concrete answer to Quiz Q7 — five elements every production tool description should have.
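
If you'd rather automate the step-2 tally than eyeball it, something like this works on top of describe.py; the expected list mirrors the should call / should NOT call comments next to test_prompts:

python · false-positive tally sketch
expected = [True, False, False, True]          # ground truth per test prompt

def false_positives(tools, label):
    fp = 0
    for p, should_call in zip(test_prompts, expected):
        resp = client.messages.create(
            model=MODEL, max_tokens=256, tools=tools,
            messages=[{"role": "user", "content": p}],
        )
        called = any(b.type == "tool_use" for b in resp.content)
        fp += int(called and not should_call)
    print(f"{label}: {fp} false positive(s) across {len(test_prompts)} prompts")

false_positives(VAGUE, "vague")
false_positives(PRECISE, "precise")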

Build-and-break exercise

CARGO-CULT RENAME

Rename the tool to "refresh attention"

Rename get_data to get_data_v2 while keeping the description at "Gets data." Run again.
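
The edit in describe.py is a single field:

python · describe.py · rename-only variant
VAGUE = [{
    "name": "get_data_v2",                 # ← renamed; description unchanged
    "description": "Gets data.",
    "input_schema": {"type": "object", "properties": {}},
}]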

Expected: behaviour is unchanged. The same two failure patterns persist. The name alone is not information.

Confirm: compare tool-call counts before and after the rename. They'll be statistically identical.

Revert: switch back to the precise description.

Verification checklist

  • I have a side-by-side count of tool calls for vague vs precise.
  • I can list the five elements of a precise tool description.
  • I've seen that renaming a tool does not change model behaviour.
  • I understand why "gets data" is not a tool description.
  • I can write a description that includes a "DO NOT call when…" clause.

Cleanup

bash
rm describe.py

Further exploration

  • Add a third variant with parameters-only description and measure which field matters most.
  • Test with claude-haiku-4-5 and claude-opus-4-7 — does description quality matter more for smaller models?
  • Read Anthropic's tool-use documentation for the canonical description pattern.
After you finish · API usage

Check your spend delta

Refresh console.anthropic.com/usage. The difference from your baseline is what this tutorial actually cost on Sonnet 4.6 — add it to a running tally so you can compare against the ~$2–4 total budget for the whole workbook.

Optional in-code tally. Drop this line after each client.messages.create(…) call to see per-request token counts in your terminal:

python
print(f"[usage] in={resp.usage.input_tokens} out={resp.usage.output_tokens}")
05/08
D3 · Claude Code 40 min Claude Code CLI

CLAUDE.md Hierarchy in Claude Code

Concept covered. Claude Code loads CLAUDE.md instructions at three tiers — global (~/.claude/CLAUDE.md), project (./CLAUDE.md), and subdirectory (./src/foo/CLAUDE.md). More specific wins. Personal preferences go global; team rules go in project; local overrides go in subdirs.

Source: Domain 3 · CLAUDE.md Pyramid · Quiz Q13

Concept diagram three-layer merge · deepest wins
CLAUDE.md hierarchy: user global, project, and nested files merge with precedence from top to bottom. The deepest file wins on conflicts; all applicable files merge into the effective session context.

  • LAYER 1 · user global: ~/.claude/CLAUDE.md, applies to every project on this machine (e.g. "prefer pytest · tabs 2").
  • LAYER 2 · project: <repo>/CLAUDE.md, applies when cwd is inside the repo (e.g. "output ONE html file").
  • LAYER 3 · nested, highest precedence: <repo>/src/api/CLAUDE.md, applies when cwd is inside src/api/ (e.g. "FastAPI · return Pydantic").

Conflict resolution: if global says "Flask" and nested says "FastAPI" → FastAPI wins.

Claude Code loads all applicable CLAUDE.md files on start-up and merges them. Scope narrows as you descend — global affects every project, project affects the repo, nested affects a subfolder. On conflicts, the deepest file wins. Use user global for personal style, project for repo invariants, nested for per-directory overrides.

Prerequisites
  • Claude Code CLI installed
  • A throwaway directory
  • Your existing ~/.claude/CLAUDE.md will be temporarily displaced (we'll back it up)
Learning objectives
  • Observe three CLAUDE.md files loading simultaneously
  • Force a conflict and see which tier wins
  • Identify what belongs at each tier (team / personal / local)
  • Break the hierarchy by putting a user preference in a project file
Before you start · API usage

Record your spend baseline

Open console.anthropic.com/usage, filter to today, and note the current dollar amount (or screenshot the chart). You'll check the delta at the end of this tutorial so you know exactly what this one exercise cost.

Setup

bash
# Back up existing global CLAUDE.md if any
cp ~/.claude/CLAUDE.md ~/.claude/CLAUDE.md.bak 2>/dev/null || true

mkdir -p ~/hierarchy-demo/src/payments
cd ~/hierarchy-demo

Create three tiered files:

bash · tiered CLAUDE.md files
cat > ~/.claude/CLAUDE.md <<'EOF'
# Personal preferences
- Always use emoji-free commit messages.
- When writing code, prefer 4-space indentation.
EOF

cat > ~/hierarchy-demo/CLAUDE.md <<'EOF'
# Team rules for hierarchy-demo
- All Python code must use type hints.
- Do not add dependencies without asking first.
EOF

cat > ~/hierarchy-demo/src/payments/CLAUDE.md <<'EOF'
# Payments module overrides
- This module uses 2-space indentation (legacy).
- Never log raw card numbers, even in tests.
EOF

Code trace — how Claude Code resolves the hierarchy

≈ 3 min read

There's no Python to execute here — but Claude Code runs a deterministic scope-walk algorithm every time it starts a session and every time its working scope changes. Once you know that algorithm, the "which rule wins?" question answers itself.

The three files you just created

  • ~/.claude/CLAUDE.md (global tier) — personal preferences. Follows you, not the repo.
  • ~/hierarchy-demo/CLAUDE.md (project tier) — team rules. Checked into source control; every clone inherits them.
  • ~/hierarchy-demo/src/payments/CLAUDE.md (subdirectory tier) — local overrides for a specific module. Only in effect while Claude is working inside that subtree.
  1. Phase 1 — session start at the project root.

    When you run claude inside ~/hierarchy-demo, Claude Code walks from the current working directory up to home looking for CLAUDE.md files, then adds the global one. Each file found is loaded into the session's instruction context. The src/payments/CLAUDE.md file is not loaded yet — the harness hasn't been asked to look inside that subdirectory. (The walk is sketched in code after this list.)

    So right now: two tiers are active. Claude knows "emoji-free commits, 4-space indent, type hints required, ask before adding dependencies." That's the state when you ask "What instructions are loaded for you right now?"
  2. Phase 2 — you ask for an edit in src/payments/.

    The moment Claude Code's file-op scope shifts into src/payments/ — typically because of an Edit or Write tool call targeting a file there — the harness re-walks and finds src/payments/CLAUDE.md. It gets added to the context as the most specific tier.

    Now there's a conflict: global says 4-space, subdir says 2-space. The resolution rule is most specific wins for this scope — subdir > project > global. Claude uses 2-space inside src/payments/, 4-space everywhere else.

    Why it's scope-based, not merge-at-startup: a monorepo can have dozens of subdirectory CLAUDE.md files. Loading all of them at session start would bloat context. They're loaded on demand as Claude actually works in those paths.
  3. Phase 3 — you move back to the repo root.

    Edit a file outside src/payments/. The subdir rules lose scope; project + global remain. The 2-space rule no longer applies, and "Never log raw card numbers" is no longer in-context either — which is exactly what you want. A local rule about one module shouldn't bleed into unrelated files.

    Wrong-tier failure mode: the build-and-break exercise puts "I like verbose commit messages" into the project file. Because that tier is checked in, every teammate inherits your preference. The pyramid's "Don't" — personal preferences must never leave the global tier.
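
The walk itself is simple enough to sketch. A minimal Python model of the behaviour described above, as an illustration rather than Claude Code's actual implementation:

python · scope-walk sketch
from pathlib import Path

def applicable_claude_mds(work_path: Path) -> list[Path]:
    """Collect CLAUDE.md files for a working path, broadest first."""
    home = Path.home()
    found = []
    # Walk from the working path up to home, collecting CLAUDE.md files.
    for d in [work_path, *work_path.parents]:
        f = d / "CLAUDE.md"
        if f.is_file():
            found.append(f)
        if d == home:
            break
    found.append(home / ".claude" / "CLAUDE.md")  # global tier, broadest scope
    # Reverse so the broadest file comes first and the deepest comes last;
    # later files override earlier ones on conflict.
    return [f for f in reversed(found) if f.is_file()]

# Phase 1: session start at the repo root loads global + project.
print(applicable_claude_mds(Path.home() / "hierarchy-demo"))
# Phase 2: an edit lands in src/payments/ and all three tiers apply, deepest last.
print(applicable_claude_mds(Path.home() / "hierarchy-demo" / "src" / "payments"))
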
One-line mental model

Closest-CLAUDE.md-to-the-work wins. Global follows you. Project follows the repo. Subdir follows a specific path. Content belongs at the tier whose audience matches the rule's scope — that's the only design rule, and every anti-pattern violates it.

Walkthrough

  1. Launch Claude Code at the project root.
    bash
    cd ~/hierarchy-demo && claude

    Ask: What instructions are loaded for you right now? Expected: Claude cites rules from both global and project CLAUDE.md — but NOT the payments one (you're not in that path).

    Why it matters: Subdir CLAUDE.md only loads when Claude is working in that subdir.
  2. Work in the payments subdir.

    Ask: Let's edit src/payments/charge.py — create it with a small function. Then ask: Which indentation rule applies here?

    Expected: "2-space, because we're in src/payments and the local CLAUDE.md overrides the global 4-space preference."

    Why it matters: This is the precedence-by-specificity rule from the pyramid diagram.
  3. Force a conflict.

    Edit the project CLAUDE.md to say - Prefer tabs everywhere and ask Claude to write a function in the repo root.

    Your global says 4-space; project says tabs. Expected: Claude follows the project (more specific wins for this scope).

    Why it matters: Teams encode rules closer to the code; personal defaults are overridden where teams have spoken.

Build-and-break exercise

WRONG-TIER RULE

Put a personal preference in the project file

bash
cat >> ~/hierarchy-demo/CLAUDE.md <<'EOF'
- I like verbose commit messages with rationale in every line.
EOF

Commit it. Imagine a teammate cloning the repo — they'd inherit your commit-message preference, never having asked for it. The source's "Don't" explicitly warns against this.

Confirm: ask Claude Who is "I" in this CLAUDE.md? Does this rule belong here? Claude should flag it as a user-scoped rule in the wrong tier.

Revert: remove the offending line and restore your real global CLAUDE.md: mv ~/.claude/CLAUDE.md.bak ~/.claude/CLAUDE.md

Verification checklist

  • I can name what belongs at global / project / subdir tiers.
  • I've seen Claude cite rules from multiple CLAUDE.md files.
  • I've seen a subdir override a project rule.
  • I understand why "I prefer X" should not go in a checked-in CLAUDE.md.
  • I can explain why this is a hierarchy, not a flat config.

Cleanup

bash
rm -rf ~/hierarchy-demo

Further exploration

  • Use /memory in Claude Code to see which CLAUDE.md files are currently loaded.
  • Add a fourth tier via a CLAUDE.local.md file (untracked personal overrides per repo).
After you finish · API usage

Check your spend delta

Refresh console.anthropic.com/usage. This tutorial doesn't call the Anthropic API directly from Python — Claude Code makes the requests on your behalf — so the Console is your only live signal for spend. Note the delta and add it to the running tally.

06/08
D3 · Claude Code 45 min Claude Code CLI · jq

Hooks vs Skills vs Slash Commands

Concept covered. Claude Code gives you three automation mechanisms, differing by who initiates. Hooks fire deterministically on harness events. Skills are capabilities the model chooses to use. Slash commands are shortcuts the user types. Pick by initiator, not by what you want to do.

Source: Domain 3 · Flashcards · Quiz Q3 and Q8

Concept diagram three extension mechanisms · three different triggers
Hooks, Skills, and Slash Commands differ by who triggers them.

  • HOOKS · trigger: harness event (tool-use, stop, …). Runs a shell command; the harness (settings.json) decides. Good for automation, guardrails, and side-effects after events (e.g. run ruff after Edit, block writes to .env). Deterministic · non-LLM.
  • SKILLS · trigger: the model, via a Skill tool invocation. Runs a markdown instruction pack; the MODEL decides, at runtime. Good for domain procedures the model should follow (e.g. frontend-design, systematic-debugging). Discretionary · LLM-aware.
  • SLASH COMMANDS · trigger: the user typing /slash in the prompt. Runs a named prompt; the USER decides. Good for reusable workflows you want on demand (e.g. /review, /commit, /security-review). Explicit · user-intent.

Same outcome, different entry point. If the harness should react to something, use a hook. If the model should follow a recipe when it recognizes a situation, use a skill. If the user should invoke a named workflow, use a slash command. Picking the wrong mechanism is the most frequent Domain 3 exam mistake.

Prerequisites
  • Tutorial 05 complete
  • Claude Code CLI + jq
  • A scratch directory
Learning objectives
  • Create one hook, one skill, and one slash command
  • Observe each being triggered by its correct initiator
  • See a hook fire even when the model didn't "want" it to
  • Try to get a skill to fire deterministically (it can't)
Before you start · API usage

Record your spend baseline

Open console.anthropic.com/usage, filter to today, and note the current dollar amount (or screenshot the chart). You'll check the delta at the end of this tutorial so you know exactly what this one exercise cost.

Setup

bash
mkdir ~/primitives && cd ~/primitives
mkdir -p .claude/skills/security-review .claude/commands

Hook — .claude/settings.json:

json · .claude/settings.json
{
  "hooks": {
    "PostToolUse": [{
      "matcher": "Edit|Write",
      "hooks": [{
        "type": "command",
        "command": "echo \"[hook] file touched at $(date -Is)\" >> /tmp/hook.log"
      }]
    }]
  }
}

Skill — .claude/skills/security-review/SKILL.md:

markdown · SKILL.md
---
name: security-review
description: Use when the user asks to review code for security issues, audit for vulnerabilities, or check for injection/XSS/secrets.
---

# Security Review

When invoked, scan the target file(s) for:
1. Hardcoded secrets (API keys, passwords, tokens)
2. SQL injection sinks (unparameterised queries)
3. XSS risks (unescaped user input in HTML)
4. Command injection (shell execution with user input)

Report findings as a numbered list with file:line references.

Slash command — .claude/commands/log-summary.md:

markdown · log-summary.md
Summarise the last hour of entries in /tmp/hook.log and print a table of which files were touched most frequently.

Code trace — three primitives, three initiators

≈ 4 min read

Just like the MCP tutorial, there is no single runtime flow. .claude/settings.json, SKILL.md, and the slash-command file each wire a different initiator into Claude Code. Same harness, three different trigger surfaces.

What happens at session start

  • Claude Code reads .claude/settings.json and registers the PostToolUse hook — a declaration of "run this shell command whenever a tool matching Edit|Write finishes". Nothing runs yet.
  • It scans .claude/skills/*/SKILL.md and loads the frontmatter (name + description) into the skill catalogue the model will see as routing signals each turn. The body of SKILL.md stays on disk — only loaded when the skill is invoked.
  • It indexes .claude/commands/*.md as slash-command templates. The command body is a prompt stub, not executable code.
  1. HOOK — the harness initiates.

    You ask "Create hello.txt containing 'hi'". Turn flow: model emits a Write tool_use block → harness executes the write → before returning control to the model, Claude Code iterates its PostToolUse hooks, finds the one whose matcher regex matches Write, and runs its shell command. /tmp/hook.log gets a new line appended. Then the tool_result is returned to the model.

    Key point: the model cannot skip the hook. It didn't even know it ran. That's why the exam says "deterministic → hook". Hooks are enforcement, not advice.
  2. SKILL — the model initiates.

    You ask "Review hello.txt for security issues". At the top of that turn, Claude's planner sees every skill's frontmatter description as part of its routing prompt. The matcher in this case is the description"Use when the user asks to review code for security issues…" — which matches the user's phrasing. The model chooses to invoke the skill by treating its body as a mini-instruction prompt. The skill's four-step checklist becomes the model's plan for this turn.

    Key point: every step between "user said something" and "skill fires" involves a model decision. Change the wording of the user's request and the skill may not fire — that's the probabilistic nature the build-and-break exercise exposes.
  3. SLASH COMMAND — the user initiates.

    You type /log-summary. The CLI intercepts the slash before the text reaches the model, looks up .claude/commands/log-summary.md, takes its body verbatim ("Summarise the last hour…") and submits it as a user message. The model then proceeds normally — it will likely call Read on /tmp/hook.log, parse it, and produce the table. No model decision on whether to fire — you already decided by typing the slash.

    Key point: the model's subsequent behaviour is non-deterministic, but the firing was deterministic. Slash commands move the "when" decision to the user while leaving the "how" to the model.
One-line mental model

Three automation mechanisms differ on one axis: who initiates. Hook = harness (deterministic). Skill = model (probabilistic match on description). Slash = user (deterministic on firing, probabilistic on execution). "What should this do?" is a second-order question — pick the initiator first.

Walkthrough

  1. Trigger the HOOK (harness-initiated).

    Launch claude. Ask: Create hello.txt containing "hi". Then:

    bash
    cat /tmp/hook.log

    Expected: a [hook] file touched at … line.

    Why it matters: The hook fired because the harness saw a Write tool call. Claude couldn't skip it — it's deterministic.
  2. Trigger the SKILL (model-initiated).

    Ask: Review hello.txt for security issues. Expected: Claude recognises the skill's description, invokes it, and applies its checklist.

    Why it matters: The description is the routing signal — the model chose to use the skill because the task matched.
  3. Trigger the SLASH COMMAND (user-initiated).

    Type /log-summary. Expected: Claude reads /tmp/hook.log and prints a table.

    Why it matters: Slash commands are shortcuts you decide to run — no AI routing involved.

Build-and-break exercise

FORCE DETERMINISM WHERE IT DOESN'T BELONG

Try to make a skill fire on every edit

Edit .claude/skills/security-review/SKILL.md — change only the description: line in the frontmatter, keep the body untouched. Full file with the change highlighted — save over your existing SKILL.md:

markdown · .claude/skills/security-review/SKILL.md · build-and-break variant
---
name: security-review
description: Use after every edit, no exceptions    # ← was the security-review sentence
---

# Security Review

When invoked, scan the target file(s) for:
1. Hardcoded secrets (API keys, passwords, tokens)
2. SQL injection sinks (unparameterised queries)
3. XSS risks (unescaped user input in HTML)
4. Command injection (shell execution with user input)

Report findings as a numbered list with file:line references.

Now create five files in a row and count the skill invocations in the transcript.

Expected failure: the model may invoke it anywhere from 0 to 5 times — not deterministic. If you need "always runs", you need a hook, not a skill.

Confirm: inspect /tmp/hook.log (5 entries) vs the number of skill invocations in the transcript (variable). This is the answer to Quiz Q3: deterministic → hook.
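
The deterministic half of that tally is one shell command:

bash
wc -l < /tmp/hook.log   # one line per Write/Edit; exactly 5 if the log started empty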

Revert: remove the "use after every edit" language from the skill description.

Verification checklist

  • I've watched one hook, one skill, and one slash command fire.
  • I can name the initiator for each of the three.
  • I've confirmed a skill is non-deterministic by running it multiple times.
  • I know which one to pick for "run gofmt after every edit" (hook).
  • I know which one to pick for "review PR when relevant" (skill).

Cleanup

bash
cd ~ && rm -rf ~/primitives /tmp/hook.log

Further exploration

  • Write a PreToolUse hook that blocks a tool call based on a jq filter on the input (a starting sketch follows this list).
  • Chain a skill and a hook: skill suggests an edit; hook lints after.
  • Compare .claude/skills/*/SKILL.md to .claude/agents/*.md (sub-agent definitions — another Claude Code primitive).
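
For the first bullet, a starting sketch. It assumes PreToolUse hooks receive {tool_name, tool_input} as JSON on stdin and that exit code 2 blocks the call; verify both against the current hooks documentation:

json · .claude/settings.json · PreToolUse block sketch
{
  "hooks": {
    "PreToolUse": [{
      "matcher": "Write|Edit",
      "hooks": [{
        "type": "command",
        "command": "jq -e '.tool_input.file_path // \"\" | test(\"\\\\.env$\") | not' >/dev/null || exit 2"
      }]
    }]
  }
}

The jq filter exits non-zero whenever the target path ends in .env, which trips the || exit 2 and blocks the write.
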
After you finish · API usage

Check your spend delta

Refresh console.anthropic.com/usage. This tutorial doesn't call the Anthropic API directly from Python — Claude Code makes the requests on your behalf — so the Console is your only live signal for spend. Note the delta and add it to the running tally.

07/08
D4 · Prompt Eng. 50 min Python · Anthropic SDK · jsonschema

Structured Output with Tool-Use + JSON Schema Retry Loop

Concept covered. The reliable way to get JSON from Claude is to define a tool whose input_schema is your target JSON shape. When validation fails, return the validator's error to the model and retry — the self-healing loop from the source. Cap retries and fail loudly.

Source: Domain 4 · JSON Schema Retry mental model · Quiz Q4 and Q14

Concept diagram schema · tool-use · validate · retry
Structured output: tool-use forces JSON, a validator gates the output, and invalid responses feed the error back for a retry, bounded by MAX_RETRIES.

  • The input prompt plus a JSON Schema are passed to the LLM as a tool definition: messages.create with tools=[schema] and tool_choice forced.
  • The LLM returns a tool_use block; block.input is the raw JSON from the model.
  • jsonschema.validate() checks it against the schema: types, required fields, enums, nested shape.
  • Valid output is returned. An invalid output's error ("field X: expected int, got str") is appended to messages and the loop retries.

RULE · exit after N rounds: if the model can't satisfy the schema within MAX_RETRIES, raise — don't loop forever.

Tool-use + JSON Schema is the durable way to force structured output. The validator is not optional — it's the contract. When it fails, feed the exact error string back to the model so the next turn can correct, not guess. Cap retries; structured output without a loop bound is a spend bomb.

Prerequisites
  • Tutorial 01 complete
  • Python, anthropic SDK, and jsonschema
  • Anthropic API key
Learning objectives
  • Get guaranteed-shape JSON via tool-use (not prose)
  • Validate against a jsonschema spec
  • On failure, feed the validator's error back and retry
  • Compare with the "please return JSON" prose approach
Before you start · API usage

Record your spend baseline

Open console.anthropic.com/usage, filter to today, and note the current dollar amount (or screenshot the chart). You'll check the delta at the end of this tutorial so you know exactly what this one exercise cost.

Setup

bash
mkdir retry-loop && cd retry-loop
python3 -m venv .venv && source .venv/bin/activate
pip install anthropic jsonschema

Create extract.py:

python · extract.py
import anthropic, json
from jsonschema import Draft202012Validator
client = anthropic.Anthropic()
MODEL = "claude-sonnet-4-6"
MAX_RETRIES = 3

schema = {
    "type": "object",
    "properties": {
        "name":   {"type": "string", "minLength": 1},
        "email":  {"type": "string", "format": "email"},
        "age":    {"type": "integer", "minimum": 0, "maximum": 130},
        "topics": {"type": "array", "items": {"type": "string"}, "minItems": 1},
    },
    "required": ["name", "email", "age", "topics"],
    "additionalProperties": False,
}
# Attach the format checker so "format": "email" is enforced, not annotation-only
validator = Draft202012Validator(schema, format_checker=Draft202012Validator.FORMAT_CHECKER)

tools = [{
    "name": "record_contact",
    "description": "Record an extracted contact. All fields required.",
    "input_schema": schema,
}]

document = """
Hi, I'm Nadia Okafor. Email: nadia@example.com.
I'm 34 and interested in MCP, agent patterns, and prompt caching.
"""

def extract_with_retry(doc: str) -> dict:
    messages = [{"role": "user",
                 "content": f"Extract the contact from:\n<doc>{doc}</doc>"}]
    for attempt in range(MAX_RETRIES):
        resp = client.messages.create(
            model=MODEL, max_tokens=512, tools=tools, messages=messages,
            tool_choice={"type": "tool", "name": "record_contact"},
        )
        tool_block = next(b for b in resp.content if b.type == "tool_use")
        errors = list(validator.iter_errors(tool_block.input))
        if not errors:
            print(f"✓ valid on attempt {attempt+1}")
            return tool_block.input
        print(f"✗ attempt {attempt+1} failed: {errors[0].message}")
        messages.append({"role": "assistant", "content": resp.content})
        messages.append({
            "role": "user",
            "content": [{
                "type": "tool_result",
                "tool_use_id": tool_block.id,
                "content": f"Validation failed: {errors[0].message}. "
                           f"Path: {list(errors[0].path)}. Please retry.",
                "is_error": True,
            }],
        })
    raise RuntimeError("max retries exceeded")

print(json.dumps(extract_with_retry(document), indent=2))

Code trace — the self-healing retry loop

≈ 4 min read

This loop turns a schema validator into a live feedback signal. Whenever jsonschema rejects the model's output, the failing message becomes the next turn's input — the model reads its own mistake and corrects it. Three design choices make that work.

Three design choices before the loop

  • Tool-use, not prose. The tool catalogue has exactly one tool (record_contact) whose input_schema is the target JSON shape. The API itself ensures tool_use.input is shape-compatible at the type level — you don't need to parse JSON out of prose.
  • Forced tool use. tool_choice={"type": "tool", "name": "record_contact"} removes the model's option to reply with prose. Every response must contain a tool_use block.
  • Schema-level validation on top. The API enforces types and required fields; jsonschema enforces everything else — format: email, minLength, minimum/maximum, additionalProperties: false. That gap is what retries close.
  1. Attempt 1 — the happy path.

    messages starts with a single user turn: the extraction prompt. create() returns a response whose content contains a ToolUseBlock(name='record_contact', input={...}). next(b for b in resp.content if b.type == "tool_use") pulls it out. validator.iter_errors(tool_block.input) returns an empty list — success. The loop returns tool_block.input on attempt 1 and never runs again.

    Why it works: the input schema is the target shape. The API and model cooperate to produce JSON matching it; the validator is just a belt-and-braces check.
  2. Attempt 1 — the sad path. Error becomes a message.

    Suppose the source text said "I'm about thirty-something". The model guesses age=35 — type-valid but semantically unsupported — or it may emit a string like "thirty-something". validator.iter_errors returns a ValidationError with a human-readable .message and a JSON-pointer .path. The loop does three things in sequence:

    • Appends the assistant turn (resp.content as-is) — so the next create() call sees what the model just said.
    • Appends a user turn containing a tool_result with is_error: True and the validator's error text.
    • Loops back to the top of for attempt in range(MAX_RETRIES) for another create().
    Why is_error: True matters: it tells the model "this wasn't a normal tool result — fix the thing that failed." Without that flag, the model tends to treat the message as data and carry on.
  3. Attempt 2 — the model reads its own error.

    The next create() sees the full message history: user prompt → assistant tool_use (wrong) → user tool_result (is_error, validator message) → assistant ???. The model has enough context to recognise that its previous input violated a specific constraint at a specific path, and emits a new tool_use with the fix. Validator re-runs — usually passes on attempt 2.

    Generic "try again" wastes this turn. Feed a pointer to what failed — "Validation failed: 'age' is a required property. Path: []." — or the model has no signal and guesses a different wrong thing.
  4. Attempt 3 — the hard cap.

    If attempt 3 still fails, the for loop exits normally (no break needed — return is the success exit) and the trailing raise RuntimeError("max retries exceeded") fires. That's the seatbelt: you never silently return bad data, and you never spin forever.

One-line mental model

Structured output is tool_use + schema + retry. The tool forces shape at the type level, the schema closes the semantic gap, the retry loop turns validation failures into corrections. Anything prose-based — "please return JSON" — skips all three.

Walkthrough

  1. Run a clean extraction.
    bash
    python extract.py

    Expected: ✓ valid on attempt 1 and a well-formed JSON dump.

    Why it matters: The API itself enforces that tool_use.input is shape-compatible with input_schema — retries for a clean schema are rare.
  2. Force a retry.

    Make the age ambiguous: change I'm 34 and interested in... to I'm about thirty-something.... Run again.

    Expected: first attempt fails (age cannot be coerced to integer), second or third succeeds — the model reads the error and fixes.

    Why it matters: This is the self-healing loop — the model uses the validator's error as a signal, not a termination.
  3. Inspect the retry message.

    Before the retry, messages[-1]["content"][0]["content"] is the validator error you fed back.

    Why it matters: The model needs a concrete pointer to what failed — generic "try again" wastes a turn.
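
To watch that message without a debugger, add one illustrative print at the top of the loop body in extract_with_retry, just before the create() call:

python
# Debug aid: show the feedback the model is about to see on retry turns.
if len(messages) > 1:
    print("[retry-feedback]", messages[-1]["content"][0]["content"])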

Build-and-break exercise

PROSE "RETURN JSON" TRAP

Replace tool-use with a prose request

python
resp = client.messages.create(
    model=MODEL, max_tokens=512,
    messages=[{"role": "user",
               "content": f"Extract this as JSON matching {schema}:\n{document}"}],
)
raw = resp.content[0].text
# Try to parse:
data = json.loads(raw)   # this often fails

Run 5–10 times. Expected failure modes:

  • Claude wraps JSON in ```json fences → json.loads throws.
  • Claude adds a sentence before the JSON → parse error.
  • Claude invents a field or drops one.
  • You cannot set tool_choice to force the shape.

This is why Domain 4 insists tool-use is the reliable path, and why Quiz Q14 calls out prose "return as JSON" as most unreliable.

Revert: restore the tool-use version.

Verification checklist

  • I've seen valid JSON returned via tool_use.input.
  • I've forced a retry and watched the loop self-heal.
  • I've observed the prose-JSON approach fail in a specific way.
  • My retry loop has a cap and raises on exhaustion.
  • I feed the validator's error, not "please try again", back to the model.

Cleanup

bash
deactivate && cd .. && rm -rf retry-loop

Further exploration

  • Add strict: true to a more complex schema with oneOf and see how retry-count changes.
  • Compare total tokens: tool-use vs prose+parse-then-repair.
After you finish · API usage

Check your spend delta

Refresh console.anthropic.com/usage. The difference from your baseline is what this tutorial actually cost on Sonnet 4.6 — add it to a running tally so you can compare against the ~$2–4 total budget for the whole workbook.

Optional in-code tally. Drop this line after each client.messages.create(…) call to see per-request token counts in your terminal:

python
print(f"[usage] in={resp.usage.input_tokens} out={resp.usage.output_tokens}")
08/08
D5 · Context 45 min Python · Anthropic SDK

Prompt Caching & Fighting "Lost in the Middle"

Concept covered. Two independent techniques for long-context reliability. Prompt caching marks stable prefix tokens with cache_control so repeated calls cost ~10% of uncached. Lost in the middle — attention is strongest at start and end; critical rules belong at both anchors, never buried mid-context.

Source: Domain 5 · Attention Curve · Quiz Q5 and Q10

Concept diagram cache layout + attention curve
Prompt caching splits the context into a reusable prefix and a dynamic suffix; attention dips through the middle of long contexts.

  • Prompt caching · reuse a prefix, pay ~10% on hits. A long CACHED PREFIX (system · tools · long context) is followed by the NEW user turn at full cost; the cache_control breakpoint marks the boundary: system = [{ "text": long_docs, "cache_control": {"type": "ephemeral"} }]. TTL is ~5 minutes since the last hit, and it expires silently. Gotchas: one token change in the prefix → cache miss; order must be static → dynamic.
  • Lost in the middle · attention dips in long contexts: high at the START and END of the window, lowest through the MIDDLE. Tactics: put rules and schemas at start OR end, repeat critical instructions at both, don't bury key info in the middle.

Caching pays you for discipline: freeze the static prefix, put the variable bits last, mark the boundary with cache_control. Separately — the model's attention is not uniform across the window. The middle is the memory black hole; put your instructions where attention lives.

Prerequisites
  • Tutorial 01 complete
  • Python, anthropic SDK, API key
  • A long document (Moby Dick chapter 1 from Project Gutenberg works; any 3–5K-token text)
Learning objectives
  • Mark a stable prefix with cache_control and observe token-cost reduction
  • Read usage.cache_read_input_tokens vs cache_creation_input_tokens
  • Plant a test fact in the middle vs end of a long context and see recall differ
  • Observe the performance cliff at the 5-minute cache TTL
Before you start · API usage

Record your spend baseline

Open console.anthropic.com/usage, filter to today, and note the current dollar amount (or screenshot the chart). You'll check the delta at the end of this tutorial so you know exactly what this one exercise cost.

Setup

bash
mkdir cache-demo && cd cache-demo
python3 -m venv .venv && source .venv/bin/activate
pip install anthropic requests

Fetch a long document with prep.py:

python · prep.py
import requests, pathlib
r = requests.get(
    "https://www.gutenberg.org/files/2701/2701-0.txt",
    timeout=30,
)
r.encoding = "utf-8"  # Gutenberg serves UTF-8; don't trust the header guess
text = r.text
# Keep chapters 1–5 only — enough to cross the cache threshold.
# "CHAPTER 1." also appears in the table of contents, so split on the
# last occurrence (the real chapter heading), not the first.
body = text.split("CHAPTER 1.")[-1].split("CHAPTER 6.")[0]
pathlib.Path("moby.txt").write_text(body, encoding="utf-8")
print(f"saved {len(body)} chars")
bash
python prep.py

Walkthrough — Part A · Prompt Caching

Create cache.py:

python · cache.py
import anthropic, pathlib, time
client = anthropic.Anthropic()
MODEL = "claude-sonnet-4-6"
doc = pathlib.Path("moby.txt").read_text()

def ask(question, use_cache):
    system = [
        {"type": "text", "text": "You are a literary analyst."},
        {"type": "text", "text": f"<doc>{doc}</doc>",
         **({"cache_control": {"type": "ephemeral"}} if use_cache else {})},
    ]
    t = time.time()
    resp = client.messages.create(
        model=MODEL, max_tokens=256, system=system,
        messages=[{"role": "user", "content": question}],
    )
    dt = time.time() - t
    u = resp.usage
    print(f"[{'cached' if use_cache else 'plain '}] "
          f"in={u.input_tokens} cache_read={getattr(u,'cache_read_input_tokens',0)} "
          f"cache_write={getattr(u,'cache_creation_input_tokens',0)} "
          f"out={u.output_tokens} · {dt:.2f}s")

print("--- first call (cache miss) ---")
ask("What does Ishmael do when he feels spleen?", use_cache=True)
print("--- second call (cache hit expected) ---")
ask("Name one ship mentioned.", use_cache=True)
print("--- control: no cache ---")
ask("What does Ishmael do when he feels spleen?", use_cache=False)

Code trace — three calls, three cache states

The system prompt is split into two content blocks: a short role statement and the full Moby Dick chapters. The cache_control: {"type": "ephemeral"} marker on the document block tells the API "everything up to and including this block is a stable prefix — cache it under a key derived from the token contents." Three calls exercise three different cache states.

Call 1 — use_cache=True, cache miss (cache write). First call with this prefix. The API computes the hash of the system tokens, finds no entry, processes the full input, and writes the prefix to an ephemeral cache with a 5-minute TTL. Usage reports cache_creation_input_tokens ≈ 5000 (the doc being indexed), cache_read_input_tokens = 0, and a small input_tokens for the question itself. You pay a ~25% premium over normal input for the write.

Call 2 — use_cache=True, cache hit (cache read). Different question, same system prefix. Hash matches. The API serves ~5000 tokens from cache and only re-processes the 10-or-so tokens of the new question. Usage flips: cache_creation_input_tokens = 0, cache_read_input_tokens ≈ 5000. Those cached tokens are billed at ~10% of normal input — that's the whole reason to cache.

Call 3 — use_cache=False, control. Same prefix, but the cache_control marker is absent. The API treats every token as fresh input. Both cache counters are zero; input_tokens ≈ 5000 at the full rate. This is your baseline for measuring the savings.

What actually drives the hit. The cache key is a hash of the exact prefix tokens. Any change — a punctuation edit, a blank line, a different role description — produces a different key and a cache miss. That's why the 5-minute TTL step (time.sleep(310)) also forces a rewrite: the prefix is the same, but the entry has expired, so it's effectively a new key.
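
You can watch that exact-prefix sensitivity directly. A sketch reusing cache.py's client, MODEL, and doc; ask_with_role is a variant introduced here, not part of the original file:

python · prefix-sensitivity sketch
def ask_with_role(question, role):
    """Same shape as ask(), but the role block is a parameter."""
    resp = client.messages.create(
        model=MODEL, max_tokens=64,
        system=[
            {"type": "text", "text": role},
            {"type": "text", "text": f"<doc>{doc}</doc>",
             "cache_control": {"type": "ephemeral"}},
        ],
        messages=[{"role": "user", "content": question}],
    )
    u = resp.usage
    print(f"read={getattr(u, 'cache_read_input_tokens', 0)} "
          f"write={getattr(u, 'cache_creation_input_tokens', 0)}")

ask_with_role("Name one ship.", "You are a literary analyst.")   # cache write
ask_with_role("Name one ship.", "You are a literary analyst.")   # cache hit
ask_with_role("Name one ship.", "You are a literary analyst!")   # one character changed: miss, new write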

  1. Run the caching comparison.
    bash
    python cache.py

    Expected lines:

    • Call 1 — cache_write≈5000, cache_read=0 — cache creation
    • Call 2 — cache_write=0, cache_read≈5000 — cache hit; input_tokens drops
    • Call 3 — cache_write=0, cache_read=0 — no caching
    Why it matters: Cache-read tokens cost ~10% of regular input tokens. On a chatbot with a fixed 40K-token system prompt, that's nearly the whole prompt billed at a tenth the rate: at Sonnet 4.6's $3/M input price, ~$0.12 per call uncached vs ~$0.012 on a hit.
  2. Test the 5-minute TTL.

    In cache.py, insert one time.sleep(310) between the first and second calls. Full file with the change highlighted — save over your existing cache.py:

    python · cache.py · TTL variant
    import anthropic, pathlib, time
    client = anthropic.Anthropic()
    MODEL = "claude-sonnet-4-6"
    doc = pathlib.Path("moby.txt").read_text()
    
    def ask(question, use_cache):
        system = [
            {"type": "text", "text": "You are a literary analyst."},
            {"type": "text", "text": f"<doc>{doc}</doc>",
             **({"cache_control": {"type": "ephemeral"}} if use_cache else {})},
        ]
        t = time.time()
        resp = client.messages.create(
            model=MODEL, max_tokens=256, system=system,
            messages=[{"role": "user", "content": question}],
        )
        dt = time.time() - t
        u = resp.usage
        print(f"[{'cached' if use_cache else 'plain '}] "
              f"in={u.input_tokens} cache_read={getattr(u,'cache_read_input_tokens',0)} "
              f"cache_write={getattr(u,'cache_creation_input_tokens',0)} "
              f"out={u.output_tokens} · {dt:.2f}s")
    
    print("--- first call (cache miss) ---")
    ask("What does Ishmael do when he feels spleen?", use_cache=True)
    time.sleep(310)   # ← add this — forces the 5-minute ephemeral TTL to expire
    print("--- second call (cache hit expected) ---")
    ask("Name one ship mentioned.", use_cache=True)
    print("--- control: no cache ---")
    ask("What does Ishmael do when he feels spleen?", use_cache=False)

    Expected: call 2 now shows cache_write≈5000 again — the cache expired.

    Why it matters: Ephemeral caches live 5 minutes (standard) or 1 hour (premium). Plan your wake-up intervals.

Walkthrough — Part B · Lost in the Middle

Create middle.py:

python · middle.py
import anthropic, pathlib
client = anthropic.Anthropic()
MODEL = "claude-sonnet-4-6"
doc = pathlib.Path("moby.txt").read_text()

SECRET_FACT = "The secret passphrase is SPERMACETI-7."

def ask_with_placement(placement):
    if placement == "middle":
        mid = len(doc) // 2
        body = doc[:mid] + f"\n\n{SECRET_FACT}\n\n" + doc[mid:]
    elif placement == "end":
        body = doc + f"\n\n{SECRET_FACT}\n"
    else:  # start
        body = f"{SECRET_FACT}\n\n" + doc

    resp = client.messages.create(
        model=MODEL, max_tokens=100,
        system=f"You have been given a document. A secret passphrase is hidden "
               f"in it. Quote the full passphrase exactly.\n\n<doc>{body}</doc>",
        messages=[{"role": "user", "content": "What is the secret passphrase?"}],
    )
    answer = resp.content[0].text
    hit = "SPERMACETI-7" in answer
    print(f"[{placement:6s}] {'✓' if hit else '✗'} → {answer[:80]}")

for placement in ["start", "middle", "end"]:
    for trial in range(3):
        ask_with_placement(placement)

Code trace — nine trials across three placements

Same document, same question, same model — the only variable is where the target fact lives inside the context. The outer for placement loop crossed with the inner for trial loop fires 9 create() calls total, three per placement, so you can see the recall rate as a ratio rather than a single data point.

What each placement produces in the system prompt.

  • start: f"{SECRET_FACT}\n\n" + doc — the target sentence is the first thing the model sees after the instruction. Attention curve is strong here.
  • middle: doc[:mid] + f"\n\n{SECRET_FACT}\n\n" + doc[mid:] — the fact is buried roughly halfway through ~5000 tokens of narrative. Attention curve is weakest here.
  • end: doc + f"\n\n{SECRET_FACT}\n" — the target is the last thing the model reads before the user's question. Recency makes this position almost as strong as start.

What hit is measuring. "SPERMACETI-7" in answer is a substring check against the model's reply. A ✓ means the model recovered the exact passphrase; a ✗ means it either hedged, guessed a plausible-but-wrong phrase from Moby-Dick, or refused.

Expected shape of the output. start rows: three ✓. end rows: three ✓. middle rows: some mix — often two ✓ and one ✗, sometimes worse with bigger documents. That's the U-shaped attention curve from the Domain 5 source, now a number on your terminal rather than a diagram in a slide.

Why three trials. The model is probabilistic — a single run can coincidentally succeed or fail on any placement. Three trials lets you distinguish "always fails" from "fails sometimes", which is the whole point: lost-in-the-middle is an unreliability pattern, not a hard failure. If middle recall were always-fail, the workaround would be obvious; because it's sometimes-fail, teams ship the bug.
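
To read the outcome as a rate instead of eyeballing rows of ✓/✗, have ask_with_placement return hit and swap the bottom loop for this tally (a small variant, not part of the original middle.py):

python · recall tally (sketch)
from collections import Counter

TRIALS = 3
recall = Counter()
for placement in ["start", "middle", "end"]:
    for _ in range(TRIALS):
        recall[placement] += ask_with_placement(placement)  # now returns True/False

for placement in ["start", "middle", "end"]:
    print(f"{placement:6s} recall: {recall[placement]}/{TRIALS}")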

  1. Run the placement test.
    bash
    python middle.py

    Expected: start and end recall the passphrase reliably. middle fails some fraction of the time — especially with larger documents.

    Why it matters: This is the "lost in the middle" curve from the source, made observable in your own terminal.

Build-and-break exercise

CRITICAL RULE IN THE WRONG PLACE

Bury a critical instruction in the middle

Combine both failure modes. Put a critical instruction — "Before answering, always capitalise the first word" — in the middle of the doc, and do NOT repeat it elsewhere. Ask a question that requires following it.

Expected: intermittent compliance. The "Do" from the source says: repeat critical rules at start AND end. Confirm by adding the rule to both anchors — compliance jumps to ~100%.
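
A minimal sketch of that experiment, reusing middle.py's client, MODEL, and doc. Two assumptions are introduced here: the compliance check (first character uppercase) and an all-lowercase user request that puts the buried rule in visible tension with the user:

python · buried-rule sketch
RULE = "Before answering, always capitalise the first word of your reply."

def compliance_rate(body, trials=5):
    hits = 0
    for _ in range(trials):
        resp = client.messages.create(
            model=MODEL, max_tokens=60,
            system=f"<doc>{body}</doc>",
            messages=[{"role": "user",
                       "content": "answer in all lowercase: who narrates this story?"}],
        )
        # Following the buried rule means capitalising despite the user's ask.
        hits += resp.content[0].text.strip()[:1].isupper()
    return hits / trials

mid = len(doc) // 2
buried   = doc[:mid] + f"\n\n{RULE}\n\n" + doc[mid:]
anchored = f"{RULE}\n\n" + buried + f"\n\n{RULE}"
print(f"buried only : {compliance_rate(buried):.0%}")
print(f"start + end : {compliance_rate(anchored):.0%}")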

Verification checklist

  • I've seen cache_read_input_tokens be non-zero on a repeat call.
  • I know the default ephemeral cache TTL (5 minutes).
  • I've measured middle-placement recall failing vs start/end.
  • I can explain why system-prompt rules at position 0 aren't always enough.
  • I've used cache_control on a stable prefix.

Cleanup

bash
deactivate && cd .. && rm -rf cache-demo

Further exploration

  • Add a third cache breakpoint and measure hit rate across different query patterns.
  • Try the same middle-vs-end test with Haiku 4.5 and compare attention curves.
  • Re-read Quiz Q10 and Q15 and reason through which primitive each maps to.
After you finish · API usage

Check your spend delta

Refresh console.anthropic.com/usage. The difference from your baseline is what this tutorial cost — the cache-hit run should be dramatically cheaper than the cold-cache run, which is exactly the observation this exercise is built around.

In-code tally with cache fields. Caching splits input_tokens across three counters — print all four to see what the cache actually paid for:

python
u = resp.usage
print(f"[usage] in={u.input_tokens} cache_read={u.cache_read_input_tokens} cache_create={u.cache_creation_input_tokens} out={u.output_tokens}")
02 / Glossary

Thirty-three terms you'll meet in the tutorials.

Short, exam-relevant definitions. Filter by term, domain (D1–D5), or any phrase in the definition. Use this as a reference while you work through the workbook.

Agent
D1 · Agentic
A Claude instance wrapped in a loop that calls tools, observes results, and decides the next action. See Tutorial 01.
Agentic Loop
D1 · Agentic
The think → act → observe cycle. Terminates on stop_reason == end_turn, an iteration cap, or an error. Every loop needs a termination condition.
anthropic (Python SDK)
Official Anthropic Python client library. Exposes client.messages.create() and tool-use wiring. Version ≥ 0.40 is assumed throughout this workbook.
Cache breakpoint
D5 · Context
Marker (cache_control: {type: "ephemeral"}) placed on a content block to tell the API where a cacheable prefix ends.
Cache read
D5 · Context
Tokens served from a cached prefix on a subsequent request. Charged at ~10% of standard input cost. The metric that makes caching worth it.
Cache write
D5 · Context
First-time creation of a cache entry. Charged at a ~25% premium over standard input, amortised across later cache reads.
Claude Code
D3 · Claude Code
Anthropic's official CLI agent. Reads CLAUDE.md, hosts MCP servers, and runs hooks, skills, and slash commands. See Tutorial 05.
CLAUDE.md
D3 · Claude Code
Plain-text instruction file read by Claude Code at session start. Three tiers: global (~/.claude), project (repo root), and subdirectory — the closest tier wins.
Compaction
D5 · Context
Automatic summarisation of earlier conversation turns to free context budget. Mechanics are not yet deterministic — treat as advisory, not guaranteed.
Context window
D5 · Context
Total token budget the model can attend to in one call. Claude Sonnet 4.6 default is 200K tokens; 1M-token context is opt-in for certain workloads.
end_turn
D1 · Agentic
Value of stop_reason that means the model has decided the turn is complete and is not requesting a tool. The healthy exit condition of an agentic loop.
Ephemeral cache
D5 · Context
Default prompt-cache type. 5-minute TTL, refreshed on every read. Use when back-to-back calls re-use the same prefix.
Hook
D3 · Claude Code
A deterministic shell command Claude Code runs at a lifecycle event (PreToolUse, PostToolUse, Stop, etc.). Use when you need guaranteed execution, not model discretion. See Tutorial 06.
Hub-and-Spoke
D1 · Agentic
Orchestration pattern. A parent agent delegates to sub-agents in isolated contexts; sub-agents return only summaries, not their full transcripts. See Tutorial 02.
JSON Schema
D4 · Prompt Eng.
Contract describing the expected shape of a tool's input (or a structured output). Paired with tool-use to coerce valid JSON and drive retries on validation failure. See Tutorial 07.
Lost in the middle
D5 · Context
Empirical failure mode: facts placed in the middle of a long context are recalled less reliably than facts near the start or end. Structure long prompts accordingly. See Tutorial 08.
MCP (Model Context Protocol)
D2 · Tools/MCP
Open standard for exposing tools, resources, and prompts to a model via a standardised transport. See Tutorial 03.
MCP primitives
D2 · Tools/MCP
Three kinds a server can expose: tool (model-initiated action), resource (application-attached read-only data), prompt (user-invoked template). Picking the wrong primitive is the most common MCP mistake.
Memory tool beta
D5 · Context
Anthropic's managed persistent memory. Currently beta — exact API surface may still shift, so no dedicated tutorial yet.
Premium cache
D5 · Context
Opt-in cache type with a 1-hour TTL. Higher write cost; lower read cost amortised over longer reuse windows. Best for stable system prompts and tool catalogues.
Prompt caching
D5 · Context
Server-side reuse of previously-sent input tokens across calls. Reduces input cost ~90% and cuts time-to-first-token. See Tutorial 08.
Skill
D3 · Claude Code
Natural-language capability pack loaded into Claude Code. Non-deterministic by design — if you need guaranteed execution, use a hook instead. See Tutorial 06.
Slash command
D3 · Claude Code
User-invoked shortcut in Claude Code (e.g. /fix, /commit). Expands to a prompt template. Deterministic in when it runs — not in what the model does with it.
Stdio transport
D2 · Tools/MCP
MCP transport over a subprocess's stdin/stdout. Local-process only, no auth needed. The safe default used in Tutorial 03.
stop_reason
D1 · Agentic
Field on every Messages response telling you why the model stopped. Values: end_turn, tool_use, max_tokens, stop_sequence, pause_turn.
Structured output
D4 · Prompt Eng.
Forcing the model to return JSON that conforms to a schema. In 2026 this is implemented via tool-use with a single mandatory tool, not prose "please return JSON" prompts.
Sub-agent
D1 · Agentic
Child agent spawned by an orchestrator, running in an isolated context. Returns a summary, not its full transcript. Core unit of hub-and-spoke.
System prompt
D4 · Prompt Eng.
Instruction block attached at the top of every call, outside the user/assistant turn structure. Stable across calls → an ideal cache-write target.
Termination condition
D1 · Agentic
Any bound that causes the agentic loop to exit: stop_reason == end_turn, an iteration cap, a wall-clock timeout, a spend limit, or an unhandled error. Tutorial 01 makes you remove the cap on purpose.
Tool description
D2 · Tools/MCP
Free-text field the model reads to decide when to call a tool. Small wording changes materially affect routing — vague ("gets data.") is actively harmful. See Tutorial 04.
tool_result
D1 · Agentic
Content block type sent back to Claude with the output of a called tool. Matched to the originating tool_use_id. The "observe" step of the agentic loop.
tool_use
D1 · Agentic
Both a stop_reason value and a content-block type. Signals the model wants a tool executed; your code must run it and return a matching tool_result.
Transport
D2 · Tools/MCP
How an MCP server is reached. Three today: stdio (local subprocess), http (remote), sse (server-sent events). Remote-with-auth is under-specified — stdio is the safe default.
03 / Troubleshooting

When things go wrong.

The errors most people hit running these tutorials, with the fix that actually works — especially on Ubuntu, where the defaults bite.

TypeError: Could not resolve authentication method. Expected either api_key or auth_token to be set

Cause. The ANTHROPIC_API_KEY environment variable is not visible to the Python process — anthropic.Anthropic() with no arguments reads the key from the environment. Typically you exported it in a different shell, or ran the script under sudo (which strips env vars).

Fix

First confirm what the shell sees. If this prints empty, export the key in the same shell where you run the script:

bash
echo $ANTHROPIC_API_KEY
export ANTHROPIC_API_KEY="sk-ant-…"
python3 loop.py

For a durable per-project setup, use a .env file:

bash + python
pip install python-dotenv
echo 'ANTHROPIC_API_KEY=sk-ant-…' > .env
# then at the top of loop.py:
from dotenv import load_dotenv; load_dotenv()

Your credit balance is too low to access the Anthropic API. Please go to Plans & Billing to upgrade or purchase credits.

Cause. Your API key is valid (auth succeeded) but the account has no usable credit. New accounts start at $0; any free trial credit from onboarding may have expired or been consumed. Important gotcha: an API key created before the account had credit can stay stuck on this error even after you buy credits — the credit-check state attached to that key doesn't always refresh. The reliable fix is to buy credit first, then mint a new key.

Fix · do these in order
  1. Buy credit first. Open console.anthropic.com/settings/billing, add a payment method, and purchase credits. Minimum is $5 — the whole workbook budgets ~$2–4 on Sonnet 4.6, so $5 covers it several times over.
  2. Wait 1–2 minutes and confirm the balance shows on the billing page. Credits occasionally lag behind the purchase confirmation.
  3. Then create a new API key at console.anthropic.com/settings/keys. If you already created a key before buying credit, delete that one and mint a fresh one — the new key will pick up the credited balance cleanly.
  4. Export and retry in the same shell where you run the script:
bash
export ANTHROPIC_API_KEY="sk-ant-…NEW_KEY"
python3 loop.py

Two follow-ups while you're in the console:

  • Key belongs to the right workspace. If you're in an organisation, the key must be issued under a workspace that has a balance — personal keys won't draw from org credit and vice versa.
  • Enable auto-reload on the billing page. Hub-and-spoke (Tutorial 02) and the retry loop (Tutorial 07) fan out multiple calls and a mid-run balance drop will make this error re-appear partway through.

The virtual environment was not created successfully because ensurepip is not available

Cause. Ubuntu and Debian ship a minimal Python 3 by default — the stdlib venv module needs the separate python3-venv package to actually bootstrap a virtualenv.

Fix
bash · one-time
sudo apt update && sudo apt install -y python3 python3-venv python3-pip

Then re-run the venv step from the tutorial setup.

python: command not found (Ubuntu)

Cause. On Ubuntu 20.04+ the binary on PATH is python3, not python. This workbook uses python3 throughout for that reason.

Fix

Either run every command with python3 explicitly, or install the alias package so python resolves to python3:

bash
sudo apt install -y python-is-python3

error: externally-managed-environment from pip install

Cause. Ubuntu 23.04+ and current Homebrew Python enforce PEP 668 — the system Python refuses global pip install to protect OS packages. This workbook assumes you install Python packages inside each tutorial's venv, which sidesteps the problem entirely.

Fix

Activate the venv before pip install:

bash
source .venv/bin/activate
which pip  # should point inside .venv, not /usr/bin/pip
pip install "anthropic>=0.40"

For CLIs you genuinely want globally (like a linter), use pipx instead of pip.

ModuleNotFoundError: No module named 'anthropic'

Cause. You installed the package into a different Python than the one running the script. Nearly always: venv not activated, or you opened a new terminal and forgot to re-activate.

Fix

Verify which Python and which pip the shell is using — both should live inside .venv:

bash
source .venv/bin/activate
which python3 pip
python3 -c "import anthropic, sys; print(anthropic.__version__, sys.executable)"

If which points to /usr/bin/, activation didn't run — re-run source .venv/bin/activate and re-install.

NotFoundError: 404 {"error":{"type":"not_found_error","message":"model: claude-…"}}

Cause. Either the model ID has drifted (the workbook pins claude-sonnet-4-6), or your SDK is too old to know about it, or your API key's tier hasn't been enabled for that model.

Fix
bash
pip install -U "anthropic>=0.40"
# Confirm the model ID against the live catalogue:
python3 -c "import anthropic; [print(m.id) for m in anthropic.Anthropic().models.list().data]"

If Sonnet 4.6 isn't listed, fall back to claude-haiku-4-5-20251001 for the tutorials — they don't require the bigger model.

RateLimitError: 429 — Your account has hit a rate limit

Cause. New accounts start on tight tiers. Tutorials 02 (hub-and-spoke), 07 (retry loop), and 08 (caching) fan out multiple calls and can trip requests-per-minute or tokens-per-minute limits.

Fix

Lower parallelism and retry with backoff. The SDK already retries transient 429s, but you can also reduce the iteration cap in the loop or switch to Haiku 4.5 while you're testing. Check your current tier limits in the Anthropic console.
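
If you want an explicit cap and backoff on top of the SDK's built-in handling, a minimal sketch (create_with_backoff is a helper introduced here):

python
import time, anthropic

client = anthropic.Anthropic(max_retries=4)  # SDK-level retries on transient 429s

def create_with_backoff(**kwargs):
    """Exponential backoff around messages.create for sustained 429s."""
    for delay in (2, 4, 8, 16):
        try:
            return client.messages.create(**kwargs)
        except anthropic.RateLimitError:
            print(f"[429] backing off {delay}s")
            time.sleep(delay)
    return client.messages.create(**kwargs)  # final attempt; let it raise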

Loop in Tutorial 01 keeps running, never prints FINAL:, never exits

Cause. That's not a bug — that's the intentional build-and-break in Tutorial 01. You removed MAX_ITERATIONS and now nothing terminates the agentic loop when the model keeps requesting tools.

Fix

Ctrl-C to kill the runaway loop. Put the bound back: for turn in range(MAX_ITERATIONS):. The whole point of T01 is to watch this happen once, then never forget it.

BadRequestError: tool_result.content must be a string or list of content blocks

Cause. You returned a Python dict (or some other non-string object) as the content of a tool_result. The API accepts a string or a structured content-block list — not a raw dict.

Fix
python
tool_results.append({
    "type": "tool_result",
    "tool_use_id": block.id,
    "content": json.dumps(result),   # <- stringify here
})

claude: command not found after sudo npm i -g @anthropic-ai/claude-code

Cause. npm's global bin directory is not on your PATH. On Ubuntu with a system-packaged Node, this happens because the default prefix is /usr/local but shells installed earlier may not include it; with nvm, each version has its own bin dir that must be sourced.

Fix
bash
npm config get prefix          # find where it installed
ls $(npm config get prefix)/bin # should list 'claude'
echo 'export PATH="$(npm config get prefix)/bin:$PATH"' >> ~/.bashrc
source ~/.bashrc

MCP server in Tutorial 03 starts but Claude Code never gets a response; stdio just hangs

Cause. Almost always a silent import error or exception in the server script. Because Claude Code launches the server over stdio as a subprocess, its stderr doesn't surface in the CLI UI — the process just dies quietly and the handshake never completes.

Fix

Run the server directly before registering it with Claude Code — any traceback surfaces immediately:

bash
source .venv/bin/activate
python3 server.py  # Ctrl-C once it reports "ready" or prints a traceback

If it says "ready", the server itself is fine — next check that the server path in your Claude Code mcp.json is absolute and that the working directory it launches from is correct.
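A config shape that avoids both failure modes: absolute paths everywhere, and the venv's own interpreter so no activation step is needed. Paths are placeholders for your machine, and the mcpServers layout shown is the standard MCP config shape; double-check it against the Claude Code docs for your version.

json · .mcp.json (project root)
{
  "mcpServers": {
    "tutorial-03": {
      "command": "/home/you/workbook/mcp-server/.venv/bin/python",
      "args": ["/home/you/workbook/mcp-server/server.py"]
    }
  }
}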

Still stuck? Open an issue on the GitHub repo — the more specific the error and the command that produced it, the faster it gets added here.

04 / Utilities

Helper scripts for the workbook.

Small, self-contained scripts you can drop alongside any tutorial. One file each, no dependencies beyond the anthropic SDK the pre-start block already installs.

usage.py — API smoke test and pre-flight cost estimator

Two questions students run into early: does my key actually work? and roughly what will this prompt cost? usage.py answers both in one file. With no arguments it makes an 8-token smoke call against Sonnet 4.6 and prints the full response.usage block plus a dollar figure. With --prompt or --file it uses the Anthropic count_tokens endpoint to report per-model input tokens and estimated cost — no completion is run in that path, so pre-flight estimation is free.

Drop it in the root of your workbook directory, then run python3 ../usage.py from inside any tutorial folder (or python3 usage.py from the root itself). The API key is read from ANTHROPIC_API_KEY exactly like every tutorial in the workbook.

python · usage.py
"""usage.py — API smoke test + pre-flight token / cost estimator.

Reads ANTHROPIC_API_KEY from the environment, same as every tutorial.

    python3 usage.py                          # 8-token smoke call
    python3 usage.py --prompt "your text"     # estimate input cost
    python3 usage.py --file prompt.txt        # estimate from a file
"""
import argparse, anthropic

# (input $ per 1M tokens, output $ per 1M tokens) — verify on the
# Anthropic pricing page before you trust the cost column.
PRICING = {
    "claude-sonnet-4-6":         ( 3.00, 15.00),
    "claude-haiku-4-5-20251001": ( 1.00,  5.00),
    "claude-opus-4-7":           (15.00, 75.00),
}

def estimate(text: str) -> None:
    """Report per-model input tokens + estimated input cost. Free."""
    client = anthropic.Anthropic()
    header = f"{'model':<30}{'input tokens':>14}{'input $':>12}{'$/1k out':>12}"
    print(header)
    print("-" * len(header))
    for model, (in_p, out_p) in PRICING.items():
        c = client.messages.count_tokens(
            model=model,
            messages=[{"role": "user", "content": text}],
        )
        in_cost = c.input_tokens / 1_000_000 * in_p
        print(f"{model:<30}{c.input_tokens:>14,}{'$' + format(in_cost, '.5f'):>12}{'$' + format(out_p/1000, '.5f'):>12}")

def smoke_test() -> None:
    """Make one tiny real call so you see live .usage fields."""
    client = anthropic.Anthropic()
    resp = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=8,
        messages=[{"role": "user", "content": "Say exactly: OK"}],
    )
    u = resp.usage
    in_p, out_p = PRICING["claude-sonnet-4-6"]
    cost = u.input_tokens / 1_000_000 * in_p + u.output_tokens / 1_000_000 * out_p
    print(f"[ok] API key works — model={resp.model}")
    print(f"     response={resp.content[0].text!r}")
    print(f"     usage: input={u.input_tokens} output={u.output_tokens}")
    print(f"            cache_read={u.cache_read_input_tokens or 0} cache_create={u.cache_creation_input_tokens or 0}")
    print(f"     cost=${cost:.6f}")

def main() -> None:
    ap = argparse.ArgumentParser(description="Claude API smoke test + pre-flight token counter")
    g = ap.add_mutually_exclusive_group()
    g.add_argument("--prompt", help="estimate cost of a literal prompt string")
    g.add_argument("--file",   help="estimate cost of a prompt read from a file")
    args = ap.parse_args()
    if args.prompt:
        estimate(args.prompt)
    elif args.file:
        with open(args.file) as f:
            estimate(f.read())
    else:
        smoke_test()

if __name__ == "__main__":
    main()

Smoke test — confirm the key works and see real .usage fields:

bash
$ python3 usage.py
[ok] API key works — model=claude-sonnet-4-6
     response='OK'
     usage: input=12 output=3
            cache_read=0 cache_create=0
     cost=$0.000081

Pre-flight — what will this prompt cost on each model?

bash
$ python3 usage.py --prompt "Summarise this repository in one sentence."
model                           input tokens     input $    $/1k out
--------------------------------------------------------------------
claude-sonnet-4-6                         13   $0.00004    $0.01500
claude-haiku-4-5-20251001                 13   $0.00001    $0.00500
claude-opus-4-7                           13   $0.00020    $0.07500

The count_tokens path doesn't draw from your balance — it's a free-of-charge preview. The smoke test does draw (well under a hundredth of a cent on Sonnet 4.6 at the pinned prices), but that's the whole point: you want to see real .usage fields come back before spending six hours on tutorials. Keep the PRICING table aligned with the official Anthropic pricing page — the numbers here match the workbook's last-verified date but can drift.