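The Main Loop: Agentic AI Hands-On Guide

1.1

Part 1 / Build · Week 1

Build the smallest possible agent.

One model, three tools, one while loop, and no framework. The goal is not to ship; it is to handle every primitive yourself so that you can form your own judgment later.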

STEP 1

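Scaffold the project.

Before writing any code, we need three things: a corpus (the documents the agent will read), a place for the agent's code, and a Python environment with the SDK installed. That is all. No vector database, no framework, no orchestration layer. Every extra piece of infrastructure is one more layer hiding the primitives you are trying to learn.

Choose a corpus

A corpus is just a folder full of text files. Pick content you care about, so that you notice immediately when the agent gets something wrong. Good choices include the PostgreSQL documentation, your own Markdown notes, or the docs of an open-source project (React, the Rust book, Kubernetes concepts). Aim for 50–500 Markdown files.

With fewer than 50 files the agent barely has to think; with more than 500 you will spend too much time iterating on retrieval. This tutorial uses the PostgreSQL documentation as its example. Whatever you pick, the files need to end up as .md files directly inside corpus/, because the search tool in Step 2 scans only that folder.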
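A quick way to check a corpus against the 50–500 target is to count the files. The sketch below is only an illustration: corpus_report is a hypothetical helper name, and it assumes the .md files sit directly inside corpus/.

```python
from pathlib import Path


def corpus_report(corpus_dir: str = "corpus") -> dict:
    """Count the Markdown files directly inside corpus_dir (hypothetical helper)."""
    files = sorted(Path(corpus_dir).glob("*.md"))
    n = len(files)
    if n < 50:
        verdict = "too small: the agent will barely need to think"
    elif n > 500:
        verdict = "too large: retrieval iteration will be slow"
    else:
        verdict = "ok"
    return {"files": n, "verdict": verdict}
```

Calling corpus_report() from the project root after populating corpus/ tells you whether you are inside the suggested range before you spend time on the agent itself.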

Directory structure

# Create the project
mkdir research-agent && cd research-agent
mkdir -p corpus agent evals scripts runs

# Files we'll fill in over the next steps
touch agent/__init__.py
touch agent/loop.py        # the agentic loop
touch agent/tools.py       # tool definitions + handlers
touch agent/prompts.py     # system prompts
touch scripts/run.py       # CLI entry point
touch .env                 # for API keys

Run tree -a (or ls -aR; the -a flag is needed because .env is a hidden file) and you should see something like:

research-agent/
├── agent/
│   ├── __init__.py
│   ├── loop.py
│   ├── prompts.py
│   └── tools.py
├── corpus/         # populate with your .md files
├── evals/
├── runs/
├── scripts/
│   └── run.py
└── .env

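Install dependencies

The Python environment is the same for both APIs; only the SDK package differs. Use the install line for the provider you chose.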

# Create a virtual environment
python -m venv .venv
source .venv/bin/activate

# Install Anthropic SDK + utilities
pip install anthropic python-dotenv rich

# Install OpenAI SDK + utilities
pip install openai python-dotenv rich

Three packages, only one of them provider-specific. python-dotenv reads your API key from .env. rich makes trace output easier to read.

The .env file

# .env — get this from console.anthropic.com
ANTHROPIC_API_KEY=sk-ant-api03-...your-key...

# .env — get this from platform.openai.com/api-keys
OPENAI_API_KEY=sk-proj-...your-key...

Add .env to .gitignore right away; keys pushed to a public repository can be scraped within hours.
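Before running anything, it can help to confirm which keys are actually visible to Python. The sketch below is an assumption-laden illustration: configured_providers and KEY_VARS are hypothetical names, and it expects the variables to be in the environment already (for example after python-dotenv's load_dotenv() has read .env).

```python
import os

# Variable names come from the .env examples above; the provider labels are illustrative.
KEY_VARS = {"anthropic": "ANTHROPIC_API_KEY", "openai": "OPENAI_API_KEY"}


def configured_providers(env=None) -> list[str]:
    """Return the providers whose API key is present and non-empty."""
    env = os.environ if env is None else env
    return [name for name, var in KEY_VARS.items() if env.get(var, "").strip()]
```

An empty list means neither key was found, which is cheaper to discover here than through an authentication error in the middle of a run.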

Question

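Can I use both APIs in the same project?

Yes. The agent's loop, tools, and corpus are identical; only the model call needs to be swapped. By the end of Phase 1 you will have two loop.py versions that behave the same, which is the cleanest way to feel the trade-offs behind each API's design choices.

To do that, install both SDKs and put both keys in .env. Otherwise, pick one for now.

Question

Why not use a framework? LangChain already provides all of this.

Because a framework hides what you are here to learn. LangChain's AgentExecutor is a class of several hundred lines that manages the loop, tool dispatch, retries, memory, and parsing, and those are exactly the primitives this tutorial covers. Start from a framework and you learn how to configure an agent, not how it works.

Once you have built one from scratch, a framework becomes a tool rather than a black box.

With .env listed in .gitignore, run git init && git add . && git commit -m "scaffold". Once the agent is running, you will want a clean checkpoint to compare against.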

STEP 2

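Define three tools. Exactly three.

An agent is a model calling tools in a loop. Without tools you have a chatbot; with tools you have an agent. Three tools is the minimum configuration that lets a research agent show its reasoning.

The three tools and why

search_docs(query) finds relevant documents. It returns document IDs plus snippets, not full text. Snippets keep the context lean and force the agent to decide which documents are worth fetching.

fetch_doc(doc_id) reads one full document. Separating "find" from "read" is deliberate: it makes the agent's relevance judgments visible in the trace.

submit_answer(answer, citations) ends the loop with structured output. Without it, the agent might just state the answer in plain text, and we would have nothing reliable to extract citations from.

Write the tool schemas

Both APIs define parameters with JSON Schema but wrap them differently. Anthropic wraps them as {name, description, input_schema}; the OpenAI Responses API uses {type: "function", name, description, parameters}. The schema itself is the same apart from the additionalProperties flag in the OpenAI version. The Anthropic definitions come first, then the OpenAI ones.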

# agent/tools.py
TOOLS = [
    {
        "name": "search_docs",
        "description": (
            "Search the corpus by keyword. Returns up to "
            "5 matches, each with doc_id and 300-char snippet."
        ),
        "input_schema": {
            "type": "object",
            "properties": {
                "query": {"type": "string"}
            },
            "required": ["query"],
        },
    },
    {
        "name": "fetch_doc",
        "description": "Fetch full text of a doc by doc_id.",
        "input_schema": {
            "type": "object",
            "properties": {
                "doc_id": {"type": "string"}
            },
            "required": ["doc_id"],
        },
    },
    {
        "name": "submit_answer",
        "description": "Submit final answer. Ends conversation.",
        "input_schema": {
            "type": "object",
            "properties": {
                "answer": {"type": "string"},
                "citations": {
                    "type": "array",
                    "items": {"type": "string"},
                },
            },
            "required": ["answer", "citations"],
        },
    },
]

# agent/tools.py
TOOLS = [
    {
        "type": "function",
        "name": "search_docs",
        "description": (
            "Search the corpus by keyword. Returns up to "
            "5 matches, each with doc_id and 300-char snippet."
        ),
        "parameters": {
            "type": "object",
            "properties": {
                "query": {"type": "string"}
            },
            "required": ["query"],
            "additionalProperties": False,
        },
    },
    {
        "type": "function",
        "name": "fetch_doc",
        "description": "Fetch full text of a doc by doc_id.",
        "parameters": {
            "type": "object",
            "properties": {
                "doc_id": {"type": "string"}
            },
            "required": ["doc_id"],
            "additionalProperties": False,
        },
    },
    {
        "type": "function",
        "name": "submit_answer",
        "description": "Submit final answer. Ends conversation.",
        "parameters": {
            "type": "object",
            "properties": {
                "answer": {"type": "string"},
                "citations": {
                    "type": "array",
                    "items": {"type": "string"},
                },
            },
            "required": ["answer", "citations"],
            "additionalProperties": False,
        },
    },
]
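Because only the wrapper differs, one list can be derived from the other instead of being maintained by hand. The following is a minimal sketch, assuming the Anthropic-style definitions shown first; to_openai_tool is a hypothetical helper, not part of either SDK.

```python
import copy


def to_openai_tool(anthropic_tool: dict) -> dict:
    """Re-wrap an Anthropic-style tool definition in the OpenAI Responses shape."""
    params = copy.deepcopy(anthropic_tool["input_schema"])
    # Mirrors the OpenAI definitions above; only the top-level object is touched.
    params["additionalProperties"] = False
    return {
        "type": "function",
        "name": anthropic_tool["name"],
        "description": anthropic_tool["description"],
        "parameters": params,
    }


fetch_doc_tool = {
    "name": "fetch_doc",
    "description": "Fetch full text of a doc by doc_id.",
    "input_schema": {
        "type": "object",
        "properties": {"doc_id": {"type": "string"}},
        "required": ["doc_id"],
    },
}

openai_fetch_doc = to_openai_tool(fetch_doc_tool)
```

The deep copy keeps the original definition untouched, so the same source list can still be passed to the Anthropic client.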

Write the handlers

The handlers are plain Python with no provider-specific code. Both APIs share the same handlers.

# agent/tools.py (continued — shared)
from pathlib import Path

CORPUS = Path("corpus")

def search_docs(query: str) -> list[dict]:
    """Dumb substring scan. Intentionally bad."""
    results = []
    q = query.lower()
    # sorted() keeps the result order stable across runs and platforms
    for path in sorted(CORPUS.glob("*.md")):
        text = path.read_text(encoding="utf-8")
        idx = text.lower().find(q)  # -1 when the query is absent
        if idx == -1:
            continue
        # Snippet: up to 100 chars before the match and 200 after
        start = max(0, idx - 100)
        end = min(len(text), idx + 200)
        results.append({
            "doc_id": path.stem,
            "snippet": text[start:end].strip(),
        })
        if len(results) >= 5:
            break
    return results

def fetch_doc(doc_id: str) -> dict:
    path = CORPUS / f"{doc_id}.md"
    if not path.exists():
        return {"error": f"no doc: {doc_id}"}
    return {
        "doc_id": doc_id,
        "content": path.read_text(encoding="utf-8"),
    }

HANDLERS = {
    "search_docs": search_docs,
    "fetch_doc": fetch_doc,
    # submit_answer is handled in the loop, not here
}

Test the tools in isolation

Before touching the loop, verify that the tools work. Open a Python REPL from the project root:

>>> from agent.tools import search_docs
>>> results = search_docs("connection pool")
>>> for r in results:
...     print(r["doc_id"], "→", r["snippet"][:60])

runtime-config-connection → ...connection pool can hold up to max_connections...
runtime-config-resource → ...each connection consumes shared memory, so pool size...
pgbouncer-modes → ...connection pooling in PgBouncer operates in three distinct...
libpq-connect → ...PQconnectdb opens a single connection; for connection pool...
ddl-system-columns → ...system catalogs in the connection pool are shared...

Five real results, each with a snippet. Crude, but it works. Your own output will depend on what is in your corpus.
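The REPL check depends on whatever is in your corpus. For a repeatable check, the same substring logic can be run against a throwaway corpus. This is a sketch under assumptions: search_dir is a hypothetical stand-in that restates the scan with the directory passed in as a parameter, and the two file contents are invented for the test.

```python
import tempfile
from pathlib import Path


def search_dir(corpus: Path, query: str, limit: int = 5) -> list[dict]:
    """Same substring scan as search_docs, with the corpus passed in."""
    q = query.lower()
    results = []
    for path in sorted(corpus.glob("*.md")):
        text = path.read_text(encoding="utf-8")
        idx = text.lower().find(q)
        if idx == -1:
            continue
        results.append({
            "doc_id": path.stem,
            "snippet": text[max(0, idx - 100):idx + 200].strip(),
        })
        if len(results) >= limit:
            break
    return results


def smoke_test() -> list[dict]:
    """Build a two-file throwaway corpus and search it."""
    with tempfile.TemporaryDirectory() as tmp:
        corpus = Path(tmp)
        (corpus / "pooling.md").write_text(
            "A connection pool reuses server connections.", encoding="utf-8")
        (corpus / "vacuum.md").write_text(
            "VACUUM reclaims storage.", encoding="utf-8")
        return search_dir(corpus, "Connection Pool")


hits = smoke_test()
```

Only pooling.md should match, and the match is case-insensitive, which is the behavior the agent will be relying on.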

Question

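The two schemas differ in two places. What are additionalProperties and the top-level "type": "function" about?

Top-level type: For user-defined tools like these three, Anthropic needs no type discriminator. OpenAI Responses supports built-in tools (web_search, file_search, computer_use) alongside functions, so "type": "function" distinguishes your functions from those built-ins.

additionalProperties: false: OpenAI's strict mode requires this on the parameter schema. Strict mode ensures the model's arguments match your schema exactly; without it, the model may invent fields. Optional but recommended. Anthropic's tool definitions do not need this flag.

Question

Why not let the agent see all 5 snippets and the full documents in a single tool call?

The context window is a resource the agent has to manage. If search_docs returned full documents, each search would dump roughly 20,000 tokens into the conversation. After two searches the useful signal is already buried; a few more and the conversation is pushing against the context limit.

"Snippets first, fetch second" forces the agent to decide which documents are worth that cost. That decision is exactly the behavior we want to see in the trace.

Do not add a list_all_docs tool. You will be tempted, but it builds the wrong intuition: you want the agent to reason from the query, not to browse and then search.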
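The context-cost argument can be put into rough numbers. The arithmetic below is a back-of-the-envelope sketch: the 4-characters-per-token ratio is an assumed rule of thumb, the snippet size and result count come from the search_docs description, and the 20,000 figure is the estimate quoted above. JSON overhead is ignored.

```python
CHARS_PER_TOKEN = 4               # rough rule of thumb, an assumption
SNIPPET_CHARS = 300               # from the search_docs description
RESULTS_PER_SEARCH = 5            # from the search_docs description
FULL_DOC_SEARCH_TOKENS = 20_000   # the rough per-search figure quoted above


def snippet_search_tokens() -> int:
    """Approximate tokens added to history by one snippet-only search."""
    return RESULTS_PER_SEARCH * SNIPPET_CHARS // CHARS_PER_TOKEN


def tokens_after(searches: int, per_search: int) -> int:
    """Tokens accumulated in history after a number of searches."""
    return searches * per_search


per_snippet_search = snippet_search_tokens()
snippets_total = tokens_after(4, per_snippet_search)
full_docs_total = tokens_after(4, FULL_DOC_SEARCH_TOKENS)
```

Under these assumptions, four snippet searches cost about 1,500 tokens while four full-document searches cost about 80,000, a gap of roughly fifty times.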

STEP 3

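Write the loop. Write it ugly.

The core idea of an agent: a while loop that, on each iteration, calls the model, runs the tools it asks for, and feeds the results back. Everything else (frameworks, orchestration, agent SDKs) is decoration around this loop.

Mental model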

┌─────────────────────────────────────────────┐
│ history = [user query]                      │
│                                             │
│ loop:                                       │
│   response = model.call(history, tools)     │
│   history.append(response)                  │
│                                             │
│   if response wants to submit:              │
│     → return answer                         │
│                                             │
│   for each tool_call in response:           │
│     result = run_tool(name, args)           │
│     history.append(tool_result)             │
│                                             │
│ (repeat)                                    │
└─────────────────────────────────────────────┘

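The history list grows on every iteration. The model sees the full history on each turn: it remembers what it searched for, what came back, and what it decided. That is the agent's "memory" within this conversation.

This mental model is the same for both APIs. They differ in naming: Anthropic calls the history messages, OpenAI calls it input. Anthropic returns content blocks; OpenAI returns output items. Anthropic uses tool_use/tool_result; OpenAI uses function_call/function_call_output. Same shape, different vocabulary.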
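The mental model can be run without any API at all by passing the model in as a plain function. This is a simplified sketch, not either provider's format: the response shape ({"tool", "args"}, one tool call per turn), scripted_model, and fake_handlers are all assumptions made for the illustration.

```python
def run_loop(user_query, model_call, handlers, max_steps=10):
    """The mental model above, with the model passed in as a plain function."""
    history = [{"role": "user", "content": user_query}]
    for step in range(max_steps):
        response = model_call(history)  # one model call per iteration
        history.append({"role": "assistant", "content": response})
        if response["tool"] == "submit_answer":
            return {"status": "answered", "steps_used": step + 1,
                    **response["args"]}
        result = handlers[response["tool"]](**response["args"])
        history.append({"role": "tool", "content": result})
    return {"status": "step_limit"}


def scripted_model(history):
    """Fake model: search first, then submit. Stands in for a real API call."""
    if len(history) == 1:
        return {"tool": "search_docs", "args": {"query": "default port"}}
    return {"tool": "submit_answer",
            "args": {"answer": "5432",
                     "citations": ["runtime-config-connection"]}}


fake_handlers = {
    "search_docs": lambda query: [{"doc_id": "runtime-config-connection",
                                   "snippet": "port 5432"}],
}

outcome = run_loop("What's the default PostgreSQL port?",
                   scripted_model, fake_handlers)
```

Swapping scripted_model for a real client call is the only provider-specific change, which is the point of keeping the loop this small.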

System prompt

Identical for both APIs.

# agent/prompts.py
SYSTEM_PROMPT = """You are a research assistant for a documentation corpus.

You have three tools:
- search_docs(query): find relevant documents
- fetch_doc(doc_id): read a full document
- submit_answer(answer, citations): finish

Process:
1. Search for terms related to the user's question.
2. If a snippet looks promising, fetch the full doc.
3. Repeat until you have enough to answer confidently.
4. Submit your answer with citations to doc_ids you used.

Rules:
- Every claim must be supported by a cited doc.
- If the corpus doesn't have the answer, say so honestly.
- Don't search for the same thing twice.
- Aim for 3-6 tool calls before submitting."""

The loop itself

# agent/loop.py — Anthropic Messages API
from anthropic import Anthropic
from agent.tools import TOOLS, HANDLERS
from agent.prompts import SYSTEM_PROMPT

client = Anthropic()

def run_agent(user_query: str, max_steps: int = 10):
    messages = [{"role": "user", "content": user_query}]
    trace = []

    for step in range(max_steps):
        response = client.messages.create(
            model="claude-sonnet-4-5",
            max_tokens=4096,
            system=SYSTEM_PROMPT,
            tools=TOOLS,
            messages=messages,
        )

        # Append assistant turn — required by API
        messages.append({
            "role": "assistant",
            "content": response.content,
        })
        trace.append({"step": step, "response": response})

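        if response.stop_reason != "tool_use":
            # end_turn, max_tokens, etc.: there is no tool call to
            # dispatch, so stop instead of sending an empty user turn
            return {"status": "halted_no_answer",
                    "trace": trace}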

        tool_results = []
        for block in response.content:
            if block.type != "tool_use":
                continue

            if block.name == "submit_answer":
                return {
                    "status": "answered",
                    "answer": block.input["answer"],
                    "citations": block.input["citations"],
                    "steps_used": step + 1,
                    "trace": trace,
                }

            try:
                result = HANDLERS[block.name](**block.input)
                tool_results.append({
                    "type": "tool_result",
                    "tool_use_id": block.id,
                    "content": str(result),
                })
            except Exception as e:
                tool_results.append({
                    "type": "tool_result",
                    "tool_use_id": block.id,
                    "content": f"error: {e}",
                    "is_error": True,
                })

        # All tool results in one user turn
        messages.append({"role": "user",
                         "content": tool_results})

    return {"status": "step_limit", "trace": trace}

# agent/loop.py — OpenAI Responses API
import json
from openai import OpenAI
from agent.tools import TOOLS, HANDLERS
from agent.prompts import SYSTEM_PROMPT

client = OpenAI()

def run_agent(user_query: str, max_steps: int = 10):
    # Responses API uses a flat list of "items"
    input_items = [
        {"role": "user", "content": user_query}
    ]
    trace = []

    for step in range(max_steps):
        response = client.responses.create(
            model="gpt-5.5",
            instructions=SYSTEM_PROMPT,
            tools=TOOLS,
            input=input_items,
        )

        # Append every output item to history
        for item in response.output:
            input_items.append(item.model_dump())
        trace.append({"step": step, "response": response})

        # Find function_call items in this turn
        calls = [i for i in response.output
                 if i.type == "function_call"]

        if not calls:
            # Model produced text without calling a tool
            return {"status": "halted_no_answer",
                    "trace": trace}

        for call in calls:
            args = json.loads(call.arguments)

            if call.name == "submit_answer":
                return {
                    "status": "answered",
                    "answer": args["answer"],
                    "citations": args["citations"],
                    "steps_used": step + 1,
                    "trace": trace,
                }

            try:
                result = HANDLERS[call.name](**args)
                output = str(result)
            except Exception as e:
                output = f"error: {e}"

            # Match output to call by call_id
            input_items.append({
                "type": "function_call_output",
                "call_id": call.call_id,
                "output": output,
            })

    return {"status": "step_limit", "trace": trace}

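Same logic, different shapes

The two loops are structurally identical: send history and tools, append the response, dispatch tool calls, append results, repeat. The differences in data shape are worth a closer look.

Anthropic uses messages with content blocks. Each turn is a {role, content} object whose content is either a string (for user input) or a list of blocks (for tool calls, results, and text). The API enforces pairing between turns: an assistant turn containing tool_use blocks must be followed immediately by a user turn carrying the matching tool_result blocks.

OpenAI Responses uses a flat list of items. The input parameter is a list of items, each with a type: user messages, function_call items, function_call_output items, and so on. There is no strict alternation; items are linked by call_id rather than by position.

The most common bug in both APIs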

anthropic.BadRequestError: Error code: 400 -
{'error': {'type': 'invalid_request_error',
'message': 'messages.1: tool_result block found without
corresponding tool_use block'}}

Your loop forgot to append the assistant turn before the user turn that carries the tool results. Fix the append order.

openai.BadRequestError: Error code: 400 -
{'error': {'message': 'No tool call found for
function_call_output with call_id call_xyz...',
'type': 'invalid_request_error'}}

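You appended a function_call_output to input with no matching function_call. You forgot to append the model's output items before adding the tool result.

Different error messages, same root mistake: a tool result needs its corresponding call to be present in the history.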
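Both bugs can be caught locally before the request is sent. The checks below are a sketch, assuming the history holds plain dicts (SDK objects would first need converting, for example with model_dump()); both function names are hypothetical.

```python
def orphaned_results_anthropic(messages: list[dict]) -> list[str]:
    """tool_result IDs whose tool_use is not in the immediately preceding turn."""
    orphans = []
    for i, msg in enumerate(messages):
        if msg["role"] != "user" or isinstance(msg["content"], str):
            continue
        prev = messages[i - 1]["content"] if i > 0 else []
        prev_blocks = prev if isinstance(prev, list) else []
        called = {b["id"] for b in prev_blocks if b.get("type") == "tool_use"}
        for block in msg["content"]:
            if (block.get("type") == "tool_result"
                    and block["tool_use_id"] not in called):
                orphans.append(block["tool_use_id"])
    return orphans


def orphaned_outputs_openai(items: list[dict]) -> list[str]:
    """function_call_output call_ids with no earlier function_call item."""
    seen, orphans = set(), []
    for item in items:
        if item.get("type") == "function_call":
            seen.add(item["call_id"])
        elif (item.get("type") == "function_call_output"
                and item["call_id"] not in seen):
            orphans.append(item["call_id"])
    return orphans
```

An empty list from either function means every result has its call in the history; anything else names the IDs the API would be likely to reject.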

Question

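What is Anthropic's stop_reason, and what is the OpenAI equivalent?

Anthropic returns response.stop_reason, with values such as "tool_use" (the model wants to call a tool, so the loop continues) or "end_turn" (the model finished without calling a tool, so the loop ends). Other values exist, such as "max_tokens" when the reply is cut off.

OpenAI Responses does not expose a single stop_reason in the same way. Inspect response.output instead: if it contains function_call items, dispatch them; if it contains only text/message items, the model is done.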
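The two continuation checks can be written side by side. This is an illustrative sketch with hypothetical function names, and it assumes the OpenAI output items have been converted to plain dicts.

```python
def anthropic_wants_tools(stop_reason: str) -> bool:
    """Anthropic: stop_reason says directly whether there are tools to dispatch."""
    return stop_reason == "tool_use"


def openai_wants_tools(output_items: list[dict]) -> bool:
    """OpenAI Responses: infer it from the item types in response.output."""
    return any(item.get("type") == "function_call" for item in output_items)
```

In both loops a False result corresponds to the halted_no_answer branch.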

Question

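Why does OpenAI use call_id while Anthropic uses tool_use_id?

Same concept, different name. Both are opaque strings generated by the API to link a tool call to its result. The model's call carries the ID; you send it back with the output; the API matches the two.

Why they exist: when the model emits several tool calls in one turn (parallel tools, covered in Phase 3), the API needs to know which result belongs to which call. Without IDs you would have to rely on position, which is fragile.

Question

Why max_steps=10?

It is a rule of thumb. For a three-tool research agent on a small corpus, 10 steps is generous; most queries finish in 4–6. The cap is a safety measure: if the model gets stuck in a loop, we do not want it burning API credits indefinitely.

If most runs hit the step limit, the limit is not the problem; the agent is stuck. Read the traces and fix the root cause (usually the prompt).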
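Whether "most runs" hit the limit is easy to measure once you keep the result dicts. A small sketch, assuming each result carries the status field returned by run_agent; status_rates is a hypothetical helper and the sample runs are made up.

```python
from collections import Counter


def status_rates(results: list[dict]) -> dict[str, float]:
    """Fraction of runs ending in each status returned by run_agent."""
    counts = Counter(r["status"] for r in results)
    total = len(results)
    if total == 0:
        return {}
    return {status: n / total for status, n in counts.items()}


runs = [{"status": "answered"}, {"status": "step_limit"},
        {"status": "step_limit"}, {"status": "step_limit"}]
rates = status_rates(runs)
```

A step_limit share like the 0.75 in this made-up sample is the signal to go and read traces rather than raise max_steps.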

STEP 4

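Run it. Watch the trace line by line. Take notes.

Now we run the agent on real questions and read its behavior line by line. This is the most important step in Phase 1. Everything in Phases 2 and 3 is a response to the behavior we observe here.

Example 1: a clean run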

$ python scripts/run.py "What's the default PostgreSQL port?"

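This is the kind of trace you will see. Its shape is the same whichever API you use, though the exact wording varies by model and run: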

──────────────────── Step 0 ────────────────────
┌─ thinking ─────────────────────────────────────┐
│ I'll search for "default port" to find the     │
│ PostgreSQL configuration.                      │
└────────────────────────────────────────────────┘
→ search_docs({'query': 'default port'})

──────────────────── Step 1 ────────────────────
┌─ thinking ─────────────────────────────────────┐
│ The snippet from runtime-config-connection     │
│ mentions port 5432. Let me fetch the full doc  │
│ to confirm.                                    │
└────────────────────────────────────────────────┘
→ fetch_doc({'doc_id': 'runtime-config-connection'})

──────────────────── Step 2 ────────────────────
→ submit_answer({
    'answer': 'PostgreSQL listens on TCP port
    5432 by default. This can be changed via the
    `port` parameter in postgresql.conf.',
    'citations': ['runtime-config-connection']
  })

─────────────── Final answer ───────────────
PostgreSQL listens on TCP port 5432 by default.
This can be changed via the `port` parameter in
postgresql.conf.
citations: ['runtime-config-connection']
status: answered
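This section does not show scripts/run.py, so here is one possible sketch of its last step: turning the dict returned by run_agent into the final block of the trace. It uses plain strings rather than rich, and format_result is a hypothetical name.

```python
def format_result(result: dict) -> str:
    """Render the dict returned by run_agent as the final block of the trace."""
    lines = []
    if result["status"] == "answered":
        lines.append(result["answer"])
        lines.append(f"citations: {result['citations']}")
        lines.append(f"status: answered ({result['steps_used']} steps)")
    else:
        # halted_no_answer and step_limit carry no answer, only a status
        lines.append(f"status: {result['status']}")
    return "\n".join(lines)


example = {
    "status": "answered",
    "answer": "PostgreSQL listens on TCP port 5432 by default.",
    "citations": ["runtime-config-connection"],
    "steps_used": 3,
    "trace": [],
}
rendered = format_result(example)
```

In a real entry point you would likely call python-dotenv's load_dotenv() before importing agent.loop, since the client there is created at import time and reads the key from the environment.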

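Things to notice

Three steps, a clear path. Search → fetch → submit. The agent did not over-research a simple question, did not fetch extra documents, and cited exactly what it used.

The "thinking" boxes are real model output: both Claude and GPT often attach a short piece of text to a tool call to explain their plan. It is useful when debugging.

Example 2: a multi-hop question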

$ python scripts/run.py "How does PgBouncer's transaction pooling
   interact with prepared statements?"

──────────────────── Step 0 ────────────────────
→ search_docs({'query': 'PgBouncer transaction pooling'})

──────────────────── Step 1 ────────────────────
┌─ thinking ─────────────────────────────────────┐
│ Got pgbouncer-modes. I need to fetch the full  │
│ doc AND search for prepared statements.        │
└────────────────────────────────────────────────┘
→ fetch_doc({'doc_id': 'pgbouncer-modes'})

──────────────────── Step 2 ────────────────────
→ search_docs({'query': 'prepared statements'})

──────────────────── Step 3 ────────────────────
→ fetch_doc({'doc_id': 'sql-prepare'})

──────────────────── Step 4 ────────────────────
┌─ thinking ─────────────────────────────────────┐
│ Prepared statements are session-scoped. In     │
│ transaction pooling, server connections rotate │
│ between transactions. They conflict.           │
└────────────────────────────────────────────────┘
→ submit_answer({
    'answer': 'Transaction pooling and prepared
    statements conflict. Prepared statements are
    scoped to a session, but transaction pooling
    rotates server connections between
    transactions...',
    'citations': ['pgbouncer-modes', 'sql-prepare']
  })

status: answered (5 steps)

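Things to notice

Two-topic decomposition. The agent recognized that the question involves two things and searched for each separately. This is correct multi-hop behavior.

Sequential search-then-fetch. Each search led to one fetch decision, not five. The "snippets first, fetch second" design is doing its job.

Example 3: a failure mode you will definitely hit

Now a question where the agent struggles. The crude substring search makes Phase 1's failure modes especially clear: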

$ python scripts/run.py "When should I VACUUM versus VACUUM FULL?"

──────────────────── Step 0 ────────────────────
→ search_docs({'query': 'VACUUM'})

──────────────────── Step 1 ────────────────────
┌─ thinking ─────────────────────────────────────┐
│ Got 5 results but they all mention VACUUM      │
│ briefly. Let me search more specifically.      │
└────────────────────────────────────────────────┘
→ search_docs({'query': 'VACUUM FULL'})

──────────────────── Step 2 ────────────────────
→ search_docs({'query': 'when to use VACUUM'})

──────────────────── Step 3 ────────────────────
┌─ thinking ─────────────────────────────────────┐
│ The third search returned nothing — substring  │
│ "when to use" doesn't appear literally. Let    │
│ me try fetching routine-vacuuming directly.    │
└────────────────────────────────────────────────┘
→ fetch_doc({'doc_id': 'routine-vacuuming'})

──────────────────── Step 4 ────────────────────
→ submit_answer({...})

status: answered (5 steps)

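Three problems in one run

Problem 1: redundant searches. The agent searched three times because substring matching returns overly broad results. Semantic search would likely have hit the right document on the first try.

Problem 2: phrase search fails on natural language. "when to use VACUUM" returned nothing because no document contains that literal phrase.

Problem 3: recovery by lucky guess. The agent guessed routine-vacuuming because it is a plausible document name. On another corpus, that trick would not work.

This is the data we need. Three concrete reasons to upgrade the retrieval system in Phase 2.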
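Problems 1 and 2 can be reproduced in a few lines with an in-memory corpus. This is a toy demonstration: the three document texts are invented stand-ins, and substring_hits is a hypothetical function that applies the same literal, case-insensitive match as search_docs.

```python
DOCS = {
    # Toy stand-ins for corpus files; the text is invented for this demo.
    "routine-vacuuming": "Plain VACUUM reclaims space for reuse. "
                         "VACUUM FULL rewrites the whole table.",
    "sql-vacuum": "VACUUM garbage-collects and optionally analyzes a database.",
    "mvcc-intro": "Dead row versions accumulate until VACUUM removes them.",
}


def substring_hits(query: str) -> list[str]:
    """doc_ids whose text contains the query as a literal substring."""
    q = query.lower()
    return [doc_id for doc_id, text in DOCS.items() if q in text.lower()]


broad = substring_hits("VACUUM")                 # matches every doc
narrow = substring_hits("VACUUM FULL")           # matches one doc
natural = substring_hits("when to use VACUUM")   # matches nothing
```

The one-word query is too broad to rank anything, and the natural-language query finds nothing even though the first document answers it, which is the pattern seen in the trace.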

Trace log

Don't just watch the traces; write them down. Keep a runs/notes.md:

# Phase 1 trace log

## 2026-05-16 — first runs

### Q: "default postgres port"
- 3 steps, clean. ✓ search → fetch → submit

### Q: "PgBouncer pooling + prepared statements"
- 5 steps, correct answer
- ✓ decomposed into two sub-searches

### Q: "VACUUM vs VACUUM FULL"
- 5 steps, eventually correct
- ✗ 3 redundant searches before progress
- ✗ "when to use VACUUM" returned 0 results
- ✗ recovered by guessing doc_id, lucky
→ Phase 2 needs: semantic similarity, reranker

### Q: "How do I configure SSL?"
- 8 steps, hit token limit on fetch
- ✗ libpq-ssl doc is ~30k tokens
→ need chunking, not full-doc fetches

This log is where the design pressure for Phase 2 comes from. You will reread it before writing the retrieval stack.
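If you prefer not to type entries by hand, an entry in the same style can be generated. The helper below is a sketch only: format_note is a hypothetical name, and the observations themselves still have to come from you reading the trace.

```python
def format_note(question: str, steps: int, observations: list[str],
                followup: str = "") -> str:
    """Render one run as a Markdown entry in the style of runs/notes.md."""
    lines = [f'### Q: "{question}"', f"- {steps} steps"]
    lines += [f"- {obs}" for obs in observations]
    if followup:
        lines.append(f"→ {followup}")
    return "\n".join(lines) + "\n"


entry = format_note(
    "VACUUM vs VACUUM FULL",
    5,
    ["✗ 3 redundant searches before progress"],
    followup="Phase 2 needs: semantic similarity, reranker",
)
```

Appending the returned string to runs/notes.md (opened in "a" mode with encoding="utf-8") keeps the log growing one run at a time.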

Question

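If I build both the Anthropic and OpenAI versions, how should they compare?

On simple queries, almost identically; both models handle this kind of tool use well. Differences you may notice:

Strict mode. OpenAI's strict mode, which relies on additionalProperties: false, makes argument parsing for complex schemas more reliable. Anthropic's models usually follow the schema well without it.
Thinking style. Claude tends toward short plans, GPT toward longer narration. If you want a particular style, adjust the system prompt.
Recovery from tool errors. Both recover well. We will test this systematically in Phase 4.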

Question

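My agent got the right answers even with the crude search. Do I need Phase 2?

Your corpus may be too small, or your questions too easy, so substring matching covers them. Try harder questions: multi-hop questions, terms that have been rephrased, and conceptual questions whose answer is implied rather than stated. Crude search breaks down on these.

If it still works, you have found a useful truth: retrieval complexity should match problem complexity. For a 50-document personal wiki, BM25 may be enough. For 50,000 enterprise documents with fuzzy questions, you need the full Phase 2 stack.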

Question

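The agent sometimes ends with halted_no_answer. How do I fix it?

The model produced text without calling submit_answer. Common causes:

The model thinks it has already answered. The tool result contained the answer verbatim; the model restated it in its own words without calling the tool. Fix: state in the prompt that the only way to finish is to call submit_answer.
The model gave up. Several searches returned nothing and the model said it could not find the answer. Fix: add to the prompt "If the answer is not in the corpus, call submit_answer and say 'not in the corpus'".

It is usually a prompt adjustment; no code change is needed.

Keep the trace log permanently. When you tune the Phase 4 evals, you will want to recall which Phase 1 questions were hard; they make excellent test cases.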

End of week 1

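Deliverables

A CLI tool that returns answers with citations, or fails gracefully when it cannot answer. A trace log covering about 10 runs with observations. You should be able to state clearly the three main failure modes that drive Phase 2.

An agent loop in under 100 lines of Python
Three tools with descriptions, no framework
Well-formatted trace output printed with rich
10 or more runs with notes
Three main failure modes identified and named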