Designing Tools Agents Can Use

Deep Dive · Tool & Capability Design

The tools are the agent's API, and you are designing for the model, not the human.

A model is only as capable as the tools you hand it. A tool that a skilled engineer could use after reading the source is not the same as a tool a model can use blind: from its description alone, under context pressure, on the first try. This essay argues for a mindset shift. A tool is not a wrapped function; it is the surface through which a non-deterministic reasoner acts on the world, and every gap between what the tool affords and what the model can infer becomes a failure you will see in production. Anthropic's own guidance frames tools as a new kind of software contract, between deterministic systems and non-deterministic agents, and that framing changes every design decision below.

STEP 1

A tool is the agent's entire interface to reality.

An agent has no hands. Everything it does to the world — read a file, query a database, send a message, refund a charge — happens through a tool call, and nothing else. That makes the tool set the agent's complete action space: capabilities you did not expose do not exist for the model, and capabilities you exposed badly are ones it will use badly. The leverage here is enormous and underspent — teams pour effort into prompts and model choice while shipping tools that are thin JSON wrappers around an internal API never meant for a stochastic caller. The tool layer is usually the highest-return, least-touched surface in an agent system.
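A minimal Python sketch of that claim, assuming a hypothetical `ToolRegistry` class and made-up tool names: whatever is registered is the agent's entire action space, and a call to anything else comes back as a readable message rather than an action. No particular agent framework's API is implied.

```python
from typing import Callable, Dict, List


class ToolRegistry:
    """Hypothetical registry: the registered functions are the only actions
    the agent can take. Nothing else is reachable through a tool call."""

    def __init__(self) -> None:
        self._tools: Dict[str, Callable[..., str]] = {}

    def register(self, name: str, fn: Callable[..., str]) -> None:
        self._tools[name] = fn

    def names(self) -> List[str]:
        # This list is the model's entire action space.
        return sorted(self._tools)

    def dispatch(self, name: str, args: Dict[str, object]) -> str:
        # An unregistered capability does not exist for the model: the call
        # comes back as a message it can read, not as an action.
        if name not in self._tools:
            return f"Unknown tool '{name}'. Available tools: {', '.join(self.names())}."
        return self._tools[name](**args)


registry = ToolRegistry()
registry.register("read_file", lambda path: f"(contents of {path})")
registry.register("send_message", lambda to, body: f"Message sent to {to}.")

print(registry.names())  # ['read_file', 'send_message']
print(registry.dispatch("refund_charge", {"charge_id": "ch_1"}))
```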

STEP 2

Design for the model's failure modes, not an engineer's competence.

A REST API assumes a caller who read the docs, holds state across calls, and debugs with a stack trace. A model does none of that. It sees only the tool name, description, and schema in its context; it may hallucinate arguments, misread the purpose, call the tool out of order, or invent a field that sounds plausible. You are designing for a caller who is fast, capable, and confidently wrong under ambiguity. The discipline: assume the description is the only documentation that exists, assume every ambiguity will be resolved in the worst way, and treat a malformed call as your design's fault, not the model's.
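One way to act on that discipline, sketched under loud assumptions: validate every call before executing it, reject invented fields instead of silently ignoring them, and phrase each problem as an instruction the model can fix on its next turn rather than as a stack trace. The simplified schema format, the `check_call` helper, and the enum values are hypothetical; production tools usually express the schema as JSON Schema and run it through a real validator.

```python
from typing import Any, Dict, List

# Hypothetical, simplified schema format: argument name -> spec.
# Real tool schemas are usually JSON Schema; this keeps the sketch stdlib-only.
REFUND_SCHEMA: Dict[str, Dict[str, Any]] = {
    "order_id": {"type": str, "required": True},
    "reason": {"type": str, "required": True,
               "enum": ["defective", "not_received", "wrong_item"]},
}


def check_call(schema: Dict[str, Dict[str, Any]], args: Dict[str, Any]) -> List[str]:
    """Return model-readable problems with a call; an empty list means valid."""
    problems: List[str] = []
    for name in args:
        if name not in schema:
            # A plausible-sounding invented field: reject it and list what exists.
            problems.append(
                f"Unknown argument '{name}'. Valid arguments: {', '.join(schema)}."
            )
    for name, spec in schema.items():
        if name not in args:
            if spec.get("required"):
                problems.append(f"Missing required argument '{name}'.")
            continue
        value = args[name]
        if not isinstance(value, spec["type"]):
            problems.append(
                f"Argument '{name}' must be {spec['type'].__name__}, "
                f"got {type(value).__name__}."
            )
        elif "enum" in spec and value not in spec["enum"]:
            problems.append(
                f"Argument '{name}' must be one of {spec['enum']}, got {value!r}."
            )
    return problems


print(check_call(REFUND_SCHEMA, {"order_id": 42, "reason": "broken", "notify": True}))
```

Reporting every problem at once, rather than stopping at the first, gives the model a chance to repair a malformed call in a single retry.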

Read your tool's name, description, and schema in isolation — nothing else — and ask "could I call this correctly with no source access?" If you hesitate, the model will fail. This three-line read is the cheapest tool eval you have.
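The three-line read can be made mechanical. This sketch assumes the name/description/input_schema shape that Anthropic's tool-use API uses for tool definitions; the `model_view` and `undocumented_params` helpers and the deliberately vague `process` tool are hypothetical. The second helper is only a crude lint and cannot replace actually reading the output and asking the question.

```python
import json
from typing import Any, Dict, List

# Hypothetical, deliberately vague tool definition in the
# name / description / input_schema shape used by Anthropic's tool-use API.
tool: Dict[str, Any] = {
    "name": "process",
    "description": "Processes the item.",
    "input_schema": {
        "type": "object",
        "properties": {"id": {"type": "string"}, "mode": {"type": "string"}},
        "required": ["id"],
    },
}


def model_view(tool: Dict[str, Any]) -> str:
    """Everything the model gets: no source code, no wiki, no tribal knowledge."""
    return "\n".join([
        f"name: {tool['name']}",
        f"description: {tool['description']}",
        f"schema: {json.dumps(tool['input_schema'], sort_keys=True)}",
    ])


def undocumented_params(tool: Dict[str, Any]) -> List[str]:
    """Parameters the model must guess about because nothing describes them."""
    props = tool["input_schema"].get("properties", {})
    return [name for name, spec in props.items() if "description" not in spec]


print(model_view(tool))
print(undocumented_params(tool))  # ['id', 'mode']
```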

STEP 3

Affordance: the tool should make the right call obvious and the wrong call hard.

Borrow the term from interface design. A good door handle tells you to push or pull without a sign; a good tool tells the model how to use it without a worked example. Affordance comes from the shape of the tool: a name that states the action and the object (refund_order, not process), a description that says when to use it and when not to, an argument set narrow enough that there is essentially one correct way to fill it, and return values that report what happened in words the model can act on. Affordance is not documentation bolted on — it is the schema and naming doing the explaining structurally.
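Applied to the refund example, those properties might look like the definition below. It is a sketch, assuming the same name/description/input_schema shape as Anthropic's tool-use API; the description wording and the `reason` enum values are illustrative assumptions, not a prescribed contract. The `enum`, `required`, and `additionalProperties` keywords are standard JSON Schema.

```python
from typing import Any, Dict

# Hypothetical definition; field names follow the name/description/input_schema
# shape of Anthropic's tool-use API. Values and wording are illustrative.
REFUND_ORDER_TOOL: Dict[str, Any] = {
    # Action + object: the name alone says what happens and to what.
    "name": "refund_order",
    # When to use it, when not to, and what comes back.
    "description": (
        "Refund the full amount of a single order to the customer's original "
        "payment method and notify the customer. Use when a customer reports "
        "a defective, missing, or wrong item. Do not use for partial refunds "
        "or store credit. Returns a sentence stating the amount refunded."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "order_id": {
                "type": "string",
                "description": "The order number shown to the customer, e.g. '42'.",
            },
            # An enum leaves essentially one correct way to fill the field.
            "reason": {
                "type": "string",
                "enum": ["defective", "not_received", "wrong_item"],
                "description": "Why the order is being refunded.",
            },
        },
        "required": ["order_id", "reason"],
        # additionalProperties: false tells a JSON Schema validator to reject
        # invented extra fields instead of passing them through.
        "additionalProperties": False,
    },
}
```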

STEP 4

Build tools around tasks, not around your internal API.

The reflex is to mechanically expose existing endpoints one-to-one. That ships the model your database's shape, not the agent's job. Block rebuilt its Linear integration from thirty-plus thin endpoint mirrors down to two task-shaped tools and saw the agent get better, because the unit the model reasons in is "find the issue and update it," not "GET /issues then PATCH /issues/:id." Design each tool around a complete unit of what the agent is trying to accomplish, and collapse the multi-step internal dance behind one call when the model has no business orchestrating it.

# Wrong: the database's shape, leaked to the model
db_query(table="orders", where="id=42")
db_update(table="orders", where="id=42", set="status='refunded'")
# ...and the model must still remember to move the money and notify the customer

# Right: the task's shape, the orchestration hidden
refund_order(order_id="42", reason="defective")
# -> "Refunded $39.00 to order 42. Customer notified."

One-to-one endpoint mirroring feels efficient, and it is among the most common tool-design mistakes in 2025-era agents. It pushes orchestration the model is bad at into the loop and turns every multi-step task into a place where the agent can desynchronize.
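A minimal sketch of what might sit behind the task-shaped `refund_order` call, using hypothetical in-memory stand-ins (`Backend`, `Order`) for the order table, payment provider, and notification service. The already-refunded guard is an added assumption; it illustrates one benefit of owning the sequence in code, since a repeated call cannot refund twice.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Order:
    order_id: str
    amount_cents: int
    status: str = "paid"


@dataclass
class Backend:
    """Hypothetical in-memory stand-ins for the order table, the payment
    provider, and the notification service a real tool would call."""
    orders: Dict[str, Order] = field(default_factory=dict)
    refunds: List[Tuple[str, str]] = field(default_factory=list)
    notifications: List[str] = field(default_factory=list)


def refund_order(backend: Backend, order_id: str, reason: str) -> str:
    """One task-shaped call. Lookup, refund, status update, and notification
    happen here, in code, instead of as separate steps in the model's loop."""
    order = backend.orders.get(order_id)
    if order is None:
        return f"No order {order_id} found. Check the order number and try again."
    if order.status == "refunded":
        # Owning the sequence makes a repeated call safe instead of a double refund.
        return f"Order {order_id} was already refunded. No action taken."
    backend.refunds.append((order_id, reason))  # 1. move the money
    order.status = "refunded"                   # 2. update the record
    backend.notifications.append(order_id)      # 3. tell the customer
    dollars, cents = divmod(order.amount_cents, 100)
    return f"Refunded ${dollars}.{cents:02d} to order {order_id}. Customer notified."


backend = Backend(orders={"42": Order("42", 3900)})
print(refund_order(backend, "42", "defective"))
# Refunded $39.00 to order 42. Customer notified.
print(refund_order(backend, "42", "defective"))
# Order 42 was already refunded. No action taken.
```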

STEP 5

The tool's output is also a prompt.

Whatever a tool returns goes straight back into the model's context and conditions the next decision. A tool that returns raw rows, opaque IDs, or a bare 200 forces the model to guess what happened; a tool that returns a short, human-readable statement of outcome — what changed, what to do next, what is now true — steers the next step for free. Anthropic's tooling guidance is explicit that agents reason better over human-legible fields than over technical identifiers. Treat every return value as a sentence you are writing into the model's reasoning, because that is exactly what it is.
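A hedged sketch of the difference, using a hypothetical customer-search result: the raw rows carry terse codes and UUIDs, while the formatted return states in plain words what was found and what the model should do next. Field names, status codes, and wording are assumptions; the opaque ID is kept only because a follow-up call would plausibly need it.

```python
from typing import Dict, List

# Hypothetical raw rows as an internal API might return them.
RAW_ROWS: List[Dict[str, str]] = [
    {"uuid": "9f1c2e7a-0b3d-4c55-9a1e-2f6d8c4b7a10", "nm": "Ana Ruiz", "st": "A"},
    {"uuid": "3b8e5d21-7c4f-4e0a-8d2b-6a9f1c3e5b42", "nm": "Ana Rosen", "st": "S"},
]
STATUS_WORDS = {"A": "active", "S": "suspended"}


def search_customers_result(query: str, rows: List[Dict[str, str]]) -> str:
    """Turn raw rows into a short statement of outcome the model can act on:
    what was found, in readable terms, and what to do next."""
    if not rows:
        return f"No customers match '{query}'. Try a last name or an email address."
    lines = [f"Found {len(rows)} customers matching '{query}':"]
    for row in rows:
        status = STATUS_WORDS.get(row["st"], "unknown status")
        lines.append(f"- {row['nm']} ({status}), customer_id={row['uuid']}")
    if len(rows) > 1:
        lines.append("More than one match: ask the user which customer they mean "
                     "before taking any action.")
    return "\n".join(lines)


print(search_customers_result("Ana", RAW_ROWS))
```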

STEP 6

When designing for the model is the wrong default.

Two honest exceptions. A tool whose only caller is a tightly bounded deterministic harness, rather than the model choosing freely, can be as raw as any internal function; the model-facing discipline buys nothing there. And early, throwaway prototypes can mirror endpoints to learn what the agent actually needs before investing in task-shaped design. Designing for the model costs real effort; spend it on tools the model chooses freely and whose results the user feels, not on plumbing the model never sees.
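One way to serve both callers is to keep the raw internal function for the deterministic harness and register only a thin model-facing wrapper as a tool. This is a sketch; `set_order_status`, `mark_order_shipped`, and `MODEL_TOOLS` are hypothetical, and the core is a stub that treats any positive ID as an existing order.

```python
from typing import Callable, Dict, Tuple


def set_order_status(order_id: int, status: str) -> Tuple[int, int]:
    """Hypothetical internal API returning (status_code, rows_affected).
    Fine for a harness that always passes well-formed arguments and reads
    the tuple itself. Stub: pretends any positive id exists."""
    return (200, 1) if order_id > 0 else (404, 0)


def mark_order_shipped(order_id: str) -> str:
    """Model-facing wrapper: same capability, but input is checked and the
    result comes back as a sentence the model can act on."""
    if not order_id.isdigit():
        return f"'{order_id}' is not an order number. Order numbers are digits, e.g. '42'."
    code, rows = set_order_status(int(order_id), "shipped")
    if code == 404 or rows == 0:
        return f"No order {order_id} found; nothing was changed."
    return f"Order {order_id} is now marked shipped."


# Only the wrapper goes into the model's tool list; the harness calls the raw core directly.
MODEL_TOOLS: Dict[str, Callable[[str], str]] = {"mark_order_shipped": mark_order_shipped}
```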