Schemas, Contracts & Defaults

Deep Dive · Tool & Capability Design

The schema is the contract, and a good schema makes the wrong call impossible to express.

A model fills your tool's arguments by pattern-matching against the schema in its context, so the schema is not validation paperwork; it is the instruction set. Every field you make optional is a decision you delegated to a stochastic caller; every loose string is a place it can put anything; every absent default is a question it must answer and may answer wrong. This essay is about designing schemas so that the space of expressible calls is almost exactly the space of correct calls, making misuse structurally impossible rather than catching it after the fact.

STEP 1

The schema teaches the model before any validator runs.

By the time a validator rejects a bad call, the damage to the loop is done: a wasted round-trip, a confused model, often a retry that fails the same way. The schema is your first and cheapest intervention because the model reads it and conforms to it before emitting the call. A field named iso_date typed as a string with a format hint and a one-line description produces correct dates far more reliably than a field named date validated by a regex the model never sees. Move correctness upstream into the shape; the validator is the backstop, not the teacher.
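
A minimal sketch of that contrast, written as JSON Schema fragments in Python dicts. The tool name schedule_report, the helper visible_hints, and the description wording are illustrative assumptions, not part of any real API; the only point is that the strong version carries its constraint where the model can read it.

```python
# Hypothetical tool definitions; names and wording are illustrative assumptions.

# Weak: the name and type say nothing about the expected format.
# Any regex lives in the validator, where the model never sees it.
weak_tool = {
    "name": "schedule_report",
    "parameters": {
        "type": "object",
        "properties": {"date": {"type": "string"}},
        "required": ["date"],
    },
}

# Stronger: the name, the format hint, and the description all carry the constraint.
strong_tool = {
    "name": "schedule_report",
    "parameters": {
        "type": "object",
        "properties": {
            "iso_date": {
                "type": "string",
                "format": "date",
                "description": "Report date as ISO-8601 YYYY-MM-DD, e.g. 2025-03-14.",
            }
        },
        "required": ["iso_date"],
        "additionalProperties": False,
    },
}


def visible_hints(tool: dict, field: str) -> list[str]:
    """Return the schema keywords on a field that a model can read as guidance."""
    spec = tool["parameters"]["properties"][field]
    return sorted(k for k in spec if k in {"format", "description", "enum", "pattern"})


print(visible_hints(weak_tool, "date"))        # []
print(visible_hints(strong_tool, "iso_date"))  # ['description', 'format']
```

The validator behind both tools could be identical; what differs is how much of the rule is visible before the call is emitted.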

STEP 2

Make illegal states unrepresentable, not merely rejected.

The strongest schema design borrows from type-driven design: if a combination of arguments is invalid, the schema should make it impossible to write, not merely invalid to submit. Prefer an enum over a free string when the values are known. Use a discriminated union ({type: "email", address} vs {type: "sms", phone}) instead of a flat object with mutually-exclusive optional fields where the model can fill both or neither. Constrain numbers with min/max. Every constraint you encode is a class of model error that can no longer happen, deleted instead of detected.

# Weak: every field optional, enum as free string, no bounds
def notify(channel: str | None = None, email: str | None = None,
           phone: str | None = None, retries: int | None = None): ...

# Strong: union makes "email with no address" inexpressible
# (EmailTarget and SmsTarget are record types whose fields are all required)
def notify(target: "EmailTarget | SmsTarget",  # exactly one, fields required
           retries: int = 2): ...               # sane default, 0..5 bounded
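
One way the strong signature could be fleshed out, as a sketch using only the standard library. The field names, the return strings, and the choice of dataclasses are assumptions made for illustration. In a JSON Schema the same shape would typically be expressed with oneOf and a const-valued type tag on each branch.

```python
from dataclasses import dataclass
from typing import Literal, Union


@dataclass(frozen=True)
class EmailTarget:
    address: str                      # required: there is no default to fall into
    type: Literal["email"] = "email"  # discriminator tag


@dataclass(frozen=True)
class SmsTarget:
    phone: str                        # required
    type: Literal["sms"] = "sms"      # discriminator tag


Target = Union[EmailTarget, SmsTarget]


def notify(target: Target, retries: int = 2) -> str:
    """Hypothetical notifier: returns a description instead of sending anything."""
    if not isinstance(target, (EmailTarget, SmsTarget)):
        raise TypeError("target must be an EmailTarget or an SmsTarget")
    if not 0 <= retries <= 5:
        raise ValueError("retries must be between 0 and 5")
    if isinstance(target, EmailTarget):
        return f"email to {target.address} (retries={retries})"
    return f"sms to {target.phone} (retries={retries})"


print(notify(EmailTarget(address="ops@example.com")))
# email to ops@example.com (retries=2)

try:
    EmailTarget()  # "email with no address" cannot even be constructed
except TypeError as exc:
    print("rejected at construction:", type(exc).__name__)
```

Filling both an email and a phone, or neither, is no longer a state the caller can write down; the only check left at call time is the numeric bound.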

STEP 3

Required vs optional is a risk decision, not a convenience one.

The instinct is to make fields optional "to be flexible." But optional means the model decides, and the model decides under ambiguity, fast, sometimes wrong. A field is only legitimately optional if there is a default that is correct in the overwhelming majority of cases and a wrong omission is cheap. Anything safety-relevant — the account being charged, the destination of a write, a confirmation flag — should be required and explicit, so the model is forced to state intent rather than fall into a default it never reasoned about.

An optional field with no default on a side-effecting tool is the worst of both worlds: the model often omits it, and the tool then either guesses or fails. If it matters, require it; if it doesn't, give it a real default. "Optional, no default" is a latent bug.
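
The "optional, no default" rule is mechanical enough to lint for. A sketch, assuming tool parameters are described with JSON Schema style properties, required, and default keys; the function name and the charge_params example are hypothetical.

```python
def latent_optional_fields(parameters: dict) -> list[str]:
    """List properties that are neither required nor given a default."""
    required = set(parameters.get("required", []))
    return sorted(
        name
        for name, spec in parameters.get("properties", {}).items()
        if name not in required and "default" not in spec
    )


# Hypothetical parameters for a side-effecting "charge" tool.
charge_params = {
    "type": "object",
    "properties": {
        "account_id": {"type": "string"},                  # safety-relevant, yet optional
        "amount_cents": {"type": "integer", "minimum": 1},
        "memo": {"type": "string", "default": ""},         # optional with a real default
        "confirm": {"type": "boolean"},                    # safety-relevant, yet optional
    },
    "required": ["amount_cents"],
}

print(latent_optional_fields(charge_params))  # ['account_id', 'confirm']
```

Each name the check returns needs one of two fixes: move it into required, or give it a default that is safe when the model says nothing.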

STEP 4

Defaults are decisions you are making on the model's behalf — make the safe one.

Every default is a choice that fires when the model says nothing, so it must be the choice you would want when the agent didn't think about it. Defaults should bias toward the reversible, the bounded, and the dry run. A deletion tool should default to a soft delete or require an explicit flag for a hard delete; a query should default to a small page, not the whole table; anything destructive should default to dry_run=true so the unconsidered call is the safe one. The principle: the path of least specification should be the path of least harm.
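
A sketch of what those defaults might look like in tool signatures. The tool names, the MAX_PAGE_SIZE value, and the return shapes are assumptions for illustration; no real storage is touched.

```python
MAX_PAGE_SIZE = 100  # assumed upper bound for illustration


def delete_records(ids: list[str], *, hard: bool = False, dry_run: bool = True) -> dict:
    """Hypothetical deletion tool: the unconsidered call changes nothing."""
    mode = "hard" if hard else "soft"
    if dry_run:
        return {"would_delete": list(ids), "mode": mode, "applied": False}
    # A real implementation would perform the soft or hard delete here.
    return {"deleted": list(ids), "mode": mode, "applied": True}


def list_rows(table: str, *, limit: int = 20, offset: int = 0) -> dict:
    """Hypothetical query tool: small page by default, never the whole table."""
    if not 1 <= limit <= MAX_PAGE_SIZE:
        raise ValueError(f"limit must be between 1 and {MAX_PAGE_SIZE}")
    if offset < 0:
        raise ValueError("offset must be non-negative")
    return {"table": table, "limit": limit, "offset": offset}


print(delete_records(["order_8f3a"]))
# {'would_delete': ['order_8f3a'], 'mode': 'soft', 'applied': False}
print(list_rows("orders"))
# {'table': 'orders', 'limit': 20, 'offset': 0}
```

To cause irreversible harm the caller has to state two things explicitly, dry_run=False and hard=True, so the destructive path is never the one reached by omission.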

STEP 5

Descriptions and examples are part of the schema, because the model reads them.

A JSON Schema is not just types — every field can carry a description, and the model uses it. The highest-leverage tool edits are usually one-line field descriptions that disambiguate ("amount in cents, not dollars"; "UTC, ISO-8601"; "must be an ID returned by search, not a name"). A 2025 ecosystem study found 97% of MCP tool descriptions had at least one quality issue and 56% had unclear purpose statements — meaning the cheapest available win for most agents today is writing schema descriptions as if they were the prompt, because they are.

Put a worked example in the field description for any argument with a non-obvious format. "e.g. order_8f3a" in the description prevents more malformed calls than any validator catches, and costs one line.
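
A sketch of descriptions doing that disambiguating work, plus a small check that flags fields shipped without one. The refund_params schema and the helper name are hypothetical; the description texts reuse the kinds of hints quoted above.

```python
def fields_missing_description(parameters: dict) -> list[str]:
    """List properties whose description is absent or blank."""
    return sorted(
        name
        for name, spec in parameters.get("properties", {}).items()
        if not spec.get("description", "").strip()
    )


# Hypothetical parameters for a refund tool.
refund_params = {
    "type": "object",
    "properties": {
        "order_id": {
            "type": "string",
            "description": "Must be an ID returned by search, not a name, e.g. order_8f3a.",
        },
        "amount": {
            "type": "integer",
            "minimum": 1,
            "description": "Amount in cents, not dollars, e.g. 1250 for $12.50.",
        },
        "issued_at": {
            "type": "string",
            "format": "date-time",
            "description": "UTC, ISO-8601, e.g. 2025-03-14T09:30:00Z.",
        },
        "reason": {"type": "string"},  # no description: the model has to guess
    },
    "required": ["order_id", "amount", "issued_at", "reason"],
}

print(fields_missing_description(refund_params))  # ['reason']
```

A check like this only proves a description exists, not that it is any good, so it works as a floor under review rather than a replacement for it.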

STEP 6

When a tight schema is the wrong tool.

Over-constraining has a real cost: a schema so rigid it can't express a legitimate edge case forces the model into contortions or makes the capability unreachable, and a deep nested union can be harder for the model to fill correctly than a flatter shape with good descriptions. Constrain to delete error classes, not to show off the type system — if a tighter schema makes the common call harder to express, you have optimized the wrong thing.