Error Messages as Prompts

Deep Dive · Tool & Capability Design

An error message is a prompt: it is the only chance you get to teach the model the next attempt.

When a tool call fails, the error string goes straight back into the model's context and becomes the single most influential input to its very next decision. A stack trace teaches it nothing; "error: 400" teaches it to guess; a sentence that says what was wrong and what to do instead often fixes the loop in one turn. This essay treats error messages as a first-class part of the tool's design — written for a model that will read them and immediately try again.

STEP 1

The model's next call is conditioned almost entirely on the last error.

A human who hits an error opens docs, reads source, asks a colleague. The model has exactly one artifact to learn from: the text you returned. It will not go find the schema again; it will pattern-match its retry against your error string. This makes the error message the highest-leverage sentence in the whole tool — it is, functionally, a just-in-time prompt injected at the exact moment the model is deciding what to do differently. Write it as such.

STEP 2

A good error names the cause and prescribes the fix.

The minimum useful error has three parts: what was rejected, why it was rejected, and what a correct call looks like. Omit any one and the model is guessing. "Invalid input" fails all three. "amount must be a positive integer in cents; you sent 39.99; send 3999" passes all three and the retry succeeds immediately, because the correction is in the message, not something the model has to re-derive.

# Useless: the model guesses, retries, fails the same way
Error: 400 Bad Request
ValidationError: field 'amount' invalid

# Useful: cause + value + the corrected call, in words
Error: 'amount' must be a positive integer in cents.
You sent 39.99 (dollars). Retry with amount=3999.

Echo the offending value back in the error. "You sent 39.99" lets the model see its own mistake instead of re-reasoning about what it might have done wrong — the single highest-yield habit in error design.

STEP 3

Distinguish retryable from terminal, explicitly, in words.

The model needs to know whether trying again could work. A transient failure ("rate limited, retry after 2s") invites a backoff; a terminal one ("order 42 is already refunded; do not retry") must shut the loop down or you get an infinite identical retry burning budget. Don't make the model infer this from an HTTP code it half-understands — state it. "This will not succeed on retry; the order does not exist" ends a loop that "404" would have spun forever.

An ambiguous error on a side-effecting tool is how runaway loops are born: the model can't tell "already done, stop" from "transient, try again," picks try-again, and re-fires forever. The retryable/terminal signal is a containment control, not just a nicety.

STEP 4

Errors should steer toward the right tool, not just away from the wrong call.

The best errors do positive routing. If the model called delete_user with a name instead of an ID, the error should say "expected a user ID; call search_users first to get one" — naming the tool that produces the missing input. This turns a dead end into a plan. An error that only says "no" leaves the model to rediscover the workflow; an error that says "no, do this instead" is the difference between a recovered run and an abandoned one.

STEP 5

Never let a tool fail silently or lie about success.

The worst error is the one that wasn't returned. A tool that swallows a failure and returns an empty result, or returns 200 when nothing happened, makes the model proceed confidently on a false premise — and that error surfaces three steps later as inexplicable behavior no trace can localize. Partial success is the same trap: "refunded but notification failed" must be reported as exactly that, not as success, or the agent reports a falsehood to the user. Honest failure beats convenient silence every time.

STEP 6

When a terse error is the right call.

Verbose, instructive errors cost context tokens, and an error that dumps a 500-line stack trace or the entire valid schema on every failure poisons the window worse than a curt one. Make errors instructive, not encyclopedic — one sentence of cause plus the corrected call beats a wall of diagnostics the model has to wade through to find the one fact it needed.