Designing for Trust & Calibration

Playbook · Agent UX & Human Interaction

Designing for trust means engineering for the right amount of it, not the most.

A trustworthy agent is not the one users trust most; it is the one whose users' trust tracks its actual reliability. The expensive failures in production are not under-trusted agents, which users simply stop using. They are over-trusted agents that are confidently wrong and get believed. This essay treats trust as a calibration target you design toward, with confidence display, track records, and friction as the instruments.

STEP 1

The target is calibration, not maximization.

Two failure modes bracket every agent. Under-trust: the user re-checks everything, the agent's speed advantage evaporates, and the product is abandoned. Over-trust (sometimes called automation complacency): the user stops inspecting outputs, so the agent's errors pass straight through to consequences. The design goal is trust calibration, a term carried over from the human-automation literature into agent UX: user-perceived reliability should match measured reliability. Match it per task type, because an agent that is excellent at summarizing and mediocre at arithmetic has two different true reliabilities and deserves two different levels of trust.
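
The per-task-type comparison can be made measurable. The sketch below assumes a hypothetical interaction log in which each record carries a task type, whether the user accepted the output without checking it (a rough proxy for trust), and whether the output was later judged correct. The field names and the proxy are illustrative assumptions, not an established metric.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Interaction:
    task_type: str            # hypothetical label, e.g. "summarize" or "arithmetic"
    accepted_unchecked: bool  # trust proxy: the user took the output without verifying
    correct: bool             # ground truth from later review


def calibration_gap(log):
    """Return {task_type: trust_rate - accuracy}; positive values mean over-trust."""
    totals = defaultdict(lambda: [0, 0, 0])  # [count, accepted_unchecked, correct]
    for item in log:
        row = totals[item.task_type]
        row[0] += 1
        row[1] += item.accepted_unchecked
        row[2] += item.correct
    return {
        task: (accepted / n) - (correct / n)
        for task, (n, accepted, correct) in totals.items()
    }
```

A gap near zero for a task type suggests trust roughly matches reliability there; a large positive gap flags a task type where users accept more than the agent gets right.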

"Users love it" is not a success metric for an agent. An agent users love and never verify is a latent incident. Measure the gap between trust and reliability, not trust alone.

STEP 2

A confidently wrong answer is the most expensive output you can ship.

Error cost is not symmetric. A hedged wrong answer ("I'm unsure, but possibly X; verify") costs a re-check. A confident wrong answer costs the full downstream consequence, plus the erosion of the user's calibration on everything that follows. The asymmetry means fluency is a liability when it is decoupled from correctness: LLMs produce equally fluent prose whether right or wrong, so fluency cannot be the user's reliability signal. Yet by default it is the only one they have. Your job is to give them a better one.

Separate the claim from the confidence. Render them as distinct visual elements so a user cannot read certainty out of polished phrasing.
Make uncertainty legible, not decorative. "Low confidence: this figure is interpolated, not retrieved" beats a generic 62%.
Tie confidence to verifiable substrate. Confidence backed by a citation, a passing check, or a tool result is worth showing; a bare model-emitted probability is often miscalibrated and should be treated with suspicion.
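
The three points above can be sketched as a data shape in which the claim, the reason for doubt, and the supporting evidence are separate fields. Everything here (`Answer`, `Evidence`, `confidence_label`) is a hypothetical illustration rather than an existing API. The raw model probability is stored but deliberately kept out of the displayed label.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class Evidence(Enum):
    CITATION = "citation"
    CHECK_PASSED = "check_passed"
    TOOL_RESULT = "tool_result"


@dataclass
class Answer:
    claim: str                                  # what the agent asserts
    evidence: set = field(default_factory=set)  # verifiable substrate, if any
    caveat: str = ""                            # legible reason for doubt
    model_probability: Optional[float] = None   # raw self-report; stored, not displayed


def confidence_label(answer):
    """Build the displayed confidence from caveat and evidence, never from phrasing."""
    if answer.caveat:
        return f"Low confidence: {answer.caveat}"
    if answer.evidence:
        kinds = ", ".join(sorted(e.value for e in answer.evidence))
        return f"Backed by: {kinds}"
    return "Unverified"
```

Because the label is computed from `caveat` and `evidence` alone, polished wording in `claim` cannot raise the displayed confidence.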

STEP 3

Display confidence only where it changes a decision.

A number on every output is noise that trains users to ignore the channel. Surface confidence where the user has a decision whose right answer depends on it — accept vs. verify, send vs. hold, auto-apply vs. review. Elsewhere, prefer a binary verified / unverified badge over a false-precision percentage.

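# Confidence is a routing input, not a UI ornament.
# `answer` and `render` are supplied by the host application.
def route(answer, render):
    if answer.has_citation and answer.check_passed:
        # Backed by verifiable substrate: badge as verified.
        render(answer, badge="verified")
    elif answer.confidence < 0.6 or answer.is_extrapolated:
        # Weak or extrapolated: ask for a check and show the reasoning.
        render(answer, badge="verify-before-use", expand_reasoning=True)
    else:
        render(answer, badge="unverified")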

Calibrate the displayed signal against ground truth before you ship it. If items badged "verified" are wrong 15% of the time, the badge is actively manufacturing over-trust — worse than no badge at all.
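
A minimal sketch of that pre-ship check, assuming a labeled evaluation set of `(badge, was_correct)` pairs. The function names and the 2% threshold are illustrative assumptions; the acceptable error rate is a product decision.

```python
def badge_error_rate(samples, badge):
    """Share of answers carrying `badge` that were wrong on ground truth.

    `samples` is an iterable of (badge, was_correct) pairs from a labeled
    evaluation set. Returns None when the badge never appears.
    """
    outcomes = [was_correct for b, was_correct in samples if b == badge]
    if not outcomes:
        return None
    return 1 - sum(outcomes) / len(outcomes)


def safe_to_ship(samples, badge="verified", max_error_rate=0.02):
    """Gate the badge on its measured error rate (the threshold is an assumption)."""
    rate = badge_error_rate(samples, badge)
    return rate is not None and rate <= max_error_rate
```

A badge that never appears in the evaluation set is treated as not shippable, since nothing has been measured about it.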

STEP 4

Trust is built per-domain and over time, not granted globally.

Trust is not a scalar the user sets once; it accrues from observed track record within a domain. Make that track record a first-class product surface: "drafted 240 replies, 6 edited by you, 0 reverted" is a calibration instrument users can reason about. Surface it where the user is deciding how much to check, scoped to the current task class — global accuracy averaged over everything hides exactly the per-domain variance calibration needs.
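
One way to keep the track record scoped per task class is sketched below. The `TrackRecords` class and its outcome labels are hypothetical; the point is only that counts are stored per task class and never pooled into one global figure.

```python
from collections import Counter, defaultdict


class TrackRecords:
    """Per-task-class outcome counts, kept separate so variance stays visible."""

    OUTCOMES = ("accepted", "edited", "reverted")  # assumed outcome labels

    def __init__(self):
        self._counts = defaultdict(Counter)

    def record(self, task_class, outcome):
        if outcome not in self.OUTCOMES:
            raise ValueError(f"unknown outcome: {outcome!r}")
        self._counts[task_class][outcome] += 1

    def summary(self, task_class):
        """Render the record for one task class as a line the user can reason about."""
        counts = self._counts[task_class]
        total = sum(counts.values())
        return (
            f"{total} {task_class} outputs, "
            f"{counts['edited']} edited by you, {counts['reverted']} reverted"
        )
```

Showing `summary("reply")` next to a reply draft, rather than a pooled accuracy number, keeps the displayed record scoped to the decision the user is actually making.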

STEP 5

Friction is a trust instrument — spend it where it calibrates.

Confirmation steps, diffs-before-apply, and "show your work" are not just safety mechanisms; they are calibration devices. Friction placed on a high-consequence, low-track-record action makes the user look, which is the moment calibration actually happens. Friction placed on a high-track-record, low-consequence action is pure tax — it trains the user to click through without reading, which then transfers to the actions that mattered. Allocate friction inversely to demonstrated reliability and proportionally to consequence; never uniformly.
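
The allocation rule can be sketched as a small scoring function. The consequence levels, the thresholds, the friction names, and the minimum-history cutoff are all illustrative assumptions to be tuned per product.

```python
from enum import IntEnum


class Consequence(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3


def friction_level(consequence, successes, attempts, min_attempts=20):
    """Pick a friction level from consequence and demonstrated reliability."""
    # With too little history, treat the action as unproven.
    reliability = successes / attempts if attempts >= min_attempts else 0.0
    # Proportional to consequence, inverse to demonstrated reliability.
    score = consequence * (1.0 - reliability)
    if score >= 1.5:
        return "confirm-with-diff"
    if score >= 0.5:
        return "show-diff"
    return "auto-apply"
```

Under this particular scoring, a high-consequence action with a long, clean record can eventually reach `auto-apply`. A team that never wants that could add a floor for `Consequence.HIGH`; that is a policy choice the sketch leaves open.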

STEP 6

When NOT to surface confidence at all.

If your confidence signal isn't calibrated against measured outcomes, don't display it. An unvalidated confidence number doesn't reduce over-trust; it launders an unreliable guess into a figure the user trusts more.
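
A minimal sketch of such a gate, using expected calibration error (ECE) over `(confidence, was_correct)` pairs from a labeled evaluation set. ECE is a standard calibration measure, but the `max_ece` and `min_samples` thresholds and the function names are assumptions chosen for illustration.

```python
def expected_calibration_error(predictions, n_bins=10):
    """ECE over (confidence, was_correct) pairs, with confidence in [0, 1]."""
    bins = [[] for _ in range(n_bins)]
    for confidence, was_correct in predictions:
        index = min(int(confidence * n_bins), n_bins - 1)  # 1.0 goes in the last bin
        bins[index].append((confidence, was_correct))
    total = sum(len(b) for b in bins)
    if total == 0:
        raise ValueError("no predictions to evaluate")
    ece = 0.0
    for b in bins:
        if not b:
            continue
        mean_confidence = sum(c for c, _ in b) / len(b)
        accuracy = sum(ok for _, ok in b) / len(b)
        # Weight each bin's confidence/accuracy gap by its share of the data.
        ece += (len(b) / total) * abs(mean_confidence - accuracy)
    return ece


def should_display_confidence(predictions, max_ece=0.05, min_samples=200):
    """Show the number only if it is measured on enough data and calibrated enough."""
    predictions = list(predictions)
    if len(predictions) < min_samples:
        return False
    return expected_calibration_error(predictions) <= max_ece
```

When the gate returns `False`, the fallback suggested earlier in this playbook applies: drop the number and show a verified / unverified badge, or nothing.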