The quality of an agent product is decided by what happens after it is wrong.
Every agent will fail visibly in front of a user. The differentiator is not failure rate alone — it is whether failure is graceful, recoverable, and trust-repairing rather than catastrophic, opaque, and trust-destroying. This essay covers failing safely, undo and rollback as the primary safety net, error messages a user can actually act on, and the specific work of rebuilding calibrated trust after a mistake the user watched happen.
Design the failure path with the same care as the success path.
Most agent UX effort goes into the happy path; the failure path is whatever the exception handler happens to do. That ratio is backwards — users form their durable judgment of an agent during failures, not successes. A graceful failure has three properties: it stops before compounding (one bad step, not ten built on it), it leaves the system in a known state (not half-applied), and it tells the user precisely where things stand. Failing loudly and early beats failing silently and late every time.
The dangerous failure is not the visible crash — it is the agent that quietly produces a plausible-but-wrong result and continues. Bias detection toward surfacing uncertainty and stopping, not toward pushing through.
Undo is the safety net that makes everything else affordable.
If actions can be cleanly undone, the entire cost of being wrong drops by an order of magnitude — and so can the friction everywhere else (this is the reversibility lever from H2, viewed from the failure side). Invest in undo before investing in better confirmations: a strong undo lets you safely lower approval friction, which reduces fatigue, which improves the oversight that catches the failures undo then cleans up.
# Make the action reversible; record how to reverse it op = effect.apply(action) journal.record(op.inverse) # compensating action, kept ready if user.undo(op) or monitor.detected_regression(op): await journal.compensate(op) # roll back to known-good state
- Prefer real reversibility (compensating actions) over a confirmation that merely warns.
- Where true rollback is impossible, insert a delay window with cancel before the effect lands.
- Make undo discoverable at the moment of regret, not buried in a menu.
An error message is a UI for the user's next action.
"Something went wrong" tells the user nothing they can use. An actionable failure message answers three questions: what state am I in now (was anything applied? is it consistent?), what can I do (retry / undo / take over / escalate), and what should I not assume (e.g., "the email was NOT sent"). Surface the durable facts — what completed, what was rolled back, what is pending — and put the recovery action the user most likely wants as the primary control, right there, not three screens away.
Fail closed on consequences, fail open on capability.
The direction of failure is a deliberate choice. On the consequence axis — money, data, external communication — fail closed: when uncertain or erroring, do less, not more; an action not taken is recoverable, an erroneous irreversible action often is not. On the capability axis — a tool is down, a source is unreachable — fail open toward honest degradation: return partial results clearly labeled, or hand back to the human with state intact, rather than fabricating to appear functional. The worst failure mode combines them: a confident, complete-looking, wrong, irreversible result.
Trust repair after a visible mistake is explicit work, not time.
A visible error does not just cost that task; it resets the user's calibration (H1) for the whole agent, often globally even though the failure was scoped. Trust does not recover by waiting — it recovers through deliberate repair: acknowledge the specific error plainly (no minimizing, no anthropomorphic apology theater), state the concrete change that makes a recurrence less likely, and — most powerfully — propose a temporarily lowered autonomy rung (H5) in exactly the scope that failed. A demotion the agent proposes itself reads as accountability and rebuilds calibrated trust faster than any worded apology.
After a visible failure, scope the trust impact for the user explicitly: "this affected refund handling; ticket labeling is unchanged." Unscoped failures silently poison trust in capabilities that never failed.
When NOT to auto-retry.
If the failure may have partially applied a non-idempotent effect, do not auto-retry — silently retrying over uncertain state turns one recoverable failure into a duplicated, irreversible one.