Transparency & Explainability

Playbook · Agent UX & Human Interaction

An explanation users find convincing is not the same as one that is true.

Transparency in agent UX is usually pitched as "show the reasoning." But a model's emitted chain-of-thought is a plausible narrative, not a verified causal account of why it acted — and recent work (e.g., Barez et al., "Chain-of-Thought Is Not Explainability", Oxford 2025; the broader faithfulness-vs-plausibility literature) shows these can come apart sharply. This essay separates the explanations you can trust from the ones that merely persuade, and shows how to pick the right altitude of explanation for the decision at hand.

STEP 1

Faithful and plausible are different axes — and users only perceive one.

Plausibility is whether an explanation reads as sensible to a human. Faithfulness is whether it actually reflects the computation that produced the output. The danger: models reliably produce plausible rationales even when the real cause was a prompt cue they never mention (the literature calls this post-hoc rationalization). Users have no way to perceive faithfulness directly — a fluent, well-structured rationale feels faithful — so a persuasive-but-unfaithful explanation is an over-trust generator, exactly the H1 failure in a new costume.

Displaying a raw chain-of-thought as "the agent's reasoning" makes an unverified claim about causality. It often increases user trust without increasing actual reliability — the worst combination.

STEP 2

Prefer artifacts you can check over narratives you must believe.

The robust move is to ground transparency in verifiable substrate rather than self-report. Rank explanation types by how independently checkable they are:

Strongest — externally grounded: the source document quote, the exact tool call and its raw result, the diff that will be applied, the query that ran. These are facts about the world the user can inspect without trusting the model's introspection.
Middle — structural: the plan, the decomposition, which tools were chosen and which were skipped. Checkable for coherence, not for hidden causes.
Weakest — introspective: the free-text "here's why I did that." Useful for hypothesis-forming, never as evidence of correctness.

Lead the UI with the strongest tier. Treat the narrative as garnish on top of citations and traces, not as the dish.

STEP 3

Explain at the altitude the decision requires.

There is no single right depth — there is the depth that supports this user's next decision. Over-explaining is its own failure: a wall of reasoning trains users to collapse the panel unread, and you lose the channel exactly as with over-displayed confidence in H1.

# Altitude follows consequence and the user's role
if stakes == "high" or user.is_reviewing:
    show(plan + sources + diff)        # inspectable substrate
elif user.asked_why:
    show(one_line_rationale, expandable=True)
else:
    show(outcome + provenance_badge)   # quiet by default

Default to a one-line "what and from where", make the full plan and sources one click away, and never auto-expand a chain-of-thought. Depth on demand beats depth by default.

STEP 4

Provenance is the highest-leverage transparency you can ship.

For most agent outputs the single most decision-useful piece of transparency is not why but from where: which source, which tool, which retrieved passage, what timestamp. Provenance is externally verifiable, hard to fabricate convincingly, and directly actionable — a user can click through and check. It also degrades gracefully: "I couldn't find a source for this" is itself honest, high-value transparency, and an agent that says it earns more calibrated trust than one that always has a fluent answer.

STEP 5

Make uncertainty and the road-not-taken visible.

Transparency that only narrates the chosen path hides the information a reviewer most needs: what the agent was unsure about and what it nearly did instead. Surface the close alternative it rejected, the assumption it had to make, and the inputs it lacked. This converts the explanation from a justification (which motivates rationalization) into a decision aid (which invites correction) — and it routes the reviewer's scarce attention to the load-bearing assumptions instead of the confident middle.

STEP 6

When NOT to show the reasoning trace.

If you can't verify a rationale is faithful, don't present it as the agent's reasoning — surface checkable provenance and the plan instead, and let a self-reported narrative stay an explicit, opt-in hypothesis.