Agent ROI is the value created versus a real counterfactual, not the labor it superficially resembles.
The reflexive ROI pitch — "the agent does a human's job, so ROI is the salary minus the API bill" — is almost always wrong, and wrong in a way that survives until a finance review kills the program. Real ROI measurement needs an explicit value definition, a defensible counterfactual baseline, an honest cost side that includes the human still in the loop, and a time-to-value that accounts for the ramp before the agent is trustworthy.
The "agent replaces a human" framing is a category error.
An agent rarely replaces a whole role; it absorbs a slice of tasks and changes the shape of the rest. Pricing ROI as "headcount removed × salary" assumes the human disappears — but usually they move to reviewing the agent, handling its escalations, and doing the judgment work it cannot. The honest question is not "what human did this replace" but "what is the marginal value of the work now done that was not being done before, net of the human effort still required." Often the win is throughput or coverage that no headcount plan would have funded, not a salary line you deleted.
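To make the contrast concrete, here is a minimal sketch with hypothetical field names and figures. The naive pitch counts the whole salary as saved. The marginal view counts only the labor actually freed, plus the value of work that was not being done before, minus the agent bill. The `retained_fraction` parameter is an assumption standing in for review, escalation, and judgment time.

```python
from dataclasses import dataclass


@dataclass
class AgentScenario:
    # All figures are hypothetical, annual, and in one currency.
    salary_of_role: float      # salary of the role the pitch claims is "replaced"
    retained_fraction: float   # share of that role still spent on review, escalations, judgment
    new_outcomes: float        # verified outcomes that were not being produced before
    value_per_outcome: float   # value of one such outcome
    agent_bill: float          # API and infrastructure cost


def naive_net_value(s: AgentScenario) -> float:
    # The reflexive pitch: the whole salary disappears, minus the API bill.
    return s.salary_of_role - s.agent_bill


def marginal_net_value(s: AgentScenario) -> float:
    # Only labor actually freed counts as savings; the rest of the human stays in the loop.
    labor_freed = s.salary_of_role * (1.0 - s.retained_fraction)
    # Throughput or coverage that no headcount plan would have funded.
    new_work_value = s.new_outcomes * s.value_per_outcome
    return new_work_value + labor_freed - s.agent_bill


s = AgentScenario(100_000, 0.7, 2_000, 15.0, 20_000)
print(naive_net_value(s), round(marginal_net_value(s)))
```

With these made-up inputs the naive figure is 80,000 and the marginal one is about 40,000. Half the claimed value was a salary line that never went away.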
No counterfactual baseline means no ROI, only a story.
ROI is value relative to what would have happened anyway. If tickets were already trending down, or a cheaper rules engine would have caught 60% of cases, the agent earns only the delta over that baseline, not the full outcome. The strongest baseline is a controlled experiment. Absent that, use a pre/post comparison with a control segment. The weakest is a counterfactual estimate stated with its assumptions exposed.
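One common way to turn "pre/post with a control segment" into a number is difference-in-differences. The sketch below assumes the control segment would have followed the same trend as the agent's segment (the parallel-trends assumption). The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SegmentCounts:
    pre: float   # verified outcomes per period before launch
    post: float  # verified outcomes per period after launch


def did_baseline(treated: SegmentCounts, control: SegmentCounts) -> float:
    """Counterfactual outcomes for the treated segment: its own starting
    level plus the change the untouched control segment saw anyway."""
    return treated.pre + (control.post - control.pre)


def incremental_outcomes(treated: SegmentCounts, control: SegmentCounts) -> float:
    # Only the change beyond the shared trend is credited to the agent.
    return treated.post - did_baseline(treated, control)


treated = SegmentCounts(pre=1000, post=1500)
control = SegmentCounts(pre=900, post=1080)
print(did_baseline(treated, control), incremental_outcomes(treated, control))
```

A naive pre/post would credit the agent with all 500 extra outcomes. Netting out the control segment's trend leaves 320. If the parallel-trends assumption is doubtful, say so next to the number.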
```python
def agent_roi(outcomes_delivered: float, baseline_outcomes: float,
              value_per_outcome: float, loaded_agent_cost: float) -> float:
    # ROI is the delta over the counterfactual, net of new costs
    value_with_agent = outcomes_delivered * value_per_outcome
    value_counterfactual = baseline_outcomes * value_per_outcome  # happens anyway
    incremental_value = value_with_agent - value_counterfactual
    # crediting the agent with value_with_agent alone is the lie
    return (incremental_value - loaded_agent_cost) / loaded_agent_cost
```
The most common ROI inflation: crediting the agent with the entire outcome instead of the increment over the baseline. It turns a modest, real 1.4× into a fictional 9× that collapses the moment finance asks "versus what?" Build the baseline before the agent launches, or you will have nothing to measure it against afterward.
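The arithmetic behind that gap is easy to reproduce. The figures below are hypothetical, chosen so that 76% of the delivered value would have happened anyway. The two functions differ only in whether the counterfactual is subtracted.

```python
def naive_roi(value_with_agent: float, loaded_agent_cost: float) -> float:
    # Credits the agent with the entire outcome.
    return (value_with_agent - loaded_agent_cost) / loaded_agent_cost


def incremental_roi(value_with_agent: float, value_counterfactual: float,
                    loaded_agent_cost: float) -> float:
    # Credits the agent only with the delta over what would have happened anyway.
    return (value_with_agent - value_counterfactual - loaded_agent_cost) / loaded_agent_cost


# Hypothetical figures, for illustration only.
value_per_outcome = 20.0
value_with_agent = 5_000 * value_per_outcome       # 5,000 verified outcomes delivered
value_counterfactual = 3_800 * value_per_outcome   # 3,800 would have happened anyway
loaded_agent_cost = 10_000.0

print(f"naive ROI:       {naive_roi(value_with_agent, loaded_agent_cost):.1f}x")
print(f"incremental ROI: {incremental_roi(value_with_agent, value_counterfactual, loaded_agent_cost):.1f}x")
```

With these inputs it prints 9.0x for the naive figure and 1.4x for the incremental one.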
Define value as a verified outcome, not activity.
"The agent handled 40,000 requests" is activity, not value — a number that goes up whether the work helped or not. Value must be a verified outcome tied to a metric the business already trusts: tickets resolved without reopen, qualified leads that converted, documents processed and accepted downstream. Activity metrics reward a busy agent; outcome metrics reward a useful one, and only the second survives scrutiny. If you cannot name the verified outcome, you do not yet have an ROI case, you have a usage report.
Cost the human who never actually left the loop.
The denominator is the fully loaded cost, and its largest hidden term is the human effort that did not disappear. McKinsey's 2026 State of AI data shows well-scoped deployments reaching strong returns, but the programs that fail the finance review are the ones that costed only the API bill and ignored the new work the agent created.
- Review and oversight — humans checking, correcting, and approving the agent's output is recurring cost, often the dominant one early on.
- Escalation handling — the fraction the agent cannot close still consumes a human, frequently the hard cases that cost the most per item.
- Maintenance — evals, prompt and tool upkeep, and model-upgrade churn are an ongoing line, not a one-time build.
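The three cost lines above can be folded into a single loaded monthly cost, as in the sketch below. The field names and the single blended `loaded_hourly_rate` are simplifying assumptions. Real programs may price reviewers, escalation handlers, and engineers at different rates.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class MonthlyCosts:
    # Hypothetical inputs; every field is a monthly figure in one currency.
    api_and_infra: float
    review_hours: float          # humans checking, correcting, approving output
    escalations: int             # items the agent could not close
    hours_per_escalation: float  # the hard cases, often the costliest per item
    maintenance_hours: float     # evals, prompt/tool upkeep, model-upgrade churn
    loaded_hourly_rate: float    # salary plus overhead, per hour


def loaded_cost_breakdown(c: MonthlyCosts) -> Dict[str, float]:
    parts = {
        "api_and_infra": c.api_and_infra,
        "review": c.review_hours * c.loaded_hourly_rate,
        "escalation": c.escalations * c.hours_per_escalation * c.loaded_hourly_rate,
        "maintenance": c.maintenance_hours * c.loaded_hourly_rate,
    }
    # Sum the components before the total key is added.
    parts["total"] = sum(parts.values())
    return parts


print(loaded_cost_breakdown(MonthlyCosts(2_000, 120, 50, 1.5, 20, 80)))
```

In this made-up month the API bill is about a tenth of the total. A model that costed only that line would understate the denominator roughly tenfold.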
Time-to-value, not steady-state ROI, decides the program.
Agents have a long ramp: low success and heavy human review at launch, improving as evals tighten and prompts mature. Steady-state ROI may be excellent while the cumulative cash position is deeply negative for months. Published patterns put payback for well-targeted agents in the 3–12 month range and strong returns within roughly a year of production — but a program judged on month-one numbers, or funded with less runway than its time-to-value, gets cancelled before it earns out. Measure the cumulative curve and the breakeven point, not just the asymptote.
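The cumulative curve and breakeven point can be computed as in the sketch below. The linear ramp, the upfront build cost, and the constant monthly cost are all assumptions for illustration. Substitute measured monthly net value as soon as you have it.

```python
from typing import List, Optional


def ramped_monthly_net(steady_value: float, monthly_cost: float,
                       ramp_months: int, horizon: int) -> List[float]:
    # Hypothetical linear ramp: value reaches steady state at month ramp_months.
    return [steady_value * min(1.0, m / ramp_months) - monthly_cost
            for m in range(1, horizon + 1)]


def cumulative_cash(monthly_net: List[float], upfront_build_cost: float) -> List[float]:
    # Running cash position: starts negative by the build cost, then adds each month's net.
    position = -upfront_build_cost
    curve = []
    for net in monthly_net:
        position += net
        curve.append(position)
    return curve


def breakeven_month(curve: List[float]) -> Optional[int]:
    # First month (1-indexed) at which cumulative cash is no longer negative;
    # None if breakeven falls outside the horizon.
    for month, position in enumerate(curve, start=1):
        if position >= 0:
            return month
    return None


net = ramped_monthly_net(30_000, 10_000, ramp_months=6, horizon=24)
curve = cumulative_cash(net, upfront_build_cost=50_000)
print(breakeven_month(curve))
```

With these inputs, monthly net value turns positive in month 3, but the cumulative position stays negative until month 7. A program reviewed at month five with only five months of runway would look like a failure.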
When the honest ROI number is "not yet, and maybe not here."
If the value is unverifiable, the baseline is unknowable, or the loaded human cost erases the gain, the honest answer is that this is not yet an ROI-positive use case, and saying so is cheaper than discovering it in a year. Some workflows genuinely do not clear the bar; a faithful negative result redirects investment to one that does. An ROI model that cannot survive a hostile finance review is not conservative analysis; it is a pre-scheduled cancellation.
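One way to find out whether a model would survive that review is to stress it yourself before finance does. The sketch below sweeps two assumptions a skeptic is likely to attack: the share of value that would have happened anyway, and the loaded cost. It then reports the worst case. The function names and the ranges you pass in are hypothetical, and a plain grid sweep is only one of several ways to run a sensitivity check.

```python
from itertools import product
from typing import List


def incremental_roi(value_with_agent: float, value_counterfactual: float,
                    loaded_cost: float) -> float:
    return (value_with_agent - value_counterfactual - loaded_cost) / loaded_cost


def worst_case_roi(value_with_agent: float,
                   baseline_shares: List[float],
                   loaded_costs: List[float]) -> float:
    """Minimum incremental ROI over every combination of pessimistic
    baseline share and loaded cost."""
    return min(
        incremental_roi(value_with_agent, share * value_with_agent, cost)
        for share, cost in product(baseline_shares, loaded_costs)
    )


print(worst_case_roi(100_000, [0.6, 0.76, 0.9], [10_000, 15_000]))
```

If the worst plausible combination is still positive, the case is robust. If it goes negative, as it does here when 90% of the value would have happened anyway, that is the "not yet" answer, reached before the program is funded.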