Where the Economics Breaks


Agent unit economics do not erode gradually — they invert, suddenly, at the places everyone stopped looking.

A profitable agent and a money-losing agent are frequently the same agent on different inputs. The economics rarely degrade along a smooth slope; they flip at specific structural points (retry storms, the long tail, escalation, the eval bill, and the silent-failure tax) where cost per win crosses revenue per win and stays there. This essay catalogs where the inversion happens, so that attention goes to the failure surface rather than to the comfortable average.

STEP 1

Retries are a multiplier, and a multiplier on a flaky tool is unbounded.

Retry logic that looks prudent in isolation composes catastrophically. Three attempts per step inside a 5-deep agent tree, where each level also retries, is not a 3× multiplier; in the worst case it is 3⁵ = 243 leaf calls per task, and the worst case is correlated: when a dependency degrades, every task retries at once. The economics invert not at the average retry rate but in the correlated burst, where a tool blip turns a profitable hour into the month's largest cost line. Retry policy is a unit-economics decision disguised as a reliability detail.
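The gap between the everyday retry cost and the correlated worst case can be sketched with a small expected-value model. This is a sketch under stated assumptions, not a measurement: each leaf tool call fails independently with probability `p_fail`, each of `depth` nested levels retries its child up to `attempts` times, and a dependency outage is modeled as `p_fail` going to 1. The function name and parameters are hypothetical.

```python
def expected_leaf_calls(p_fail: float, attempts: int, depth: int) -> float:
    """Expected number of leaf tool calls for one top-level task.

    Assumes each leaf call fails independently with probability p_fail and
    each of `depth` nested levels retries its child up to `attempts` times.
    """
    calls = 1.0      # expected leaf calls for one invocation of the current level
    fail = p_fail    # probability that one invocation of the current level fails
    for _ in range(depth):
        # Expected child invocations: 1 + q + q^2 + ... + q^(attempts - 1),
        # because try i+1 happens only if the first i tries all failed.
        tries = sum(fail ** i for i in range(attempts))
        calls *= tries
        # This level fails only if every one of its tries failed.
        fail = fail ** attempts
    return calls


for p in (0.01, 0.5, 0.9, 1.0):
    print(f"p_fail={p}: {expected_leaf_calls(p, attempts=3, depth=5):.1f} leaf calls per task")
```

Under these assumptions the multiplier stays close to 1 at a 1% leaf failure rate, is only roughly 10 even at 90%, and jumps to 3⁵ = 243 when failures are fully correlated (`p_fail=1.0`). That cliff is why the average retry rate says little about the outage bill.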

STEP 2

The long tail eats the margin the median earned.

Task cost is heavy-tailed, often roughly power-law: the median task is cheap and the p99 task costs 50–100× as much, burning loops and fan-out on the inputs the agent handles worst. Because the tail is a small fraction of volume, a dashboard of mean cost per task looks healthy even while the tail consumes the majority of total spend.

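# the mean is healthy; the tail already ate the profit
median_cost = 0.18          # typical task, in cost units per task
p99_cost    = 14.0          # ~78x the median: same agent, hard input
tail_share  = 0.02          # 2% of tasks land in the tail

# simplification: bulk tasks cost about the median, tail tasks about the p99
bulk_spend = (1 - tail_share) * median_cost
tail_spend = tail_share * p99_cost
tail_spend_fraction = tail_spend / (bulk_spend + tail_spend)
# ≈ 0.61: 2% of tasks are 61% of total cost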

Optimizing the median is optimizing the 39% of spend that was never the problem. The margin lives or dies on the p99 — cap, route, or refuse the tail explicitly, because it will not show up in any average you are watching until the invoice does.
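One way to cap the tail explicitly is a hard per-task spend ceiling that stops the agent loop and hands the task to another path. The sketch below is hypothetical (the `TaskBudget`, `BudgetExceeded`, and `run_task` names and the cap value are illustrative, and each list element stands in for the cost of one agent step); it is not any particular framework's API.

```python
from dataclasses import dataclass


class BudgetExceeded(Exception):
    """Raised when a task's cumulative spend crosses its cap."""


@dataclass
class TaskBudget:
    cap: float           # hypothetical per-task ceiling, same unit as task cost
    spent: float = 0.0

    def charge(self, cost: float) -> None:
        """Record the cost of a step that just ran; stop the task if over cap."""
        self.spent += cost
        if self.spent > self.cap:
            raise BudgetExceeded(f"spent {self.spent:.2f} > cap {self.cap:.2f}")


def run_task(step_costs: list[float], cap: float) -> tuple[str, float]:
    """Run a simulated agent loop under a hard cap.

    Returns ("completed", spend) if every step fit within the cap, or
    ("routed", spend) if the cap tripped and the task must go to a cheaper
    path, a human, or a refusal.
    """
    budget = TaskBudget(cap)
    try:
        for cost in step_costs:   # each element stands in for one agent step
            budget.charge(cost)
    except BudgetExceeded:
        return "routed", budget.spent
    return "completed", budget.spent


print(run_task([0.06, 0.06, 0.06], cap=2.0))   # a median-like task
print(run_task([0.5] * 28, cap=2.0))           # a tail task that would cost 14.0
```

Because a step is charged after it runs, spend can overshoot the cap by at most one step's cost: the 14.0 tail task is stopped at 2.5 and routed instead of running to completion.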

STEP 3

Escalation inverts the economics it was supposed to protect.

Human escalation is the safety valve that also quietly destroys the unit economics. Each escalated task pays the full agent cost and then a human's loaded cost, so an escalation rate creeping from 5% to 20% does not add 15% to cost — it can double cost per successful task, because escalations cluster on the expensive hard cases. An agent that hits its ROI target only by escalating the hard work has not automated the workflow; it has added an AI surcharge on top of the human one.
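A minimal blended-cost sketch shows why clustering matters. All inputs are hypothetical: easy tasks cost the agent 0.20, hard tasks burn 3.00 of agent spend before escalating, and a human's loaded cost is 5.00 per escalation, all in one currency unit. It also assumes every task eventually completes, either by the agent alone or after escalation.

```python
def cost_per_completed_task(esc_rate: float, agent_cost_easy: float,
                            agent_cost_hard: float, human_cost: float) -> float:
    """Blended cost per completed task.

    Escalated tasks are assumed to be the hard ones: they pay the hard-case
    agent cost first and then a human's loaded cost on top. Every task is
    assumed to complete, by the agent or after escalation.
    """
    agent_only = (1 - esc_rate) * agent_cost_easy
    escalated = esc_rate * (agent_cost_hard + human_cost)
    return agent_only + escalated


# hypothetical inputs, same currency unit throughout
low = cost_per_completed_task(0.05, agent_cost_easy=0.20, agent_cost_hard=3.00, human_cost=5.00)
high = cost_per_completed_task(0.20, agent_cost_easy=0.20, agent_cost_hard=3.00, human_cost=5.00)
ratio = high / low   # roughly 3x, not +15%
print(f"{low:.2f} -> {high:.2f} per completed task ({ratio:.1f}x)")
```

With these numbers, moving the escalation rate from 5% to 20% roughly triples cost per completed task, because each extra escalation carries both the hard-case agent spend and the human.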

STEP 4

The eval bill scales with autonomy, not with revenue.

Making an autonomous agent trustworthy costs money that the pilot did not: LLM-as-judge calls on every output, regression suites on every model upgrade, trace storage, and human review of the eval itself. This cost grows with the agent's autonomy and risk surface, not with the revenue it earns, so the safer and more autonomous you make it, the more the eval line crowds the margin. It is a real cost of goods sold, and a financial model that books it as one-time R&D is hiding the inversion, not avoiding it.
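Booking eval as cost of goods means computing it per task. The sketch below is one hypothetical way to do that: the `EvalCosts` fields and every number are illustrative assumptions, with a variable part that scales with autonomous actions per task and a fixed monthly part (regression runs, human review of the eval) amortized over volume.

```python
from dataclasses import dataclass


@dataclass
class EvalCosts:
    judge_cost: float                  # one LLM-as-judge call
    judge_calls_per_action: int        # judge checks on each autonomous action
    trace_storage_per_task: float
    regression_cost_per_upgrade: float
    upgrades_per_month: int
    human_eval_review_per_month: float


def eval_cost_per_task(c: EvalCosts, actions_per_task: int, tasks_per_month: int) -> float:
    """Eval spend per task, treated as cost of goods rather than one-time R&D.

    The variable part grows with how many autonomous actions each task takes;
    the fixed monthly part is amortized over monthly volume. Revenue per task
    does not appear anywhere in the calculation.
    """
    variable = (actions_per_task * c.judge_calls_per_action * c.judge_cost
                + c.trace_storage_per_task)
    fixed = (c.regression_cost_per_upgrade * c.upgrades_per_month
             + c.human_eval_review_per_month)
    return variable + fixed / tasks_per_month


costs = EvalCosts(judge_cost=0.01, judge_calls_per_action=1, trace_storage_per_task=0.002,
                  regression_cost_per_upgrade=500.0, upgrades_per_month=2,
                  human_eval_review_per_month=1000.0)
for actions in (2, 10):   # more autonomy: more actions per task to check
    print(actions, round(eval_cost_per_task(costs, actions, tasks_per_month=10_000), 3))
```

Under these assumptions the eval line rises from about 0.22 to about 0.30 per task as autonomy grows, while revenue per task, which the function never sees, stays where it was.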

STEP 5

The silent-failure tax is the cost you only pay downstream.

The most expensive failure is the one that costs nothing on the dashboard: a confident, wrong answer that passes the agent's own checks, ships, and detonates downstream — a wrong refund, a bad merge, a misfiled record. Its cost is not in the model bill; it is the cleanup, the trust damage, and the human re-verification of everything the agent now touches.

It is unmetered. No token counter sees it; it surfaces as a support cost, a chargeback, or churn weeks later.
It taxes the wins too. One silent failure forces re-checking of the correct outputs, erasing the labor savings that justified the agent.
It compounds the ROI lie. ROI counted the output as a success; the real ledger pays for it twice.
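The two ledgers can be compared in a few lines. This is a hypothetical sketch: the function name, the parameters, and the rule that re-verification applies to every output once trust is broken are illustrative assumptions, not a standard accounting method.

```python
def silent_failure_ledger(tasks: int, labor_saved_per_task: float,
                          agent_cost_per_task: float, silent_fail_rate: float,
                          cleanup_cost: float, reverify_cost_per_task: float,
                          trust_broken: bool) -> tuple[float, float]:
    """Return (reported_savings, real_savings) for a batch of tasks.

    Reported savings count every output as a win. Real savings subtract the
    downstream cleanup of silent failures and, once trust is broken, the cost
    of re-verifying every output, correct ones included.
    """
    reported = tasks * (labor_saved_per_task - agent_cost_per_task)
    cleanup = tasks * silent_fail_rate * cleanup_cost   # never on the model bill
    reverify = tasks * reverify_cost_per_task if trust_broken else 0.0
    return reported, reported - cleanup - reverify


# hypothetical batch: 1% silent failures, each costing 50 to clean up
reported, real = silent_failure_ledger(tasks=1000, labor_saved_per_task=2.0,
                                       agent_cost_per_task=0.5, silent_fail_rate=0.01,
                                       cleanup_cost=50.0, reverify_cost_per_task=1.2,
                                       trust_broken=True)
print(round(reported, 2), round(real, 2))
```

With these inputs the reported savings are positive while the real ledger goes negative: cleanup and blanket re-verification together exceed the labor the agent saved.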

STEP 6

When the honest move is to narrow scope or not ship.

When the tail cannot be bounded, escalation cannot be contained, or the silent-failure tax exceeds the labor saved, the economics do not improve with scale — they invert harder, because every failure mode here is amplified by volume. The fix is rarely a cheaper model; it is a narrower scope where the agent is actually reliable, or a decision not to ship this workflow. An agent that is only profitable on its easy inputs is not a product with a tuning problem — it is a smaller product pretending to be a bigger one.
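Narrowing scope can be framed as a per-segment check of the cost-per-win versus revenue-per-win crossover named at the start of the essay. In this sketch the `Segment` fields, segment names, and numbers are hypothetical; `cost_per_task` is assumed to be all-in (model, retries, eval, escalation, cleanup).

```python
from dataclasses import dataclass


@dataclass
class Segment:
    name: str
    cost_per_task: float     # all-in: model, retries, eval, escalation, cleanup
    success_rate: float      # fraction of tasks that are verified wins
    revenue_per_win: float

    @property
    def cost_per_win(self) -> float:
        # failed tasks still cost money, so their cost is spread over the wins
        if self.success_rate <= 0:
            return float("inf")
        return self.cost_per_task / self.success_rate


def profitable_scope(segments: list[Segment]) -> list[str]:
    """Names of segments where cost per win stays below revenue per win.

    An empty result is the 'do not ship this workflow' answer.
    """
    return [s.name for s in segments if s.cost_per_win < s.revenue_per_win]


segments = [
    Segment("routine", cost_per_task=0.30, success_rate=0.95, revenue_per_win=1.00),
    Segment("ambiguous", cost_per_task=6.00, success_rate=0.50, revenue_per_win=1.00),
]
print(profitable_scope(segments))   # ['routine']
```

The surviving list is the smaller product the agent can actually sustain; shipping the dropped segments anyway is what turns volume into amplified losses.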