When (and when not) to go multi-agent: count the coordination tax first.
"Multi-agent" is the most over-applied architecture in agentic AI: enterprise surveys through 2025 show a large jump in projects claiming multi-agent designs, yet most of those tasks would run faster, cheaper, and more reliably as one well-equipped agent. This essay gives you a decision procedure — what coordination actually costs, the failure mode that single-agent designs do not have, and the three honest reasons a problem genuinely needs more than one agent.
The default is one agent with tools, and you should have to argue your way off it.
A single agent in a tool-use loop — plan, call a tool, observe, repeat — already covers most of what people reach for multi-agent to do. It has one context window (no information is lost crossing an agent boundary), one control thread (no consensus to reach), and one cost meter (no fan-out multiplier). Every additional agent is a new context boundary that drops information, a new message channel that can deadlock, and a new token stream that multiplies spend. Treat multi-agent as a cost you justify, not a default you assume.
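To make the loop concrete, here is a minimal sketch, assuming a scripted stand-in for the model. The names `TOOLS`, `scripted_policy`, and `run_single_agent` are hypothetical and chosen for illustration; a real agent would replace the policy with a model call.

```python
from typing import Callable

# Hypothetical tool registry: names and behaviors are illustrative only.
TOOLS: dict[str, Callable[[str], str]] = {
    "search": lambda query: f"results for {query!r}",
    "word_count": lambda text: str(len(text.split())),
}


def scripted_policy(history: list[str]) -> tuple[str, str]:
    """Stand-in for a model call: choose the next action from the full history."""
    if not history:
        return ("search", "coordination tax")
    if len(history) == 1:
        return ("word_count", history[-1])
    return ("finish", history[-1])


def run_single_agent(policy, max_steps: int = 10) -> tuple[str, list[str]]:
    history: list[str] = []              # one context: nothing is summarized away
    for _ in range(max_steps):           # one control thread, one cost meter
        action, arg = policy(history)    # plan
        if action == "finish":
            return arg, history
        history.append(TOOLS[action](arg))  # call a tool, observe, repeat
    raise RuntimeError("step budget exhausted")


answer, history = run_single_agent(scripted_policy)
```

Every step reads the same `history` list, so there is no boundary at which an observation could be dropped.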
The coordination tax is real, quantifiable, and usually underestimated.
Splitting work across agents introduces costs that do not exist in a single loop. Name them so you can price them:
- Token multiplier. Each subagent carries its own context and tool calls. Anthropic reported that its multi-agent research system used roughly 15× the tokens of a chat interaction, against roughly 4× for a single agent. That multiplier is the price of admission, not an edge case.
- Lossy boundaries. A subagent returns a summary, not its full reasoning. The parent acts on the summary. Nuance, caveats, and uncertainty are compressed away at every hop — a structural information loss that a single context never suffers.
- Coordination latency. Decompose, dispatch, wait for the slowest worker, aggregate. The critical path is the slowest branch plus orchestration overhead, not the average.
- Emergent failure surface. Deadlock, livelock, groupthink, and error propagation are system-level failures with no single-agent analogue (see multi-agent-failure-modes).
If you cannot state your expected token multiplier and your information-loss budget per hop before building, you are not designing a multi-agent system — you are gambling that emergent coordination will be free. It will not be.
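One way to price the first three costs before building is a back-of-envelope estimate like the sketch below. All figures are made-up inputs, and the information-loss model (a fixed retention fraction compounding per hop) is a crude assumption for budgeting, not a measured property of any system.

```python
from dataclasses import dataclass


@dataclass
class Branch:
    tokens: int        # estimated tokens the subagent will consume
    latency_s: float   # estimated wall-clock seconds for the branch


def token_multiplier(branches, orchestrator_tokens, single_agent_tokens):
    """Total multi-agent spend relative to a single-agent baseline."""
    total = orchestrator_tokens + sum(b.tokens for b in branches)
    return total / single_agent_tokens


def critical_path_s(branches, orchestration_overhead_s):
    """Slowest branch plus orchestration overhead, not the average."""
    return max(b.latency_s for b in branches) + orchestration_overhead_s


def retained_after_hops(retention_per_hop, hops):
    """Assumed model: each summary hop keeps a fixed fraction of the detail."""
    return retention_per_hop ** hops


# Illustrative estimates only.
branches = [Branch(40_000, 30.0), Branch(55_000, 75.0), Branch(35_000, 20.0)]
multiplier = token_multiplier(branches, orchestrator_tokens=20_000,
                              single_agent_tokens=50_000)
latency = critical_path_s(branches, orchestration_overhead_s=10.0)
retained = retained_after_hops(retention_per_hop=0.8, hops=3)
```

With these inputs the design costs three times the single-agent token budget, and the wall-clock time is set by the 75-second branch. Under the assumed retention model, about half of the original detail survives three hops.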
One agent with tools beats many when the work is sequential or context-coupled.
Multi-agent wins come from parallelism and isolation. If your task has neither, splitting it only adds tax. A "research → analyze → write" pipeline where each stage needs the full output of the previous one is sequential by nature: three agents in a chain are just one agent with extra serialization points and a lossy handoff at every stage boundary. The honest test: if every subtask needs to see most of what every other subtask produced, you have one task wearing a team's costume.
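    # Sequential + context-coupled -> a single agent wins.
    # `plan` and `agent` are placeholders for your planner and model wrapper.
    def solve(task):
        ctx = []  # one shared history: every step sees all earlier outputs
        for step in plan(task):
            ctx.append(agent.run(step, history=ctx))  # no lossy hop
        return ctx[-1] if ctx else None  # an empty plan yields no result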
There are exactly three honest reasons to split.
- Genuine parallelism over independent subproblems. Searching ten sources, auditing twelve files, or evaluating six candidate designs — work that has no read-after-write dependency between branches and where wall-clock latency matters. This is the strongest case: the speedup is real and the boundaries are clean.
- Context isolation as a feature. A subagent given only the files it needs is harder to distract and prompt-inject than one agent holding everything. Isolation can raise reliability and shrink per-agent context — a correctness argument, not just a speed one.
- Hard capability or trust boundaries. Different models, different tool permissions, or different blast radius — e.g. an untrusted code-execution agent sandboxed away from an agent holding production credentials. Here the split is a security control, not an optimization.
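The first two reasons can be sketched together: a bounded fan-out in which each worker sees only its own file and returns a small structured result. This is a sketch under stated assumptions. `audit_file` is a hypothetical worker whose `asyncio.sleep` stands in for model and tool latency, and its "issues" count is a dummy value.

```python
import asyncio


async def audit_file(path: str) -> dict:
    """Hypothetical worker: isolated to one file, returns a structured result."""
    await asyncio.sleep(0.01)  # stands in for model and tool latency
    return {"path": path, "issues": len(path) % 3}  # dummy finding count


async def fan_out(paths: list[str], max_concurrency: int = 4) -> list[dict]:
    sem = asyncio.Semaphore(max_concurrency)  # cap concurrent workers and spend

    async def bounded(path: str) -> dict:
        async with sem:
            return await audit_file(path)

    # Branches share no state, so they can run concurrently;
    # gather returns results in the order the paths were given.
    return await asyncio.gather(*(bounded(p) for p in paths))


def aggregate(results: list[dict]) -> int:
    """Deterministic merge: no re-derivation of what the workers did."""
    return sum(r["issues"] for r in results)


results = asyncio.run(fan_out(["a.py", "bb.py", "ccc.py"]))
total_issues = aggregate(results)
```

Workers return structured fields rather than free-text summaries, which keeps the aggregation step mechanical and limits what is lost at the boundary.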
"Different roles" and "it feels more organized" are not on this list. Role labels are a prompt-engineering technique that works fine inside one agent. Reorganizing a single prompt into five personas that all share a context buys you nothing and pays full coordination tax.
Decide with a checklist, not an aesthetic.
Before you split, all of these should be true: (1) the subtasks are independent enough that branches rarely need each other's intermediate state; (2) the value of parallelism or isolation exceeds your priced coordination tax; (3) you can write a deterministic aggregation step that does not itself need to re-derive what the workers did; and (4) you have observability that spans agents (per-agent traces, propagation depth, joint cost), because you cannot debug what you cannot see across the boundary. Fail any one and the single-agent design is not just simpler — it is more correct.
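The checklist can be written down as an explicit gate, so that the decision is recorded rather than felt. The sketch below is illustrative: the `SplitCase` fields and the single-number comparison of benefit against tax are simplifying assumptions, and the units are whatever you used to price the tax.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SplitCase:
    branches_independent: bool        # (1) branches rarely need each other's state
    benefit: float                    # (2) value of parallelism or isolation
    coordination_tax: float           # (2) priced tax, same units as benefit
    deterministic_aggregation: bool   # (3) merge step needs no re-derivation
    cross_agent_observability: bool   # (4) traces and cost span all agents


def failed_checks(case: SplitCase) -> list[str]:
    """Return the names of the checklist items that do not hold."""
    checks = {
        "independent branches": case.branches_independent,
        "benefit exceeds priced tax": case.benefit > case.coordination_tax,
        "deterministic aggregation": case.deterministic_aggregation,
        "cross-agent observability": case.cross_agent_observability,
    }
    return [name for name, ok in checks.items() if not ok]


def should_split(case: SplitCase) -> bool:
    # Fail any one item and the single-agent design stays the default.
    return not failed_checks(case)


file_audit = SplitCase(True, 5.0, 3.0, True, True)
report_pipeline = SplitCase(False, 1.0, 3.0, True, False)
```

Returning the list of failed items, rather than a bare boolean, makes it clear which argument is still missing when a split is rejected.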
When NOT to go multi-agent.
Do not split when the task is sequential, when subtasks are tightly context-coupled, when latency does not matter and you need no isolation (the speedup that parallelism buys is then worth nothing), or when you lack the cross-agent observability to debug the result. The strongest multi-agent designs of 2025–2026 are narrow: a handful of agents on genuinely parallel, isolatable work, not org charts of personas. Add an agent only when parallelism or isolation buys you more than the token multiplier, the lossy handoffs, and a brand-new class of system failures cost you.