When (and When Not) to Go Multi-Agent


When (and when not) to go multi-agent: count the coordination tax first.

"Multi-agent" is the most over-applied architecture in agentic AI: enterprise surveys through 2025 show a large jump in projects claiming multi-agent designs, yet most of those tasks would run faster, cheaper, and more reliably as one well-equipped agent. This essay gives you a decision procedure — what coordination actually costs, the failure mode that single-agent designs do not have, and the three honest reasons a problem genuinely needs more than one agent.

STEP 1

The default is one agent with tools, and you should have to argue your way off it.

A single agent in a tool-use loop (plan, call a tool, observe, repeat) already covers most of what people reach for multi-agent designs to do. It has one context window (no information is lost crossing an agent boundary), one control thread (no consensus to reach), and one cost meter (no fan-out multiplier). Every additional agent is a new context boundary that drops information, a new message channel that can deadlock, and a new token stream that multiplies spend. Treat multi-agent design as a cost you justify, not a default you assume.
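The loop described above fits in a few lines of Python. This is a minimal illustration under stated assumptions, not a production agent: `policy` is a hypothetical stand-in for the model call, and `SingleAgent`, `Action`, `scripted_policy`, and the `add` tool are invented names. What it shows is structural: a single `history` list is the only context, so no step ever acts on a summary in place of the real observation.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Action:
    """One decision from the model: call a tool, or finish with an answer."""
    tool: Optional[str]  # None means "finish"
    arg: str = ""
    answer: str = ""


@dataclass
class SingleAgent:
    # Hypothetical stand-in for an LLM call: reads the whole history, returns the next Action.
    policy: Callable[[List[str]], Action]
    tools: Dict[str, Callable[[str], str]]
    history: List[str] = field(default_factory=list)  # the one and only context window

    def run(self, task: str, max_steps: int = 10) -> str:
        self.history.append(f"task: {task}")
        for _ in range(max_steps):
            action = self.policy(self.history)            # plan
            if action.tool is None:
                return action.answer                       # finish
            result = self.tools[action.tool](action.arg)   # call a tool
            self.history.append(f"{action.tool}({action.arg}) -> {result}")  # observe
        raise RuntimeError("step budget exhausted")        # repeat, but not forever


def scripted_policy(history: List[str]) -> Action:
    # Toy policy for illustration: call `add` once, then answer with what it observed.
    if len(history) == 1:
        return Action(tool="add", arg="2,3")
    return Action(tool=None, answer=history[-1].split(" -> ")[1])


tools = {"add": lambda s: str(sum(int(x) for x in s.split(",")))}
agent = SingleAgent(policy=scripted_policy, tools=tools)
print(agent.run("add 2 and 3"))  # prints 5
```

Swapping `scripted_policy` for a real model call would not change the shape: there is still one context, one control thread, and one token meter.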

STEP 2

The coordination tax is real, quantifiable, and usually underestimated.

Splitting work across agents introduces costs that do not exist in a single loop. Name them so you can price them:

Token multiplier. Each subagent carries its own context and tool calls. Anthropic reported that its multi-agent research system used roughly 15× the tokens of an ordinary chat interaction. That multiplier is the price of admission, not an edge case.
Lossy boundaries. A subagent returns a summary, not its full reasoning. The parent acts on the summary. Nuance, caveats, and uncertainty are compressed away at every hop — a structural information loss that a single context never suffers.
Coordination latency. Decompose, dispatch, wait for the slowest worker, aggregate. The critical path is the slowest branch plus orchestration overhead, not the average.
Emergent failure surface. Deadlock, livelock, groupthink, and error propagation are system-level failures with no single-agent analogue (see multi-agent-failure-modes).
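The token multiplier and the coordination latency can both be priced before anything is built. The sketch below is a back-of-envelope calculator, not a measurement tool: `Branch`, `coordination_tax`, and every number in the example are hypothetical estimates you would supply yourself. It returns the token multiplier against a single-agent baseline and contrasts the critical-path latency (slowest branch plus orchestration) with the tempting average-branch estimate.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Branch:
    tokens: int      # tokens this subagent burns: its own context plus tool calls
    seconds: float   # wall-clock time for this subagent to finish


def coordination_tax(single_agent_tokens: int, branches: List[Branch],
                     orchestrator_tokens: int, orchestration_seconds: float) -> Dict[str, float]:
    """Price a fan-out against a single-agent baseline; every input is your own estimate."""
    total_tokens = orchestrator_tokens + sum(b.tokens for b in branches)
    # Critical path: slowest branch plus decompose/dispatch/aggregate overhead.
    critical_path = max(b.seconds for b in branches) + orchestration_seconds
    # The tempting but wrong estimate uses the average branch instead.
    naive = sum(b.seconds for b in branches) / len(branches) + orchestration_seconds
    return {
        "token_multiplier": total_tokens / single_agent_tokens,
        "critical_path_s": critical_path,
        "naive_estimate_s": naive,
    }


# Hypothetical numbers: one straggler branch dominates the wall clock.
tax = coordination_tax(
    single_agent_tokens=20_000,
    branches=[Branch(30_000, 40.0), Branch(30_000, 45.0), Branch(30_000, 120.0)],
    orchestrator_tokens=10_000,
    orchestration_seconds=15.0,
)
print(tax)
```

With these inputs the token multiplier is 5.0 and the critical path is 135 s, while the average-based estimate would suggest roughly 83 s.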

If you cannot state your expected token multiplier and your information-loss budget per hop before building, you are not designing a multi-agent system — you are gambling that emergent coordination will be free. It will not be.
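One way to state an information-loss budget per hop is to assume, crudely, that each summarize-and-hand-off step keeps a fixed fraction r of the decision-relevant detail, so retention compounds as r^k over k hops. That multiplicative model is an assumption for illustration, not an established law, and `retained_after` and `max_hops` are hypothetical helpers. Even so, it forces the number the paragraph asks for onto the page before anything is built.

```python
def retained_after(hops: int, per_hop_retention: float) -> float:
    """Fraction of decision-relevant detail left after `hops` lossy handoffs.

    Assumes (hypothetically) every hop keeps the same fraction, independently.
    """
    return per_hop_retention ** hops


def max_hops(per_hop_retention: float, floor: float) -> int:
    """Largest number of handoffs whose compounded retention still meets `floor`."""
    if not 0.0 < per_hop_retention < 1.0:
        raise ValueError("per_hop_retention must be strictly between 0 and 1")
    if not 0.0 < floor <= 1.0:
        raise ValueError("floor must be in (0, 1]")
    hops = 0
    while per_hop_retention ** (hops + 1) >= floor:
        hops += 1
    return hops


# Hypothetical budget: each summary keeps ~90% of what matters, and the
# final decision needs at least 70% of the original detail.
print(max_hops(0.9, 0.7))      # prints 3
print(retained_after(4, 0.9))  # falls below the 0.7 floor
```

Under these numbers three hops is the whole budget; a fourth handoff would need either better summaries or a single shared context.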

STEP 3

One agent with tools beats many when the work is sequential or context-coupled.

Multi-agent wins come from parallelism and isolation. If your task has neither, splitting it only adds tax. A "research → analyze → write" pipeline where each stage needs the full output of the previous one is sequential by nature: three agents in a chain is just one agent with extra serialization points and three lossy handoffs. The honest test: if every subtask needs to see most of what every other subtask produced, you have one task wearing a team's costume.

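# sequential + context-coupled -> single agent wins
# `plan` and `agent` are passed in; they stand in for your own planner and model wrapper.
def solve(task, plan, agent):
    ctx = []  # one shared context: every step sees every earlier output
    for step in plan(task):
        ctx.append(agent.run(step, history=list(ctx)))  # no lossy hop between steps
    return ctx[-1] if ctx else None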
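The honest test above can be made mechanical if you write down, for each subtask, which other subtasks' outputs it needs. The sketch below is one hypothetical way to do that: `critical_depth` flags a fully sequential chain (no two subtasks can ever run at once), and `coupling` measures the mean fraction of other subtasks each one must read. The 0.5 threshold is an assumption standing in for "most".

```python
from typing import Dict, Set


def critical_depth(needs: Dict[str, Set[str]]) -> int:
    """Length, in subtasks, of the longest dependency chain (assumes no cycles)."""
    known = set(needs)
    memo: Dict[str, int] = {}

    def depth(name: str) -> int:
        if name not in memo:
            deps = (needs[name] & known) - {name}  # ignore unknown names and self-edges
            memo[name] = 1 + max((depth(d) for d in deps), default=0)
        return memo[name]

    return max((depth(n) for n in needs), default=0)


def coupling(needs: Dict[str, Set[str]]) -> float:
    """Mean fraction of the *other* subtasks whose output each subtask must see."""
    known = set(needs)
    if len(known) < 2:
        return 1.0  # a lone subtask is trivially one task
    per_task = [len((needs[n] & known) - {n}) / (len(known) - 1) for n in needs]
    return sum(per_task) / len(per_task)


def one_task_in_costume(needs: Dict[str, Set[str]], threshold: float = 0.5) -> bool:
    fully_sequential = critical_depth(needs) == len(needs)
    return fully_sequential or coupling(needs) > threshold


pipeline = {"research": set(), "analyze": {"research"}, "write": {"research", "analyze"}}
audit = {f"file_{i}": set() for i in range(12)}  # twelve independent file audits
print(one_task_in_costume(pipeline))  # prints True
print(one_task_in_costume(audit))     # prints False
```

The pipeline's coupling score is only 0.5; it is the sequential-chain check that catches it, which matches the point that sequential work has no parallelism to exploit.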

STEP 4

There are exactly three honest reasons to split.

Genuine parallelism over independent subproblems. Searching ten sources, auditing twelve files, or evaluating six candidate designs — work that has no read-after-write dependency between branches and where wall-clock latency matters. This is the strongest case: the speedup is real and the boundaries are clean.
Context isolation as a feature. A subagent given only the files it needs is harder to distract and prompt-inject than one agent holding everything. Isolation can raise reliability and shrink per-agent context — a correctness argument, not just a speed one.
Hard capability or trust boundaries. Different models, different tool permissions, or different blast radius — e.g. an untrusted code-execution agent sandboxed away from an agent holding production credentials. Here the split is a security control, not an optimization.
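All three reasons can show up in a single fan-out. The sketch below assumes a hypothetical file-audit task and runs workers in parallel with the standard library's `concurrent.futures.ThreadPoolExecutor`; each worker receives only its own files (context isolation) and a tool allowlist (a capability boundary), and the results are merged deterministically. `WorkerSpec`, `make_toolbox`, `audit_worker`, `fan_out`, and the `count_todos`/`deploy` tools are invented for illustration; a real subagent would wrap a model call where `audit_worker` merely counts strings.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, List


@dataclass
class WorkerSpec:
    name: str
    files: Dict[str, str]          # ONLY the files this worker may read (context isolation)
    allowed_tools: FrozenSet[str]  # what this worker may call (capability boundary)


class ToolNotAllowed(Exception):
    pass


def make_toolbox(spec: WorkerSpec, all_tools: Dict[str, Callable]) -> Callable:
    """Return a tool caller that refuses anything outside the worker's allowlist."""
    def call(tool: str, *args):
        if tool not in spec.allowed_tools:
            raise ToolNotAllowed(f"{spec.name} may not call {tool!r}")
        return all_tools[tool](*args)
    return call


def audit_worker(spec: WorkerSpec, call: Callable) -> Dict[str, int]:
    # Hypothetical stand-in for a subagent: it sees nothing but spec.files.
    return {path: call("count_todos", text) for path, text in spec.files.items()}


def fan_out(specs: List[WorkerSpec], all_tools: Dict[str, Callable]) -> Dict[str, int]:
    with ThreadPoolExecutor(max_workers=max(1, len(specs))) as pool:
        futures = [pool.submit(audit_worker, s, make_toolbox(s, all_tools)) for s in specs]
        results = [f.result() for f in futures]  # wall clock tracks the slowest branch
    merged: Dict[str, int] = {}
    for partial in results:  # deterministic merge: no worker output is re-derived
        merged.update(partial)
    return dict(sorted(merged.items()))


TOOLS = {
    "count_todos": lambda text: text.count("TODO"),
    "deploy": lambda: "deployed to production",  # must stay out of auditors' reach
}
specs = [
    WorkerSpec("auditor-1", {"a.py": "x = 1  # TODO\n# TODO: tests"}, frozenset({"count_todos"})),
    WorkerSpec("auditor-2", {"b.py": "y = 2"}, frozenset({"count_todos"})),
]
print(fan_out(specs, TOOLS))  # prints {'a.py': 2, 'b.py': 0}
```

Threads suit this sketch because real subagent calls are I/O-bound; the allowlist check is what makes the split a security control rather than an optimization.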

"Different roles" and "it feels more organized" are not on this list. Role labels are a prompt-engineering technique that works fine inside one agent. Reorganizing a single prompt into five personas that all share a context buys you nothing and pays full coordination tax.

STEP 5

Decide with a checklist, not an aesthetic.

Before you split, all of these should be true: (1) the subtasks are independent enough that branches rarely need each other's intermediate state; (2) the value of parallelism or isolation exceeds your priced coordination tax; (3) you can write a deterministic aggregation step that does not itself need to re-derive what the workers did; and (4) you have observability that spans agents (per-agent traces, propagation depth, joint cost), because you cannot debug what you cannot see across the boundary. Fail any one and the single-agent design is not just simpler — it is more correct.
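The four conditions translate directly into a gate you can run in a design review. The sketch is hypothetical (`SplitChecklist` and `should_split` are invented names), and it deliberately takes booleans: the judgment behind each answer, such as whether the benefit really exceeds the priced tax, stays with you. What the code enforces is the conjunction: one failure means stay single-agent, and it reports which condition failed.

```python
from dataclasses import dataclass, fields
from typing import List, Tuple


@dataclass
class SplitChecklist:
    independent_subtasks: bool       # (1) branches rarely need each other's intermediate state
    benefit_exceeds_tax: bool        # (2) parallelism/isolation value > priced coordination tax
    deterministic_aggregation: bool  # (3) merging needs no re-derivation of worker output
    cross_agent_observability: bool  # (4) per-agent traces, propagation depth, joint cost


def should_split(check: SplitChecklist) -> Tuple[bool, List[str]]:
    """All four must hold; otherwise stay single-agent and name what failed."""
    failed = [f.name for f in fields(check) if not getattr(check, f.name)]
    return (not failed, failed)


ok, failed = should_split(SplitChecklist(
    independent_subtasks=True,
    benefit_exceeds_tax=True,
    deterministic_aggregation=True,
    cross_agent_observability=False,  # no traces across the boundary yet
))
print(ok, failed)  # prints False ['cross_agent_observability']
```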

STEP 6

When NOT to go multi-agent.

Do not split when the task is sequential, when subtasks are tightly context-coupled, when latency does not matter and isolation is not the goal (parallelism's speedup then buys nothing), or when you lack cross-agent observability to debug the result. The strongest multi-agent designs of 2025–2026 are narrow: a handful of agents on genuinely parallel, isolatable work, not org charts of personas. Add an agent only when parallelism or isolation buys you more than the token multiplier, the lossy handoffs, and a brand-new class of system failures cost you.