When (and When Not) to Go Multi-Agent

Deep Dive · Multi-Agent Systems

When (and when not) to go multi-agent: count the coordination tax first.

"Multi-agent" is the most over-applied architecture in agentic AI: enterprise surveys through 2025 show a large jump in projects claiming multi-agent designs, yet most of those tasks would run faster, cheaper, and more reliably as one well-equipped agent. This essay gives you a decision procedure — what coordination actually costs, the failure mode that single-agent designs do not have, and the three honest reasons a problem genuinely needs more than one agent.

STEP 1

The default is one agent with tools, and you should have to argue your way off it.

A single agent in a tool-use loop — plan, call a tool, observe, repeat — already covers most of what people reach for multi-agent to do. It has one context window (no information is lost crossing an agent boundary), one control thread (no consensus to reach), and one cost meter (no fan-out multiplier). Every additional agent is a new context boundary that drops information, a new message channel that can deadlock, and a new token stream that multiplies spend. Treat multi-agent as a cost you justify, not a default you assume.
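As a rough illustration of that loop, here is a minimal sketch in Python. Everything in it is hypothetical: `SingleAgent`, `demo_policy`, and the two toy tools are made up for this example and are not the API of any framework. The `policy` callable stands in for a model call. The point to notice is that there is one context list, one loop, and one place where cost accrues.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical types for illustration; not any particular framework's API.
Action = Tuple[str, str]  # (tool_name, tool_argument)
Policy = Callable[[str, List[str]], Optional[Action]]


@dataclass
class SingleAgent:
    policy: Policy                          # stands in for the model call
    tools: Dict[str, Callable[[str], str]]  # tool name -> tool function
    max_steps: int = 10                     # hard stop so the loop cannot spin forever
    context: List[str] = field(default_factory=list)  # the one shared context

    def run(self, task: str) -> List[str]:
        for _ in range(self.max_steps):
            action = self.policy(task, self.context)  # plan
            if action is None:                        # policy signals completion
                break
            name, arg = action
            observation = self.tools[name](arg)       # call a tool
            self.context.append(f"{name}({arg}) -> {observation}")  # observe
        return self.context


def demo_policy(task: str, context: List[str]) -> Optional[Action]:
    # Toy stand-in for a model: search first, then count words, then stop.
    if len(context) == 0:
        return ("search", task)
    if len(context) == 1:
        return ("word_count", context[0])
    return None


agent = SingleAgent(
    policy=demo_policy,
    tools={
        "search": lambda q: f"3 results for '{q}'",
        "word_count": lambda text: str(len(text.split())),
    },
)
trace = agent.run("coordination tax")
```

The second step reads the first step's observation straight out of `context`, verbatim. Nothing was summarized or handed across a boundary to make that possible.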

STEP 2

The coordination tax is real, quantifiable, and usually underestimated.

Splitting work across agents introduces costs that do not exist in a single loop. Name them so you can price them:

  • Token multiplier. Each subagent carries its own context and tool calls. Anthropic reported that its multi-agent research system used about 15× the tokens of a chat interaction, against about 4× for a single agent. That multiplier is the price of admission, not an edge case.
  • Lossy boundaries. A subagent returns a summary, not its full reasoning. The parent acts on the summary. Nuance, caveats, and uncertainty are compressed away at every hop — a structural information loss that a single context never suffers.
  • Coordination latency. Decompose, dispatch, wait for the slowest worker, aggregate. The critical path is the slowest branch plus orchestration overhead, not the average.
  • Emergent failure surface. Deadlock, livelock, groupthink, and error propagation are system-level failures with no single-agent analogue (see multi-agent-failure-modes).

If you cannot state your expected token multiplier and your information-loss budget per hop before building, you are not designing a multi-agent system — you are gambling that emergent coordination will be free. It will not be.
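One way to force yourself to state those numbers is to write them down as arithmetic before building anything. The sketch below is an assumption-laden illustration, not a measurement tool: the `Branch` type, the function names, and every number in it are invented for the example. The fixed-fraction-per-hop retention model is a deliberately crude assumption of this sketch, and real information loss will not be that regular.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Branch:
    tokens: int       # estimated tokens this subagent will consume
    latency_s: float  # estimated wall-clock seconds for this subagent


def token_multiplier(single_agent_tokens: int, orchestrator_tokens: int,
                     branches: List[Branch]) -> float:
    """Total multi-agent spend divided by the single-agent baseline."""
    total = orchestrator_tokens + sum(b.tokens for b in branches)
    return total / single_agent_tokens


def critical_path_s(orchestration_overhead_s: float, branches: List[Branch]) -> float:
    """Wall-clock time: the slowest branch plus overhead, not the average."""
    return orchestration_overhead_s + max(b.latency_s for b in branches)


def retained_after_hops(retention_per_hop: float, hops: int) -> float:
    """Assumed model: each summary hop keeps a fixed fraction of the detail."""
    return retention_per_hop ** hops


# Illustrative estimates only; substitute your own.
branches = [Branch(40_000, 30.0), Branch(55_000, 90.0), Branch(35_000, 45.0)]
multiplier = token_multiplier(50_000, 20_000, branches)  # 150k / 50k
wall_clock = critical_path_s(10.0, branches)             # 10 + slowest branch (90)
retained = retained_after_hops(0.8, 2)                   # two hops at 80% each
```

With these made-up inputs the design costs three times the single-agent baseline, finishes in 100 seconds because of one slow branch, and delivers roughly 64% of the original detail to the parent after two hops. If you cannot fill in your own versions of these inputs, that is the signal the paragraph above is warning about.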

STEP 3

One agent with tools beats many when the work is sequential or context-coupled.

Multi-agent wins come from parallelism and isolation. If your task has neither, splitting it only adds tax. A "research → analyze → write" pipeline where each stage needs the full output of the previous one is sequential by nature: a chain of three agents is just one agent with extra serialization points and a lossy handoff at every stage boundary. The honest test: if every subtask needs to see most of what every other subtask produced, you have one task wearing a team's costume.

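# sequential + context-coupled -> single agent wins
# plan and agent are supplied by the caller (your planner and agent loop)
def solve(task, plan, agent):
    ctx = []  # one shared history; every step sees all earlier results
    for step in plan(task):
        ctx.append(agent.run(step, history=list(ctx)))  # no lossy hop
    return ctx[-1] if ctx else None  # an empty plan yields no result

STEP 4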

There are exactly three honest reasons to split.

  • Genuine parallelism over independent subproblems. Searching ten sources, auditing twelve files, or evaluating six candidate designs — work that has no read-after-write dependency between branches and where wall-clock latency matters. This is the strongest case: the speedup is real and the boundaries are clean.
  • Context isolation as a feature. A subagent given only the files it needs is harder to distract and prompt-inject than one agent holding everything. Isolation can raise reliability and shrink per-agent context — a correctness argument, not just a speed one.
  • Hard capability or trust boundaries. Different models, different tool permissions, or different blast radius — e.g. an untrusted code-execution agent sandboxed away from an agent holding production credentials. Here the split is a security control, not an optimization.
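The first reason, combined with the isolation described in the second, can be sketched as a plain fan-out. This is a minimal illustration using Python's standard `concurrent.futures`; `fan_out` and `audit_file` are hypothetical names, and `audit_file` is a stub standing in for a real subagent call. Each worker receives only its own item and shares no state with its siblings, which is what makes the branches safe to run in parallel.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def fan_out(worker: Callable[[str], str], items: List[str],
            max_workers: int = 4) -> Dict[str, str]:
    """Run one worker per item in parallel; branches share no state."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map returns results in input order, so the
        # aggregation step below is deterministic.
        results = list(pool.map(worker, items))
    return dict(zip(items, results))


def audit_file(path: str) -> str:
    # Hypothetical subagent stub: sees only its own file, nothing from siblings.
    return f"{path}: ok"


report = fan_out(audit_file, ["a.py", "b.py", "c.py"])
```

If you find yourself wanting to pass one worker's intermediate result into another worker, the branches have a read-after-write dependency and this shape no longer applies.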

"Different roles" and "it feels more organized" are not on this list. Role labels are a prompt-engineering technique that works fine inside one agent. Reorganizing a single prompt into five personas that all share a context buys you nothing and pays full coordination tax.

STEP 5

Decide with a checklist, not an aesthetic.

Before you split, all of these should be true: (1) the subtasks are independent enough that branches rarely need each other's intermediate state; (2) the value of parallelism or isolation exceeds your priced coordination tax; (3) you can write a deterministic aggregation step that does not itself need to re-derive what the workers did; and (4) you have observability that spans agents (per-agent traces, propagation depth, joint cost), because you cannot debug what you cannot see across the boundary. Fail any one and the single-agent design is not just simpler — it is more correct.
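The checklist is mechanical enough to write down as code. The sketch below is one possible encoding, with hypothetical names (`SplitAssessment`, `failed_checks`, `should_split`); the `benefit` and `coordination_tax` fields assume you have priced both in the same units, whatever those are for your project.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SplitAssessment:
    branches_independent: bool       # (1) branches rarely need each other's intermediate state
    benefit: float                   # (2) value of parallelism or isolation, in your own units
    coordination_tax: float          # (2) priced coordination tax, in the same units
    deterministic_aggregation: bool  # (3) aggregation does not re-derive the workers' output
    cross_agent_observability: bool  # (4) per-agent traces, propagation depth, joint cost


def failed_checks(a: SplitAssessment) -> List[str]:
    """Return the names of the checks that fail; an empty list means all four pass."""
    failures = []
    if not a.branches_independent:
        failures.append("independence")
    if a.benefit <= a.coordination_tax:
        failures.append("benefit_exceeds_tax")
    if not a.deterministic_aggregation:
        failures.append("deterministic_aggregation")
    if not a.cross_agent_observability:
        failures.append("observability")
    return failures


def should_split(a: SplitAssessment) -> bool:
    # Failing any single check keeps the single-agent design.
    return not failed_checks(a)


# Illustrative values: a good candidate on paper, but with no cross-agent tracing.
audit = SplitAssessment(True, benefit=8.0, coordination_tax=3.0,
                        deterministic_aggregation=True,
                        cross_agent_observability=False)
blockers = failed_checks(audit)
```

Returning the list of failed checks rather than a bare boolean is a design choice of this sketch: it tells you what to fix before you revisit the decision.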

STEP 6

When NOT to go multi-agent.

Do not split when the task is sequential, when subtasks are tightly context-coupled, when latency does not matter and you need no isolation or trust boundary (the speedup from parallelism is then wasted), or when you lack cross-agent observability to debug the result. The strongest multi-agent designs of 2025–2026 are narrow: a handful of agents on genuinely parallel, isolatable work, not org charts of personas. Add an agent only when parallelism or isolation buys you more than the token multiplier, the lossy handoffs, and a brand-new class of system failures cost you.