Multi-Agent Topologies

Multi-agent topologies: the wiring diagram is the design decision.

Once you have decided to use more than one agent, the next decision dominates everything else: how the agents are connected. Star, pipeline, hierarchy, and mesh are not interchangeable styles — each has a different communication cost, a different failure profile, and a narrow regime where it is the right answer. This essay maps the four canonical topologies, their O(·) message costs, and how to pick one without defaulting to "everyone talks to everyone."

STEP 1

Topology is the variable that most strongly predicts cost and reliability.

The agents' prompts get the attention; the graph determines the outcome. Communication cost scales with the number of edges, not the number of agents: a star with n workers has n edges, a full mesh has n(n−1)/2. Every edge is a token-bearing channel and a potential point of error propagation or deadlock. Choosing a topology is choosing how much your system will spend and how it will fail — decide it explicitly, draw it, and defend it.
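To make the scaling concrete, here is a minimal sketch that counts undirected communication edges for each topology. The function names are illustrative. Parameters follow the text's conventions: the star takes n workers plus one coordinator, and the other three take n agents in total.

```python
# Undirected edge counts for the four canonical topologies.
# Each edge is one token-bearing channel between two agents.

def star_edges(n_workers: int) -> int:
    return n_workers                          # one spoke per worker, hub excluded from n

def pipeline_edges(n_agents: int) -> int:
    return max(n_agents - 1, 0)               # one handoff between each adjacent pair

def tree_edges(n_agents: int) -> int:
    return max(n_agents - 1, 0)               # any tree on n nodes has n - 1 edges

def mesh_edges(n_agents: int) -> int:
    return n_agents * (n_agents - 1) // 2     # every unordered pair is connected

if __name__ == "__main__":
    for n in (4, 10, 50):
        print(n, star_edges(n), pipeline_edges(n), tree_edges(n), mesh_edges(n))
```

At 50 agents the star, pipeline, and tree each stay near 50 channels, while the full mesh needs 1,225.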

STEP 2

Star: one coordinator, n leaves, O(n) edges.

A central agent dispatches to workers that never talk to each other and report only upward. This is the workhorse of production multi-agent systems through 2025 (the supervisor/worker pattern, see supervisor-worker-pattern). Communication is O(n), the coordinator is the single place to enforce budget and aggregation, and a failed worker is contained. The price: the coordinator is a bottleneck and a single point of failure, and leaves cannot share discoveries except through it.

# star: coordinator fans out, workers stay isolated
# (coordinator and worker are passed in; any objects with these methods work)
def star(task, coordinator, worker):
    subs = coordinator.decompose(task)
    results = [worker.run(s) for s in subs]   # no leaf-to-leaf edge
    return coordinator.aggregate(results)
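The claims that the coordinator is the single place to enforce budget and that a failed worker is contained can be made concrete. The sketch below is a hedged illustration, not a production supervisor. It assumes workers are plain callables and introduces a hypothetical `run_star` helper with a `budget` cap on dispatched subtasks. Workers run concurrently through the standard library's `ThreadPoolExecutor`, and a worker exception is recorded as that leaf's outcome instead of aborting the run.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

def run_star(
    subtasks: list[str],
    worker: Callable[[str], str],
    max_workers: int = 4,
    budget: int = 8,                         # hypothetical cap enforced at the hub
) -> list[object]:
    dispatched = subtasks[:budget]           # the hub is the one place budget is enforced

    def contained(sub: str) -> object:
        try:
            return worker(sub)
        except Exception as exc:             # a failed leaf does not take down its siblings
            return exc

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # each worker sees only its own subtask; results return to the hub in order
        return list(pool.map(contained, dispatched))
```

With a worker that raises on subtask `"c"`, `run_star(["a", "b", "c", "d"], worker)` returns three normal results plus the exception object in position 2, which the coordinator can retry or report during aggregation.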

Default to star. Most "we need a mesh so agents can collaborate" intuitions are solved by a star plus a shared read-only artifact (see shared-memory-and-blackboard) — you get cross-worker information without the quadratic message cost or the mesh's tangled failure modes.
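One way to get cross-worker information without leaf-to-leaf edges is a two-phase star. In phase one, workers draft independently. The coordinator then publishes every draft as a single read-only artifact. In phase two, each worker reads that artifact and still reports only to the coordinator. This is a minimal sketch under that assumption; `star_with_artifact`, `draft`, and `refine` are hypothetical names, and the blackboard designs discussed in shared-memory-and-blackboard may differ. Read-only access is enforced with the standard library's `types.MappingProxyType`.

```python
from types import MappingProxyType
from typing import Callable, Mapping

def star_with_artifact(
    subtasks: list[str],
    draft: Callable[[str], str],
    refine: Callable[[str, Mapping[str, str]], str],
) -> list[str]:
    # Phase 1: isolated workers draft independently (n edges, hub to leaf).
    drafts = {sub: draft(sub) for sub in subtasks}
    # The coordinator publishes all drafts as one read-only view.
    artifact = MappingProxyType(drafts)
    # Phase 2: every worker may read all drafts but cannot write to them,
    # and its result still goes only to the coordinator.
    return [refine(sub, artifact) for sub in subtasks]
```

Any attempt by a worker to assign into `artifact` raises `TypeError`, so information flows between workers while the edge count stays O(n).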

STEP 3

Pipeline: a linear chain, O(n) edges, fully serial.

Agent A's output is agent B's input, and so on. Each stage specializes; latency is the sum of stages, never overlapped. Pipelines suit work with a true sequential dependency and clean, narrow interfaces between stages (extract → transform → validate). The structural weakness is error propagation with no recovery path: a subtly wrong stage-two output is treated as ground truth by stages three through n, and the lossy handoff at each boundary compounds. A pipeline with five hops has five places to silently corrupt the result and zero places to catch it unless you add explicit validation edges.
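A validation edge is a check that runs on a stage's output before the next stage consumes it. The sketch below assumes stages are plain functions over dict records and that each check returns an error message or `None`. The `extract` and `transform` stages and the `non_negative` check are hypothetical examples. Without the check, a negative price would flow silently into `transform`.

```python
from typing import Callable, Optional

Record = dict
Stage = Callable[[Record], Record]
Check = Callable[[Record], Optional[str]]    # error message, or None if the record is fine

class StageError(RuntimeError):
    """Raised when a validation edge rejects a stage's output."""

def run_pipeline(record: Record, stages: list[tuple[str, Stage, Optional[Check]]]) -> Record:
    for name, stage, check in stages:
        record = stage(record)               # the next stage treats this as ground truth
        if check is not None:
            problem = check(record)          # explicit validation edge after this stage
            if problem:
                raise StageError(f"{name}: {problem}")   # stop before the error propagates
    return record

# Hypothetical extract -> transform stages for a price string.
def extract(r: Record) -> Record:
    return {**r, "price": float(r["raw"].strip("$"))}

def transform(r: Record) -> Record:
    return {**r, "cents": round(r["price"] * 100)}

def non_negative(r: Record) -> Optional[str]:
    return None if r["price"] >= 0 else "negative price"

STAGES = [("extract", extract, non_negative), ("transform", transform, None)]
```

`run_pipeline({"raw": "$4.20"}, STAGES)` yields a record with `cents == 420`. An input of `"$-1"` raises `StageError` at the extract boundary instead of producing a plausible-looking wrong answer downstream.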

STEP 4

Hierarchy: a tree of coordinators, O(n) edges, bounded depth.

A star whose leaves are themselves coordinators. This is how you scale beyond one coordinator's context limit: a top planner delegates to mid-level leads that fan out to workers. Communication stays O(n) on a tree, and each subtree is an isolation and budget boundary. The new failure mode is multi-level lossy summarization: a worker's nuance is compressed at its lead, then again at the planner, so the top sees a summary of summaries. Keep the tree shallow (two, rarely three levels) and pass structured artifacts, not prose, up the tree to limit compression loss.

Every extra level of hierarchy is another lossy summarization hop and another coordinator that can become a bottleneck. Depth is not free organization; it is compounding information loss. If you are reaching for a fourth level, the problem is the task decomposition, not a tree that is too shallow.
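The two recommendations for hierarchies, bounded depth and structured artifacts passed upward, can be sketched together. `Node`, `Report`, and `run_tree` are hypothetical names. `max_depth=2` is an assumed setting that counts delegation hops below the top planner, which allows planner, then leads, then workers. Each lead merges its children's typed fields rather than re-summarizing prose, so a specific worker's failure is still visible at the top.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Node:
    name: str
    children: list["Node"] = field(default_factory=list)   # empty list means a worker leaf

@dataclass
class Report:                                  # structured artifact passed up the tree
    completed: list[str] = field(default_factory=list)
    failed: list[str] = field(default_factory=list)

def run_tree(node: Node, work: Callable[[str], bool], depth: int = 0, max_depth: int = 2) -> Report:
    if depth > max_depth:
        raise ValueError(f"{node.name}: tree deeper than {max_depth} levels")
    if not node.children:                      # a leaf worker does the actual task
        ok = work(node.name)
        return Report(completed=[node.name] if ok else [], failed=[] if ok else [node.name])
    merged = Report()
    for child in node.children:                # each lead touches only its own subtree
        sub = run_tree(child, work, depth + 1, max_depth)
        merged.completed += sub.completed      # merge typed fields, no prose summary
        merged.failed += sub.failed
    return merged

def count_edges(node: Node) -> int:
    return sum(1 + count_edges(c) for c in node.children)   # a tree on n nodes has n - 1
```

A planner with two leads of three workers each has 9 nodes and 8 edges, consistent with the O(n) claim. Adding a fourth level trips the depth guard rather than silently adding another compression hop.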

STEP 5

Mesh: any-to-any, O(n²) edges — powerful and rarely worth it.

Every agent can message every other agent. Mesh enables emergent collaboration and debate-style cross-talk (see agent-debate-and-ensembles), but it pays the full O(n²) communication cost and inherits every system-level failure at once: groupthink (agents converge by echoing each other), deadlock and livelock (circular waits and message storms with no progress), and an observability nightmare (no central trace; you must reconstruct order from a distributed log). Mesh is justified for small n where the cross-talk is the algorithm — structured debate among three to five agents — not as a general collaboration substrate.
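A toy numeric debate shows both the message cost and the echo effect. Each agent holds an estimate. In every round, each agent sends its estimate to all others, which is n(n−1) directed messages (two per undirected edge). Each agent then moves halfway toward the average of what it heard. This is a hypothetical illustration (`mesh_debate` and its update rule are assumptions, not a real debate protocol). Agreement here comes purely from agents echoing each other, with no new evidence, which is the groupthink failure in miniature. The `max_rounds` cap is the guard against unbounded message storms.

```python
def mesh_debate(estimates: list[float], max_rounds: int = 3, tol: float = 1e-3):
    """Hypothetical all-to-all debate; returns (final estimates, messages sent)."""
    values = list(estimates)
    n = len(values)
    messages = 0
    if n < 2:
        return values, messages
    for _ in range(max_rounds):                    # hard cap on rounds: no livelock
        if max(values) - min(values) < tol:        # already in agreement; stop early
            break
        inbox = [[values[j] for j in range(n) if j != i] for i in range(n)]
        messages += n * (n - 1)                    # every agent messages every other agent
        # each agent moves halfway toward the mean of the others' estimates
        values = [0.5 * values[i] + 0.5 * sum(inbox[i]) / (n - 1) for i in range(n)]
    return values, messages
```

Four agents starting at 0, 1, 2, and 3 exchange 12 messages in one round, and their spread shrinks from 3 to 1 while the mean stays at 1.5. The agents agree more, but they are not more likely to be right.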

STEP 6

When NOT to pick the richer topology.

The instinct to "let agents talk freely" almost always over-buys: mesh's quadratic cost and tangled failure modes are real, while its benefit only materializes when interaction itself is the computation. Use a pipeline only for genuinely serial work with narrow interfaces, a star as the default for parallel fan-out, a hierarchy only when one coordinator's context cannot hold the plan, and a mesh only for small-n structured debate. Pick the sparsest topology that still expresses the dependencies — edges are not collaboration, they are cost and failure surface.
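The selection rules above can be written down as a small decision function. This is one possible encoding under stated assumptions. `TaskShape` and its boolean fields are a hypothetical description of a workload, and the order of checks is a judgment call. The three-to-five-agent bound for mesh comes from the mesh section. Anything that matches no narrow case falls through to the star default.

```python
from dataclasses import dataclass

@dataclass
class TaskShape:                          # hypothetical workload description
    n_agents: int
    serial_dependency: bool               # each step needs the previous step's output
    narrow_interfaces: bool               # handoffs are small, well-typed artifacts
    plan_fits_one_context: bool           # one coordinator can hold the whole plan
    interaction_is_the_algorithm: bool    # e.g. structured debate

def choose_topology(t: TaskShape) -> str:
    if t.serial_dependency and t.narrow_interfaces:
        return "pipeline"                 # genuinely serial work with clean boundaries
    if t.interaction_is_the_algorithm and 3 <= t.n_agents <= 5:
        return "mesh"                     # small-n structured debate only
    if not t.plan_fits_one_context:
        return "hierarchy"                # one coordinator cannot hold the plan
    return "star"                         # the default for parallel fan-out
```

Note that a task where interaction matters but a dozen agents are involved still lands on `"star"`. That matches the argument that mesh is not a general collaboration substrate.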