Routing & dispatch — selection, fan-out, parallelism.
A router classifies an input and dispatches it to the right handler, model, or sub-agent. It is the cheapest structural fix for tool-selection errors and the foundation of multi-agent systems. This essay covers classifier vs. tool-call routing, parallel fan-out with a gather, and the failure modes of the routing layer itself.
Why a router, instead of more prose in the prompt.
As a single ReAct agent's tool set grows past roughly a dozen, two things degrade: tool-selection accuracy (the model confuses similar tools) and context cost (every tool schema is in every call). The instinct is to add instructions to the system prompt. The structural fix is to put a router in front: a small, cheap, testable component that decides which capability handles this input, then hands off with only that capability's tools in scope.
The win is separation of concerns. The router's only job is classification — you can unit-test it with a labeled set and measure routing accuracy directly. Each handler sees a narrow, clean context. Anthropic's "Building effective agents" names this the routing workflow and recommends it precisely when inputs fall into distinct categories better handled separately.
Two implementations.
Classifier router. A dedicated cheap-model call (or an embedding + nearest-centroid, or even a fine-tuned classifier) maps the input to a route label. Deterministic-ish, fast, independently testable, observable. Preferred when routes are stable and enumerable.
Tool-call router. The orchestrating model is given each route as a "tool"; calling the tool is the routing decision. More flexible (the model can route mid-conversation with full context) but the decision is entangled with generation and harder to test in isolation.
# Classifier router with an explicit fallback route = classifier.label(query, allowed=ROUTES, confidence=True) if route.confidence < THRESHOLD: return handle("fallback", query) # general agent / ask to clarify handler = HANDLERS[route.label] # narrow tools, narrow prompt return handler.run(query)
Always define an explicit low-confidence fallback route. The most common production routing bug is the absence of one: an out-of-distribution query gets force-fit into the closest label and fails confidently inside the wrong handler. A fallback that asks for clarification or invokes a generalist is mandatory, not optional.
Fan-out and parallelism.
Routing's sibling is fan-out: instead of choosing one handler, dispatch to several concurrently and gather. Anthropic distinguishes two forms — sectioning (split a task into independent subtasks run in parallel, then aggregate) and voting (run the same task several times for diverse outputs, then reconcile).
# Sectioning fan-out: independent subtasks, parallel, then gather subtasks = planner.split(task) # must be genuinely independent results = parallel_map(lambda s: handle(route(s), s), subtasks) return aggregator.merge(task, results) # a real reconciliation step
Parallel fan-out is a pure latency/throughput win only when subtasks are truly independent. The aggregator is not a formality: merging partial results, resolving conflicts between branches, and deduplicating is itself a model task that fails if treated as string concatenation.
Failure modes of the routing layer.
- Misroute, then confident failure. Wrong label sends the query to a handler that lacks the right tools; it produces a fluent wrong answer. Mitigation: measure routing accuracy as a first-class metric; add a confidence threshold + fallback; let handlers signal "this isn't mine" to bounce back to the router (a re-route budget prevents ping-pong).
- Route proliferation. Routes multiply until the classifier itself is unreliable and overlapping. Mitigation: keep routes coarse and orthogonal; prefer a few broad handlers over many narrow ones; merge routes the classifier confuses.
- Boundary inputs. Queries that legitimately span two routes get arbitrarily assigned. Mitigation: allow multi-label routing into a fan-out, or escalate to a generalist for genuinely cross-cutting requests.
- False parallelism. Fanning out subtasks that are actually dependent yields branches that contradict each other and an aggregator that cannot reconcile them. Mitigation: only parallelize a verified independent set; everything with dependencies stays sequential.
A router is a single point of failure with leverage: every downstream handler is gated by its accuracy. It deserves its own labeled eval set and dashboard, the same scrutiny you would give an authentication check. An unmeasured router silently caps the quality of the entire system.
When to use it — and when it is overkill.
Use routing when: inputs fall into distinct categories that genuinely benefit from different prompts, tools, or models; the tool set is too large for one reliable agent; you want to send cheap queries to a cheap model and hard ones to a strong one (cost routing); or you are building toward multi-agent and need the dispatch primitive.
Skip it when: there are only a handful of tools a single agent selects reliably — a router adds a hop, a failure surface, and an eval burden for no gain. Routing is the first structural pattern to add as a single agent strains, but adding it before that strain is measurable is premature optimization with a real maintenance cost.
The honest tradeoff: routing buys clean context, testable dispatch, cost control, and a path to multi-agent — at the price of an extra hop and a high-leverage component that itself must be monitored. It is the connective tissue of the orchestration patterns in the next essay.