The agent design-pattern landscape.
Before you pick ReAct or a supervisor graph, understand what an architecture actually buys you: a control structure over a stochastic policy. The pattern is the part you can reason about; the model is the part you cannot. This essay maps the territory and gives you the axes to compare every pattern that follows.
An architecture is a control structure over a stochastic policy.
A single LLM call is a function from context to a probability distribution over tokens. It has no memory, no ability to act, and no guarantee of correctness. An agent architecture is the deterministic scaffolding you wrap around that call: the loop that decides when to call the model again, what to put in its context, which tools it may invoke, and when to stop. Everything interesting about reliability lives in the scaffolding, not in the prompt.
This framing matters because it tells you where to spend effort. You cannot make a model deterministic, but you can make the control flow deterministic, observable, and bounded. A good architecture converts an unbounded, unpredictable generation process into a bounded state machine where each transition is inspectable. Patterns differ in how much structure they impose and where they impose it.
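A minimal sketch of that scaffolding. It assumes the model is wrapped as a plain Python callable `policy(history) -> action`, and that `run_agent`, `RunResult`, and the action tuple format are hypothetical illustrations; no real LLM client is involved. The model call may be stochastic, but the loop around it is deterministic. It bounds the number of steps, restricts actions to a known tool table, and records every transition so it can be inspected afterward.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

# Hypothetical action format: ("tool", tool_name, argument) or ("final", answer).
Action = tuple


@dataclass
class RunResult:
    answer: Optional[str]                            # None if the step budget ran out
    transitions: list = field(default_factory=list)  # (step, action) per model call


def run_agent(policy: Callable[[list], Action],
              tools: dict,
              task: str,
              max_steps: int = 8) -> RunResult:
    """Deterministic, bounded loop around a possibly stochastic policy."""
    history = [("user", task)]
    result = RunResult(answer=None)
    for step in range(max_steps):
        action = policy(history)  # stand-in for the model call
        result.transitions.append((step, action))
        if action[0] == "final":
            result.answer = action[1]
            return result
        _, name, arg = action
        if name in tools:
            observation = tools[name](arg)
        else:
            # Unknown tool: feed the error back instead of crashing the run.
            observation = f"error: unknown tool {name!r}"
        history.append(("observation", observation))
    return result  # budget exhausted; answer stays None


# Usage with a scripted stand-in for the model: look something up, then answer.
def scripted_policy(history):
    if len(history) == 1:
        return ("tool", "lookup", "capital of France")
    return ("final", history[-1][1])


tools = {"lookup": lambda q: "Paris" if "France" in q else "unknown"}
res = run_agent(scripted_policy, tools, "What is the capital of France?")
print(res.answer)            # Paris
print(len(res.transitions))  # 2
```

Swapping `scripted_policy` for a real model changes what each transition contains, not the shape of the loop. The step budget and the transition log stay under your control either way.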
Mental shortcut: ask "if this run goes wrong, what is the smallest unit I can inspect and retry?" In a single mega-prompt the answer is "the whole thing." In a well-structured architecture it is "one node." That delta is the entire value of design patterns.
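One way to make "one node" the unit of retry, sketched under the assumption that each node is a named function from a state dict to a new state dict. The names `run_nodes`, `NodeFailed`, `fetch`, and `summarize` are hypothetical. When a node fails, only that node is retried, and the log records which node failed and how many attempts it took.

```python
from typing import Callable

# Hypothetical node type: a name plus a function from state dict to state dict.
Node = tuple[str, Callable[[dict], dict]]


class NodeFailed(Exception):
    def __init__(self, node: str, attempts: int):
        super().__init__(f"node {node!r} failed after {attempts} attempts")
        self.node = node
        self.attempts = attempts


def run_nodes(nodes: list[Node], state: dict, max_attempts: int = 3):
    """Run nodes in order, retrying only the node that raised."""
    log = []  # (node_name, attempt_number, "ok" or the error message)
    for name, fn in nodes:
        for attempt in range(1, max_attempts + 1):
            try:
                # Shallow copy: a failed attempt cannot overwrite the caller's top-level keys.
                state = fn(dict(state))
                log.append((name, attempt, "ok"))
                break
            except Exception as exc:
                log.append((name, attempt, str(exc)))
        else:  # no break: every attempt at this node failed
            raise NodeFailed(name, max_attempts)
    return state, log


# Usage: the second node fails once (simulating one bad model output), then succeeds.
calls = {"summarize": 0}


def fetch(state):
    state["doc"] = "raw text"
    return state


def summarize(state):
    calls["summarize"] += 1
    if calls["summarize"] == 1:
        raise ValueError("malformed JSON from model")
    state["summary"] = state["doc"].upper()
    return state


final, log = run_nodes([("fetch", fetch), ("summarize", summarize)], {})
print(final)  # {'doc': 'raw text', 'summary': 'RAW TEXT'}
print(log)
# [('fetch', 1, 'ok'), ('summarize', 1, 'malformed JSON from model'), ('summarize', 2, 'ok')]
```

`fetch` runs exactly once. The retry touches only `summarize`, and the log points straight at it.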
The five axes that distinguish every pattern.
Every architecture in this section can be located on five axes. Learn these once and the rest of the deep-dives become comparisons rather than a list of tricks.
- Planning horizon. Does the agent decide one step at a time (ReAct), or commit to a multi-step plan up front (plan-and-execute)? Short horizons adapt to surprises; long horizons stay coherent on multi-stage tasks.
- Self-correction. Is there an explicit verify-and-revise step (reflection, evaluator-optimizer), or does the agent trust its first output? Self-correction trades latency and tokens for quality on tasks with checkable structure.
- Branching. Does the agent pursue one trajectory, or explore several and select (tree/graph search, best-of-N)? Branching buys robustness on hard reasoning at multiplicative cost.
- Decomposition topology. One agent with many tools, or many agents with narrow scopes coordinated by a router or supervisor? Decomposition isolates context and failure but adds hand-off overhead.
- State ownership. Who holds the ground truth — a linear message history, an external scratchpad/blackboard, or a typed graph state? This determines how you debug, resume, and audit.
Most production systems are not a single pattern but a composition: a router that dispatches to a plan-and-execute worker whose executor uses a ReAct tool loop with a reflection gate before returning. The axes let you describe that composition precisely instead of waving at "an agent."
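The five axes are concrete enough to write down, which turns a composition like the one above into a checklist. This is a minimal sketch built on a plain Python `Enum` named after this essay's labels; nothing here is a framework API. The axis each layer is assigned to reflects one reading of the text, not a canonical classification.

```python
from enum import Enum


class Axis(Enum):
    PLANNING_HORIZON = "planning horizon"
    SELF_CORRECTION = "self-correction"
    BRANCHING = "branching"
    DECOMPOSITION = "decomposition topology"
    STATE_OWNERSHIP = "state ownership"


# The composition from the text, outermost layer first, with the axis each layer commits to.
COMPOSITION = [
    ("router", Axis.DECOMPOSITION, "dispatch to a narrow worker"),
    ("plan-and-execute worker", Axis.PLANNING_HORIZON, "multi-step plan up front"),
    ("ReAct tool loop (executor)", Axis.PLANNING_HORIZON, "one step at a time within each plan step"),
    ("reflection gate", Axis.SELF_CORRECTION, "verify and revise before returning"),
]


def describe(layers) -> str:
    return "\n".join(
        f"{i}. {name}: {axis.value} = {choice}"
        for i, (name, axis, choice) in enumerate(layers, start=1)
    )


def unpinned_axes(layers) -> list:
    """Axes no layer commits to: implicit defaults that deserve an explicit decision."""
    pinned = {axis for _, axis, _ in layers}
    return [axis for axis in Axis if axis not in pinned]


print(describe(COMPOSITION))
print([axis.value for axis in unpinned_axes(COMPOSITION)])  # ['branching', 'state ownership']
```

The unpinned list is the useful output. This composition never says whether it branches or who owns the state. Both of those decisions still get made, just implicitly.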
Why architecture is a reliability lever, not a capability lever.
A frequent mistake is treating architectures as ways to make a model "smarter." They are not. The model's capability ceiling is fixed at inference time. What architecture changes is the distribution of outcomes: it cuts the tail of catastrophic failures and raises the floor of mediocre runs, usually at the cost of latency, tokens, and engineering complexity.
Concretely, the same base model on the same task set, with each row adding one layer on top of the rows above it:
# Illustrative: pass rate on a 200-task internal eval
single call (no tools)             41%
+ tool loop (ReAct-style)          68%
+ reflection gate before return    74%
+ best-of-3 with selector          79%   # ~3x token cost
+ task-specific verifier           88%   # largest gain after the tool loop
Two lessons fall out of numbers like these. First, the earliest structure (giving the model tools and a loop) is the cheapest and largest gain, at +27 points. That is why the next essay starts with ReAct. Second, generic self-improvement loops show diminishing returns. Reflection adds 6 points and best-of-3 adds 5 at roughly triple the token cost, while the task-specific verifier adds 9, because checkable correctness beats the model grading itself.
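The arithmetic behind those two lessons is simply the marginal gain of each layer. The short sketch below uses the illustrative figures from the table above. No new measurements are involved, and `stages` and `marginal_gains` are names introduced only for this example.

```python
# Illustrative pass rates from the table above, in the order the layers were stacked.
stages = [
    ("single call (no tools)", 41),
    ("+ tool loop (ReAct-style)", 68),
    ("+ reflection gate before return", 74),
    ("+ best-of-3 with selector", 79),
    ("+ task-specific verifier", 88),
]


def marginal_gains(stages):
    """Percentage-point gain of each stage over the stage before it."""
    return [(name, rate - prev) for (_, prev), (name, rate) in zip(stages, stages[1:])]


for name, gain in marginal_gains(stages):
    print(f"{name:<34} +{gain} pts")
# prints gains of +27, +6, +5 and +9 points, in that order
```

These are marginal gains in stacking order, so each layer's number depends on what sits beneath it. A verifier measured directly on top of the single call could show a different gain.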
The most expensive anti-pattern is adding branching and reflection to a task whose failures are not detectable. If the agent cannot tell a good answer from a bad one, best-of-N just samples more bad answers and the selector picks confidently among them. Architecture amplifies a signal you have; it does not create one.
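A back-of-the-envelope model makes this exact. Assume each of N samples passes independently with probability p. That is a simplifying assumption, since real samples from one model are correlated. A selector with no signal amounts to a uniform pick, so its pass rate is the expected fraction of passing samples, which is p for every N. A perfect verifier passes whenever at least one sample passes. The function names are illustrative.

```python
from fractions import Fraction
from math import comb


def binom_pmf(n: int, k: int, p: Fraction) -> Fraction:
    # Probability that exactly k of n independent samples pass.
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def blind_selector(p: Fraction, n: int) -> Fraction:
    # With k of n samples passing, a uniform pick passes with probability k/n.
    return sum(binom_pmf(n, k, p) * Fraction(k, n) for k in range(n + 1))


def perfect_verifier(p: Fraction, n: int) -> Fraction:
    # Passes whenever at least one sample passes, i.e. 1 - (1 - p)**n.
    return sum(binom_pmf(n, k, p) for k in range(1, n + 1))


p = Fraction(1, 2)
for n in (1, 3, 5):
    print(n, blind_selector(p, n), perfect_verifier(p, n))
# 1 1/2 1/2
# 3 1/2 7/8
# 5 1/2 31/32
```

The gap between the last two columns is the signal the verifier contributes. Without that signal, every extra sample costs a full generation and buys nothing.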
How to read the rest of this section.
The essays that follow are ordered from least to most structure. ReAct is the minimal viable loop. Plan-and-execute adds a planning horizon. Reflection adds self-correction. Search strategies add branching. Routing and multi-agent orchestration add decomposition. Tool error recovery is the cross-cutting concern that determines whether any of them survive contact with real tools.
For each, hold three questions in mind; every pattern in this section is expected to answer all three: When does it pay off? When does it actively hurt? What concrete tradeoff are you buying? A pattern with no failure mode is a pattern you do not yet understand.
Default recommendation for a new system: start with the simplest pattern that could possibly work (usually a single ReAct agent with 3–6 tools), instrument it heavily, and let observed failure modes — not architecture diagrams — pull you toward more structure. Every essay here is a response to a specific failure you should be able to see in your traces first.
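This sketch shows one way to let traces, rather than diagrams, drive that decision. Tag each failed run with a failure mode during review, count the tags, and look up which pattern in this section targets the most common one. The tag names and the `REMEDY` table are hypothetical. They simply restate this essay's one-line summaries of what each pattern adds.

```python
from collections import Counter

# Hypothetical failure tags, mapped to the pattern this section says adds the missing structure.
REMEDY = {
    "lost_multistep_goal": "plan-and-execute (planning horizon)",
    "checkable_error_in_answer": "reflection / evaluator-optimizer (self-correction)",
    "hard_reasoning_dead_end": "search strategies (branching)",
    "context_overload_many_tools": "routing / multi-agent (decomposition)",
    "tool_call_failed": "tool error recovery",
}


def next_structure(failure_tags: list[str]) -> tuple[str, int, str]:
    """Return the most frequent failure tag, its count, and the pattern that targets it."""
    if not failure_tags:
        return ("none", 0, "keep the simple agent")
    tag, count = Counter(failure_tags).most_common(1)[0]
    return tag, count, REMEDY.get(tag, "unmapped: inspect these traces by hand")


tags = ["tool_call_failed", "lost_multistep_goal", "tool_call_failed",
        "checkable_error_in_answer", "tool_call_failed"]
print(next_structure(tags))  # ('tool_call_failed', 3, 'tool error recovery')
```

Per the anti-pattern above, a tag such as `hard_reasoning_dead_end` only justifies branching if the failure is also detectable. Otherwise more samples just means more bad answers.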