ReAct — interleaving reasoning and acting.
ReAct is the workhorse of production agents: a loop where the model alternates between thinking and calling tools, each tool result reshaping the next thought. This essay covers the actual control flow, why interleaving beats reason-then-act, and the failure modes that bite at scale.
The core idea: thought and action share one trajectory.
ReAct, from Yao et al. (2022), synergizes reasoning (chain-of-thought) and acting (tool calls / environment interaction) in a single interleaved stream. The model emits a thought, then an action, observes the result, emits the next thought conditioned on that observation, and repeats until it decides to answer. The key insight is that reasoning and acting reinforce each other: reasoning decides which action to take and how to interpret messy results; actions inject external facts that correct the reasoning before it compounds into a hallucination.
Contrast the two failure modes ReAct sits between. Pure chain-of-thought reasons fluently but is untethered — it confidently invents facts. Pure act-only loops (call a tool, call another) ground every step in reality but cannot decompose a hard question or recover from an irrelevant result. Interleaving gets grounding and deliberation in one trajectory.
The loop, explicitly.
With modern tool-calling APIs you do not prompt the literal strings "Thought:"/"Action:" — the model's reasoning lives in its text content and the action is a structured tool call. The structure is identical:
# The ReAct loop — framework-free history = [user_query] for step in range(MAX_STEPS): resp = model.call(history, tools=TOOLS) # thought + maybe action history.append(resp) if resp.stop_reason == "final_answer": return resp.text # model chose to stop for call in resp.tool_calls: # act obs = run_tool(call.name, call.args) history.append(tool_result(call.id, obs)) # observe # loop: next thought is conditioned on obs raise StepBudgetExceeded(history) # bounded by construction
Three properties make this production-grade. It is bounded (the step budget caps cost and runaway loops). It is observable (every thought and observation is in history — your trace is the data structure itself). And it is resumable if you persist history between steps.
The single highest-leverage instrumentation: log history after every step with the step index. Ninety percent of ReAct debugging is reading the trajectory and finding the step where the thought stopped being grounded in the last observation.
When ReAct pays off.
- Open-ended information tasks where the next action genuinely depends on the last result: research, debugging, multi-hop QA, customer-support triage. If you cannot write the plan in advance, ReAct's per-step adaptivity is exactly the right tool.
- Noisy or partial tool results where the model must reason about whether a result is relevant before deciding what to do next.
- Small-to-moderate tool sets (roughly 3–15 tools) where the model can reliably select among them per step.
Failure modes — and the mitigations that actually work.
ReAct's adaptivity is also its liability. The named failure modes, in rough order of how often they ship to production:
- Looping / thrashing. The agent re-runs near-identical actions because each observation fails to advance the goal. Mitigation: a step budget (always), plus a cheap loop detector that flags repeated (tool, normalized-args) tuples and injects a "you have tried this; change approach or stop" message.
- Context rot on long trajectories. The history grows until early grounding facts are buried or evicted, and the model drifts. Mitigation: periodic context compaction — summarize old steps into a durable scratchpad, keep recent steps verbatim.
- Premature commitment. The model decides it has the answer after one weak observation. Mitigation: a reflection or verifier gate before
final_answer(covered in its own essay) — do not solve this inside the ReAct prompt alone. - Tool-selection errors at scale. With many similar tools the model picks the wrong one. Mitigation: a routing layer in front of the loop (its own essay) rather than more prose in the system prompt.
Anti-pattern: "fix it in the system prompt." Each failure above has a structural mitigation. Piling instructions ("don't repeat actions, verify before answering, pick the right tool") into the prompt produces a brittle wall of text that the model follows inconsistently and that you cannot test. Prefer a loop-detector, a verifier node, and a router — code you can unit-test.
When NOT to use plain ReAct.
ReAct is the wrong default when the task has a known, stable structure. If you already know the five steps to onboard a customer, per-step re-deliberation is wasted latency and a chance to go off-script — use plan-and-execute or a hard-coded workflow with model calls at the leaves. ReAct also degrades on long-horizon tasks: with no global plan it loses the thread after ~15–25 steps. And for tasks with verifiable answers (code, math), a single ReAct pass underperforms a search or evaluator-optimizer loop that exploits the cheap correctness signal.
The honest one-line tradeoff: ReAct buys maximal adaptivity at the cost of global coherence and predictable cost. The patterns in the following essays each reclaim some coherence by reintroducing structure ReAct deliberately omits.