Agents vs Chatbots vs Workflows

Agents vs chatbots vs workflows vs pipelines.

"We built an AI agent" is, more often than not, false — and the gap matters because each of these four things has a different cost, a different failure mode, and a different debugging story. This entry gives you a single discriminating question that sorts any LLM system into the right bucket, four concrete examples doing the same task at each level, and a decision guide for picking the simplest thing that works.

STEP 1

The one question that sorts them.

Four systems, increasing in self-direction. The discriminating question, asked of the running system, is: "At each step, who decides what happens next?"

Chatbot. A human decides what happens next, every turn, by typing the next message. The model only produces text. No tools fire on their own. Each turn is independent reasoning over the conversation; the loop is the human pressing Enter.
Pipeline. A fixed, linear sequence of steps decides what happens next, and the order never changes. extract → translate → summarize → store. The LLM may power one or more steps, but it does not choose the sequence; the code does, identically every run.
Workflow. Predefined code decides what happens next, including branches and loops, but the decision logic is written by a human in advance. "If the classifier says 'refund', call the refund tool; else escalate." An LLM may sit inside a branch, but the branching itself is hardcoded — the model fills in the blanks, it does not redraw the flowchart.
Agent. The model decides what happens next, at each step, based on what it just observed, and that decision was not enumerated in advance by a human. The loop is driven by the model's own choices, not a script.

The workflow/agent split is the distinction Anthropic draws in Building Effective Agents: workflows orchestrate LLMs and tools through predefined code paths; agents let the LLM dynamically direct its own process. The other two categories extend that framing: a pipeline is the degenerate case of a workflow with no branches; a chatbot is the degenerate case with no tools and a human as the loop.
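The discriminating question can be sketched in code. This is a minimal illustration under stated assumptions, not a reference implementation: fake_llm is a hypothetical deterministic stub standing in for a real model call, and the step and tool names are invented for the example. The only thing that differs between the four functions is where the "what happens next" decision lives.

```python
from typing import Callable, Iterable

def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for a model call; deterministic so the sketch runs offline."""
    if prompt.startswith("classify:"):
        return "refund" if "refund" in prompt else "other"
    if prompt.startswith("next:"):
        # Pretend the model picks its next action from what it has observed so far.
        return "finish" if "status=late" in prompt else "lookup"
    return f"reply to: {prompt}"

def chatbot(human_turns: Iterable[str]) -> list[str]:
    # The human decides what happens next: one reply per message, no tools.
    return [fake_llm(msg) for msg in human_turns]

def pipeline(text: str, steps: list[Callable[[str], str]]) -> str:
    # The step list decides: same steps, same order, on every run.
    for step in steps:
        text = step(text)
    return text

def workflow(email: str) -> str:
    # Human-written branching decides; the model only supplies the label.
    label = fake_llm(f"classify: {email}")
    return "call_refund_tool" if label == "refund" else "escalate"

def agent(goal: str, tools: dict[str, Callable[[], str]], max_steps: int = 5) -> list[str]:
    # The model decides: each choice depends on the observations so far.
    observations: list[str] = []
    trace: list[str] = []
    for _ in range(max_steps):
        choice = fake_llm(f"next: {goal} | {' '.join(observations)}")
        trace.append(choice)
        if choice == "finish":
            break
        observations.append(tools[choice]())
    return trace
```

In the first three functions the control flow can be read straight off the source; in agent, the sequence of tool calls exists only in the returned trace.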

STEP 2

The same task, done four ways.

Task: a customer emails "my order #4471 never arrived, I want a refund." Watch how the four systems handle the identical input.

CHATBOT
  Model reads email → drafts a reply suggesting the human
  check the order and decide. A support rep does everything.
  The "AI" never touches the order system.

PIPELINE
  classify(email) → extract(order_id) → lookup(4471)
  → draft_reply(template) → queue_for_human
  Same five steps, same order, every email, forever.
  No decision is made about WHICH steps to run.

WORKFLOW
  classify(email) = "refund_request"
  IF order is >30 days late AND amount < $50:
      auto_refund(4471); send(approved_template)
  ELSE:
      escalate_to_human()
  A human wrote that IF. The model only filled "refund_request".

AGENT
  Model: "I need the order status." → lookup(4471)
  Obs: "shipped, stuck in transit 19 days, carrier exception."
  Model: "Policy lets me refund late shipments under $50.
          This is $34. But I should check it wasn't already
          refunded." → check_refunds(4471)
  Obs: "no prior refund."
  Model: "Refund is appropriate." → issue_refund(4471)
          → compose a specific reply citing the carrier issue.
  Nobody scripted that sequence of checks; the model chose it.

The agent's sequence (check status, then notice it should also check for a double refund, then act) was not in anyone's code. That is the capability you are paying for, and also the property that makes agents harder to test: the workflow can only follow one of the paths its author wrote, by construction; the agent's path is input-dependent and partly unpredictable.
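The workflow and agent versions of the refund task can be sketched side by side. Everything here is an assumption made for illustration: ORDERS is a hypothetical in-memory order system, "stuck in transit 19 days" is treated as 19 days late, and scripted_model is a deterministic stub that only marks the place where a real model's per-step choice would go (being scripted, the stub itself is predictable; a real model would not be).

```python
from dataclasses import dataclass

@dataclass
class Order:
    order_id: int
    amount: float
    days_late: int
    refunded: bool = False

# Hypothetical in-memory order system; values mirror the example above.
ORDERS = {4471: Order(order_id=4471, amount=34.0, days_late=19)}

def refund_workflow(order_id: int) -> str:
    """Workflow: a human wrote this IF in advance."""
    order = ORDERS[order_id]
    if order.days_late > 30 and order.amount < 50:
        order.refunded = True
        return "auto_refund"
    return "escalate_to_human"

# Tools the agent may call; each returns an observation string.
def lookup(order_id: int) -> str:
    o = ORDERS[order_id]
    return f"amount={o.amount} days_late={o.days_late}"

def check_refunds(order_id: int) -> str:
    return "prior refund" if ORDERS[order_id].refunded else "no prior refund"

def issue_refund(order_id: int) -> str:
    ORDERS[order_id].refunded = True
    return "refund issued"

TOOLS = {"lookup": lookup, "check_refunds": check_refunds, "issue_refund": issue_refund}

def scripted_model(observations: list[str]) -> str:
    """Stand-in for the model's per-step choice (a real agent would call an LLM here)."""
    if not observations:
        return "lookup"
    last = observations[-1]
    if last.startswith("amount="):
        return "check_refunds"
    if last == "no prior refund":
        return "issue_refund"
    return "stop"

def refund_agent(order_id: int, max_steps: int = 6) -> list[str]:
    """Agent loop: the next tool is chosen at runtime from the observations so far."""
    observations: list[str] = []
    trace: list[str] = []
    for _ in range(max_steps):
        action = scripted_model(observations)
        if action == "stop":
            break
        trace.append(action)
        observations.append(TOOLS[action](order_id))
    return trace
```

With these values the workflow escalates order 4471, because 19 days does not satisfy its hardcoded 30-day rule, while the agent loop looks up the order, checks for a prior refund, and then issues one. Running the agent a second time stops after the refund check, since the observation has changed.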

STEP 3

What each one costs you (in the columns that aren't price).

The interesting cost of moving up this ladder is not tokens; it is testability, predictability, and debuggability. Internalize the comparison below; it is the entire argument for "use the simplest thing that works."

Predictability. Pipeline: total — same steps every time. Workflow: high — finite, enumerable branches you can list. Agent: low — the path depends on observations, and observations depend on the environment, which changes.
Testability. Pipeline/workflow: you can write a test per branch and get coverage. Agent: there is no finite set of paths to enumerate; you test with evaluation suites over many scenarios and accept a success rate, not a guarantee.
Debuggability. Workflow failure: a known branch did the wrong thing — you find it and fix the code. Agent failure: the model made a judgment you disagree with on turn 7 of a 12-turn trace — you debug a behavior, not a line.
Failure mode. Pipeline fails predictably and loudly (a step throws). Agent fails plausibly and quietly (it does something reasonable-looking that's wrong, and continues). Quiet wrong is far more expensive than loud wrong.
Cost variance. A pipeline costs roughly the same every run. An agent's cost is unbounded by construction unless you bound it: the same input can take 2 turns or 20. (This is why the agent-loop entry hammered on step budgets.)
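Two of the points above, bounding cost with a step budget and testing by success rate, can be sketched together. This is a toy under stated assumptions: each scenario is a hypothetical callable that stands in for one model turn plus its tool call and simply reports "done" on a predetermined turn; a real evaluation suite would run the actual agent against recorded or simulated environments.

```python
import random
from typing import Callable

def run_with_budget(step: Callable[[int], bool], max_steps: int) -> tuple[bool, int]:
    """Run an agent-like loop until `step` reports done or the budget is spent.

    Returns (finished, steps_used).
    """
    for used in range(1, max_steps + 1):
        if step(used):
            return True, used
    return False, max_steps

def make_scenario(turns_needed: int) -> Callable[[int], bool]:
    # Toy scenario: the task finishes on the given turn.
    return lambda turn: turn >= turns_needed

def success_rate(scenarios: list[Callable[[int], bool]], max_steps: int) -> float:
    """Evaluation-suite style check: a rate over many scenarios, not a guarantee."""
    results = [run_with_budget(s, max_steps)[0] for s in scenarios]
    return sum(results) / len(results)

rng = random.Random(0)  # fixed seed so the toy suite is reproducible
# 50 scenarios needing anywhere from 2 to 20 turns, echoing "2 turns or 20".
suite = [make_scenario(rng.randint(2, 20)) for _ in range(50)]
```

Calling success_rate(suite, 10) and success_rate(suite, 20) shows the trade: a tighter budget caps the worst-case cost per run but lowers the measured success rate, and the budget is the only thing making that worst case finite.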

The pattern to internalize: agents trade predictability for adaptability. You get a system that handles cases nobody anticipated, and you give up the ability to know in advance exactly what it will do. That trade is excellent for genuinely open-ended tasks and a terrible deal for tasks a workflow handles fine. Many "agent" projects that fail in production were workflows that didn't need the adaptability and couldn't afford the unpredictability.

STEP 4

The decision guide: climb only when forced.

Default to the bottom of the ladder and only climb when the task forces you. Ask these in order; the first "yes" is your answer:

Can a fixed sequence of steps do it? → Pipeline. "Transcribe the call, summarize it, file the summary." The order never needs to change. Don't add intelligence to step selection that isn't needed.
Are the decision points finite and known in advance? → Workflow. "Route the ticket to billing, tech, or sales based on content." You can list every branch. A human-authored router around LLM steps is more reliable, cheaper, and more testable than an agent here.
Is the conversation the product, with the human steering each turn? → Chatbot. A Q&A assistant, a writing partner. Adding autonomous tool use to something the user wants to drive turn-by-turn usually makes it worse, not better.
Does the task require deciding the sequence of actions at runtime, based on results you cannot enumerate in advance, in an environment that varies? → Agent. Debugging an unfamiliar codebase, multi-step research where each finding determines the next query, operating a system whose state you don't control. Here the adaptability is the point and a workflow genuinely cannot express it.
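The four questions above amount to a first-match rule, which can be written down directly. This is a sketch only: the function and parameter names are hypothetical, and the guide does not say what to do when every answer is "no", so raising an error there is this example's own choice.

```python
def choose_system(
    fixed_sequence: bool,
    finite_known_branches: bool,
    conversation_is_product: bool,
    runtime_sequencing: bool,
) -> str:
    """Return the first rung whose question is answered 'yes', in the order listed above."""
    if fixed_sequence:
        return "pipeline"
    if finite_known_branches:
        return "workflow"
    if conversation_is_product:
        return "chatbot"
    if runtime_sequencing:
        return "agent"
    # Not covered by the guide; raising is an assumption of this sketch.
    raise ValueError("no question answered yes; re-examine the task")
```

Because the checks run in order, a task that could be done by a fixed sequence returns "pipeline" even if it would also qualify for an agent, which is the "climb only when forced" rule in code form.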

The honest summary, and the through-line for the rest of this section: an agent has the highest raw capability ceiling and the highest cost on every other axis (predictability, testability, cost variance, blast radius). It is the right tool when the task is genuinely open-ended, and an act of self-harm when a workflow would have done. The next entries (tools and environments, when to use an agent, the risks) assume you've made this choice deliberately, not by default or by hype.