Incident Response & Runaway Containment

Incident response and runaway containment: the loop that will not stop.

A crashed service stops doing damage; a runaway agent keeps acting (looping, spending, and writing to the world) while you scramble. Agent incidents have a property ordinary outages do not: the system acts as the attacker's hands, and "do nothing" is not safe because the agent keeps going. This essay covers the on-call reality: how you detect a runaway, what you slam to stop it, how you bound the blast radius, and what the trace must contain for a postmortem to be possible.

STEP 1

An agent incident is an active adversary, not a passive failure.

The reflexes from service ops mislead here. A wedged web server fails closed: it stops serving and stops harming. A wedged agent fails open: it keeps iterating, keeps calling tools, keeps spending, and possibly keeps writing to production for the entire time it takes you to notice and react. Cost and damage accumulate over your response latency, so total harm is roughly the harm rate multiplied by the time to contain. For agents, mean-time-to-contain therefore dominates mean-time-to-diagnose: you stop the bleeding first and understand it second.

The runaway shapes you will actually see: a tight tool-call loop (same action, rising rate), a reflection spiral (re-planning forever, no state change), pathological fan-out (sub-agents spawning sub-agents), a budget-burn with no progress, and the dangerous one — a coherent agent doing the wrong thing confidently and fast (a bad prompt or a model regression at scale).

STEP 2

Detect from rate and progress, not from errors.

A runaway often throws zero errors — it is "succeeding" at the wrong thing. Error-rate dashboards stay green while you bleed. The signals that actually fire are derivatives and ratios:

Steps-per-run and tool-calls-per-minute crossing a hard ceiling — the loop-detection signal, and the same no-progress metric from cost-control-in-the-loop doing double duty as a safety alarm.
Spend-rate per tenant/fleet, alerting on absolute spend over a short window and not just on a daily total. A runaway burns the month's budget in an hour, and the daily rollup tells you tomorrow.
Write-action rate and first-seen destinations — a spike in side-effecting calls, or actions hitting destinations never seen before, is a containment trigger regardless of "success."
Progress-to-cost ratio collapsing — lots of activity, no closed loops — is the cleanest single runaway signature.
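
The four signals above can be evaluated together over a short window of one run. The following is a minimal sketch under stated assumptions: RunWindow, runaway_signals, the signal names, and every threshold value are hypothetical placeholders, not part of any existing system, and "closed loops" stands in for whatever unit of real progress your agent reports.

```python
from dataclasses import dataclass, field


@dataclass
class RunWindow:
    """Counters for one run over a recent window (illustrative names)."""
    steps: int               # agent steps taken so far in the run
    tool_calls: int          # tool calls observed inside the window
    window_seconds: float    # length of the observation window
    cost_usd: float          # spend inside the window
    closed_loops: int        # units of real progress, e.g. subtasks completed
    write_destinations: set = field(default_factory=set)


def runaway_signals(w, known_destinations, *, max_steps=200,
                    max_calls_per_min=60.0, min_progress_per_usd=0.1):
    """Return the names of the containment signals that fire for this window.

    All thresholds are placeholder values; tune them per workload.
    """
    fired = []
    if w.steps > max_steps:
        fired.append("steps_ceiling")
    if w.window_seconds > 0:
        calls_per_min = w.tool_calls * 60.0 / w.window_seconds
        if calls_per_min > max_calls_per_min:
            fired.append("tool_call_rate")
    # Any write target not seen before is a trigger, even if the call "succeeded".
    if w.write_destinations - set(known_destinations):
        fired.append("first_seen_destination")
    # Progress-to-cost: money is being spent but nothing is being closed.
    if w.cost_usd > 0 and w.closed_loops / w.cost_usd < min_progress_per_usd:
        fired.append("progress_to_cost")
    return fired
```

Note that none of these checks looks at an error count, which is the point: a run can trip all four while every individual call returns success.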

Alert on absolute rate over a short window, not cumulative daily totals. A daily spend dashboard reports a runaway after it has already cost you the most. The useful alarm is "$X in the last 5 minutes," wired to the kill switch path — not an email.
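
One way to express "$X in the last 5 minutes" is a sliding-window sum that calls a containment hook when it trips. This is a sketch only: SpendRateAlarm and on_trip are hypothetical names, timestamps are passed in explicitly so the logic is easy to test, and a production alarm would likely latch instead of firing on every record above the limit.

```python
from collections import deque


class SpendRateAlarm:
    """Trips when spend inside a sliding window exceeds an absolute limit."""

    def __init__(self, limit_usd, window_seconds=300.0, on_trip=None):
        self.limit_usd = limit_usd
        self.window_seconds = window_seconds
        # on_trip is where the kill-switch path would be wired in (assumption).
        self.on_trip = on_trip or (lambda total: None)
        self._events = deque()   # (timestamp, usd), oldest first
        self._total = 0.0

    def record(self, usd, now):
        """Record spend at time `now` (seconds); return True if the alarm trips."""
        self._events.append((now, usd))
        self._total += usd
        # Keep only spend inside the window (now - window_seconds, now].
        while self._events and self._events[0][0] <= now - self.window_seconds:
            _, old_usd = self._events.popleft()
            self._total -= old_usd
        if self._total > self.limit_usd:
            self.on_trip(self._total)
            return True
        return False
```

The same total that would look unremarkable on a daily rollup trips this alarm within the window, because the comparison is against recent spend only.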

STEP 3

Kill switches, layered by blast radius and reversibility.

One global off-switch is too blunt (it takes down every healthy tenant to stop one) and too slow to reach for. Build a graded ladder so the on-call can match the smallest sufficient hammer to the incident:

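# containment/killswitch.py: checked before every tool call
# flag() and frozen_tools() are assumed to read a shared flag store.

class Halted(Exception):
    """Raised to stop a run before its next side effect."""

def guard(run, tool, args):
    # Widest scope first; the first matching switch wins.
    if flag("halt:global"):                # 4: stop the world
        raise Halted("global")
    if flag(f"halt:tenant:{run.tenant}"):  # 3: one tenant
        raise Halted("tenant")
    if flag(f"halt:run:{run.id}"):         # 2: one run
        raise Halted("run")
    if tool in frozen_tools():             # 1: disable one capability
        raise Halted(f"tool:{tool}")       # other tools, including reads, still flow
    return args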

The properties that make a kill switch trustworthy under stress: it is checked in the loop before every effect (not a deploy, not a process kill that loses state), it is fail-closed (flag store unreachable ⇒ halt, do not "fail open and keep acting"), and capability-freeze (level 1) exists so you can neutralize the dangerous write tool while read-only diagnosis continues.
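
The fail-closed property can be made concrete with a small wrapper around the flag lookup. This is a sketch under assumptions: flag_fail_closed is a hypothetical helper, and read_flag stands for whatever callable reads your flag store and raises when the store is unreachable.

```python
def flag_fail_closed(read_flag, name):
    """Return True (halt) when the flag is set OR the flag store cannot be read.

    `read_flag` is any callable that takes a flag name and returns a truthy
    value when the flag is set; it is assumed to raise on store failure.
    """
    try:
        return bool(read_flag(name))
    except Exception:
        # Unreachable or broken store: assume the worst and halt,
        # since continuing to act is the unsafe default for an agent.
        return True
```

The trade-off is deliberate: a flag-store outage pauses healthy runs, which is an availability cost accepted in exchange for never acting without a working off-switch.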

"Just redeploy" or "kill the pods" is not containment for a durable agent — a resumable run comes back on the next worker and continues exactly the harmful loop. Containment must be an in-loop, state-aware halt the resume path also respects, or you have built a runaway that survives its own kill.

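A resume path that respects the halt might look like the sketch below. Everything here is illustrative: Run, halted_scope, and resume are hypothetical names, the flags are a plain dict standing in for a real flag store, and step is whatever function advances the run by one action and returns True when the run is finished.

```python
from dataclasses import dataclass


@dataclass
class Run:
    id: str
    tenant: str
    status: str = "running"   # "running", "halted", or "done"
    steps_done: int = 0


def halted_scope(run, flags):
    """Return the widest halt scope that applies to this run, or None."""
    for scope, key in (("global", "halt:global"),
                       ("tenant", f"halt:tenant:{run.tenant}"),
                       ("run", f"halt:run:{run.id}")):
        if flags.get(key):
            return scope
    return None


def resume(run, flags, step):
    """Resume a checkpointed run, consulting the halt flags before every step."""
    while run.status == "running":
        if halted_scope(run, flags) is not None:
            # Persist the halt with the checkpoint so the next worker
            # that picks the run up does not restart the loop.
            run.status = "halted"
            break
        if step(run):
            run.status = "done"
        run.steps_done += 1
    return run
```

Because the check sits inside the resume loop and the halted status is stored on the run itself, killing the worker changes nothing: a fresh worker sees the same flag and the same status and takes no action.
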
STEP 4

Blast radius is bounded before the incident, not during it.

The most effective incident response is the limit that was already in place when it started. The structural bounds from earlier essays are the containment system, pre-installed: per-task cost ceilings (cost-control-in-the-loop) cap one run's damage; per-tenant concurrency caps (concurrency-and-scaling) stop one tenant becoming a fleet event; idempotent writes (idempotency-and-retries) mean a looping agent re-issuing the same write does bounded damage instead of N charges; least-privilege tool scoping bounds the worst any single call can do. An incident with no pre-installed bounds is unbounded by definition — on-call reaction time becomes the only limit, and that is never fast enough.
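
Two of these pre-installed bounds, the per-run cost ceiling and idempotent writes, fit in a few lines. This is a minimal sketch, not the implementation from the referenced essays: BoundedRun, BudgetExceeded, and write_once are hypothetical names, the ledger is an in-memory dict, and tool arguments are assumed to be JSON-serializable.

```python
import hashlib
import json


class BudgetExceeded(Exception):
    """Raised when a charge would push a run past its cost ceiling."""


class BoundedRun:
    def __init__(self, run_id, cost_ceiling_usd):
        self.run_id = run_id
        self.cost_ceiling_usd = cost_ceiling_usd
        self.spent_usd = 0.0
        self.ledger = {}   # idempotency key -> result of the write

    def charge(self, usd):
        """Account for spend, refusing anything that would exceed the ceiling."""
        if self.spent_usd + usd > self.cost_ceiling_usd:
            raise BudgetExceeded(self.run_id)
        self.spent_usd += usd

    def write_once(self, tool, args, perform):
        """Perform a side-effecting call at most once per (run, tool, args)."""
        key = hashlib.sha256(
            json.dumps([self.run_id, tool, args], sort_keys=True).encode()
        ).hexdigest()
        if key not in self.ledger:
            self.ledger[key] = perform(args)
        return self.ledger[key]
```

With this in place, an agent that loops on the same write produces one effect and many ledger hits. A real system would persist the ledger and would need a deliberate way to express a legitimately repeated identical write, which this sketch does not address.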

STEP 5

The runbook is a decision tree, executed under pressure.

At 3am, nobody reasons from first principles. The runbook must be a pre-decided sequence the on-call can execute without designing it live:

Contain first. Pick the smallest sufficient kill-switch level (Step 3). Stopping the bleed is not the same as understanding it; do not investigate a still-acting agent.
Assess scope. How many runs and tenants are affected, which side effects have already landed (read the side-effect ledger, which is the source of truth, not the chat logs), and whether value or only tokens are at risk.
Decide reversibility. Identify writes that need compensating actions (refunds, retractions). The idempotent ledger tells you exactly what fired, and that each write fired only once.
Communicate. Affected tenants, status, ETA — an agent acting wrongly on customer data is a trust incident, not just an ops one.
Recover deliberately. If a release caused it, roll back the pinned triple (rollout-and-versioning) before un-halting; resume runs only on a known-good contract.
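
The "assess scope" and "decide reversibility" steps are largely a query over the side-effect ledger. The sketch below assumes a ledger shape that the text does not specify: LedgerEntry, its fields, and assess_scope are all hypothetical, and the compensation field is an assumed way of recording which undo action a write would need.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LedgerEntry:
    run_id: str
    tenant: str
    tool: str
    at: float                           # timestamp of the write, in seconds
    compensation: Optional[str] = None  # e.g. "refund"; None if none is defined


def assess_scope(ledger, incident_start):
    """Summarize which runs and tenants wrote during the incident, and what to undo."""
    landed = [e for e in ledger if e.at >= incident_start]
    return {
        "runs": sorted({e.run_id for e in landed}),
        "tenants": sorted({e.tenant for e in landed}),
        "to_compensate": [(e.run_id, e.compensation)
                          for e in landed if e.compensation],
    }
```

Having this query written before the incident is what lets the on-call answer "who was affected and what must be reversed" in one call instead of reading chat logs.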

STEP 6

The postmortem is only possible if the trace was captured first.

You cannot reconstruct an agent incident from metrics; you need the decision-level trace: every prompt, model output, and tool call with its arguments, plus the release stamp, all joinable by run_id to the side-effect ledger. The rule is simple: if it isn't in the trace, the postmortem is speculation. Every incident must also produce a permanent regression test (the exact scenario, replayable, in the eval gate) so this class of runaway fails CI forever after. An incident that does not become a test is an incident you have agreed to have again. You cannot bolt observability on during an incident, so build the kill switch and the trace before you need either, because the day you need them is the day you cannot add them.
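
A decision-level trace record and its join to the ledger could be sketched as follows. The field names, TraceEvent, join_by_run, and untraced_runs are assumptions for illustration; ledger rows are represented as plain dicts with a run_id key, and the release value is whatever identifier your rollout process stamps on a run.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TraceEvent:
    run_id: str
    step: int
    release: str                 # release stamp active when the step ran
    prompt: str
    model_output: str
    tool: Optional[str] = None   # None for steps that called no tool
    tool_args: Optional[dict] = None


def join_by_run(events, ledger_rows):
    """Group trace events and ledger rows under their shared run_id."""
    joined = {}
    for e in events:
        joined.setdefault(e.run_id, {"trace": [], "effects": []})["trace"].append(e)
    for row in ledger_rows:
        joined.setdefault(row["run_id"], {"trace": [], "effects": []})["effects"].append(row)
    return joined


def untraced_runs(joined):
    """Runs that caused side effects but left no trace: postmortem blind spots."""
    return sorted(rid for rid, v in joined.items()
                  if v["effects"] and not v["trace"])
```

Running untraced_runs routinely, not only after an incident, is a cheap way to find out ahead of time which runs a postmortem could not explain.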