Sandboxing & Safe Execution

Playbook · Coding & Computer-Use Agents

A coding agent runs untrusted code it just wrote against an issue it just read — the sandbox is the only thing between that and your infrastructure.

The defining capability of a coding agent — executing code to verify it — is also its defining risk. The code is model-generated, often steered by a third-party issue or webpage, and the agent will run it with whatever it can reach. This essay covers containerized execution, filesystem and network isolation, capability scoping, and how to think about the blast radius of code you did not write and did not review.

STEP 1

The threat model: the code is untrusted and attacker-influenced.

Two compounding facts. First, model-generated code is untrusted by default — not malicious by intent, but unaudited and capable of rm -rf, exfiltration, or a fork bomb by accident. Second, the agent's instructions often come from attacker-controllable input — a GitHub issue, a scraped page, a dependency's README — making prompt injection a path to code execution, not just bad text. The sandbox must assume the code actively wants out, because sometimes something upstream did.

STEP 2

Isolation is layered: container, then network, then filesystem, then identity.

No single boundary is sufficient. A container (or microVM/gVisor for a stronger kernel boundary) bounds the process; an explicit egress policy bounds the network; a scoped, ephemeral filesystem bounds what code can read or destroy; and — the one teams forget — no ambient credentials bounds what a breakout is even worth. A container with the host's cloud metadata endpoint reachable and a credential on disk is not a sandbox; it is a staging area.

# The execution boundary, declared rather than assumed.
# `Sandbox` and `agent_generated_cmd` are illustrative names here, not a specific library's API.
sbx = Sandbox(
    net="deny",              # default-deny egress, allowlist only
    fs="ephemeral:/work",    # scoped, wiped per task
    creds=None,              # no ambient cloud/git identity
    limits=dict(cpu=2, mem="4g", pids=512, wall="10m"),  # hard ceilings per task
)
sbx.run(agent_generated_cmd)  # blast radius == this box, this task

The most common real-world breakout is not a kernel exploit. It is a reachable cloud metadata endpoint, a mounted host socket (/var/run/docker.sock is game over), or a long-lived token left in an environment variable or on disk. Audit what is reachable, not just what is "inside."
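One way to make "audit what is reachable" concrete is a small preflight check run from inside the sandbox before any agent code executes. The sketch below is a minimal illustration, not a complete audit: the watchlist of paths, the secret-name hints, and the function names are all assumptions chosen for the example. The probes are injectable so the check can be exercised without touching the real host.

```python
import os
import socket
from typing import Callable, Mapping

# Hypothetical watchlist; extend it for your own platform.
SENSITIVE_PATHS = ("/var/run/docker.sock",)
SECRET_HINTS = ("TOKEN", "SECRET", "PASSWORD", "API_KEY")
# Link-local address commonly used for cloud instance metadata.
METADATA_ADDR = ("169.254.169.254", 80)


def tcp_reachable(addr: tuple, timeout: float = 0.5) -> bool:
    """Return True if a TCP connection to addr succeeds within the timeout."""
    try:
        with socket.create_connection(addr, timeout=timeout):
            return True
    except OSError:
        return False


def audit_reachability(
    env: Mapping[str, str] = os.environ,
    path_exists: Callable[[str], bool] = os.path.exists,
    reachable: Callable[[tuple], bool] = tcp_reachable,
) -> list:
    """List things the sandboxed process can reach but should not."""
    findings = []
    for path in SENSITIVE_PATHS:
        if path_exists(path):
            findings.append(f"path reachable: {path}")
    for name in sorted(env):
        # Flag variable names that look like credentials; values are never logged.
        if any(hint in name.upper() for hint in SECRET_HINTS):
            findings.append(f"secret-like env var: {name}")
    if reachable(METADATA_ADDR):
        findings.append("cloud metadata endpoint reachable")
    return findings
```

A non-empty result is a reason to refuse to start the task. An empty result only means this particular list found nothing; it does not prove the box is clean.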

STEP 3

Network is the exfiltration channel; default-deny egress is non-negotiable.

An agent that can reach the internet can leak whatever it can read (source, secrets, the contents of private issues) and can pull a second-stage payload that a prompt injection asked for. Inbound isolation is the easy half; egress is the half that matters and the half most setups leave open for "just pip install." The disciplined posture is default-deny with a narrow allowlist (the package mirror, the git remote) and logging on every allowed connection, so a dependency-install need does not become an open door.
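The default-deny posture described above can be sketched as a small decision function. This is only the policy logic, under assumptions: the class name and the two hostnames are hypothetical, and real enforcement belongs in the network layer (a firewall or an egress proxy), not in the agent's own process.

```python
import logging
from dataclasses import dataclass

log = logging.getLogger("egress")


@dataclass(frozen=True)
class EgressPolicy:
    """Default-deny: only exact (host, port) pairs on the allowlist pass."""
    allowlist: frozenset

    def check(self, host: str, port: int) -> bool:
        # Normalize case and a trailing dot so trivial variants do not slip past.
        allowed = (host.lower().rstrip("."), port) in self.allowlist
        # Log every decision, allowed or denied, for later audit.
        log.info("egress %s %s:%d", "ALLOW" if allowed else "DENY", host, port)
        return allowed


# Hypothetical hosts: an internal package mirror and the git remote.
policy = EgressPolicy(frozenset({
    ("pypi.internal.example", 443),
    ("git.internal.example", 443),
}))
```

Matching on exact host and port pairs, rather than on suffixes or wildcards, keeps the allowlist narrow: anything not listed is denied without a special case.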

STEP 4

Capability scoping: grant per-task, least-privilege, time-boxed.

Static, broad permissions are the root cause of large blast radius. Scope every capability to the task: a token that can read this repo and nothing else, write access only to the working tree, a wall-clock and resource ceiling that bounds a fork bomb or a crypto-miner to noise. The question is never “is this action safe” in the abstract; it is “what is the worst this credential set can do if the code is hostile,” and the answer should be small and recoverable by construction.
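As a rough illustration of per-task, least-privilege, time-boxed grants, the sketch below models a capability as a single repo, an explicit set of actions, and an expiry. `TaskGrant` and `grant_for_task` are hypothetical names for this example; a real system would mint an equivalent short-lived token through its identity provider.

```python
import time
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TaskGrant:
    """A hypothetical per-task capability: one repo, explicit actions, an expiry."""
    repo: str
    actions: frozenset
    expires_at: float  # Unix timestamp

    def permits(self, repo: str, action: str, now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        # All three conditions must hold: not expired, same repo, listed action.
        return now < self.expires_at and repo == self.repo and action in self.actions


def grant_for_task(repo: str, ttl_seconds: float = 600,
                   now: Optional[float] = None) -> TaskGrant:
    """Issue a read-only grant for a single repo that lapses after ttl_seconds."""
    now = time.time() if now is None else now
    return TaskGrant(repo=repo, actions=frozenset({"read"}), expires_at=now + ttl_seconds)
```

With a grant shaped like this, the worst case for hostile code is easy to state: read access to one repo for ten minutes.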

STEP 5

Blast radius is the design metric, not "did it escape."

Assume the boundary will eventually be crossed and design so that crossing it is cheap to absorb: ephemeral environments destroyed per task (no persistence for an implant to survive in), no shared state between tasks (one poisoned run cannot taint the next), no path from the sandbox to production or to another customer's data. A breakout into a box with nothing worth stealing and nothing to pivot to is an incident report, not a breach. Pair this with execution logging so you can answer "what did it actually do" after the fact.
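The per-task lifecycle can be sketched at the process level: create a throwaway working directory, run the command with a scrubbed environment and a wall-clock limit, emit an audit record, and destroy the directory. This is a minimal sketch, and `run_ephemeral` is a hypothetical helper. It is not an isolation boundary in itself; it assumes it already runs inside a container or microVM.

```python
import json
import os
import subprocess
import sys
import tempfile
import time


def run_ephemeral(cmd: list, wall_seconds: float = 600) -> dict:
    """Run cmd in a throwaway working directory and return an audit record."""
    started = time.time()
    with tempfile.TemporaryDirectory(prefix="task-") as workdir:
        try:
            proc = subprocess.run(
                cmd,
                cwd=workdir,
                env={"PATH": os.defpath},  # scrubbed: nothing inherited from the parent
                capture_output=True,
                text=True,
                timeout=wall_seconds,
            )
            outcome = {"exit_code": proc.returncode, "stdout": proc.stdout[-2000:]}
        except subprocess.TimeoutExpired:
            outcome = {"exit_code": None, "timed_out": True}
    # The directory is gone at this point; only the record survives the task.
    record = {
        "cmd": cmd,
        "workdir": workdir,
        "seconds": round(time.time() - started, 3),
        **outcome,
    }
    print(json.dumps(record))  # stand-in for an append-only execution log
    return record


if __name__ == "__main__":
    run_ephemeral([sys.executable, "-c", "print('hello from the box')"])
```

Because nothing written to the working directory outlives the call, an implant dropped during one task has nowhere to persist into the next.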

STEP 6

The honest tradeoff.

Stronger isolation costs latency, engineering, and capability: a truly airgapped sandbox cannot run the integration test that needs a real database, and a microVM per task is slower than a thread. The point is not maximum isolation everywhere; it is isolation matched to what the code can touch. Never run agent-generated code with credentials or reachability you would not hand a stranger. The only safe assumption is that the code is hostile and the issue that prompted it was too, so engineer for the blast radius, not for the hope that it behaves.