Coding & Computer-Use Agents

Agents that read code, write code, run tools, and drive a computer — patterns, harnesses, and pitfalls.

Coding Agent Architecture

The localize-edit-verify loop that makes a coding agent more than a code generator: the agent-computer interface, why agentic beats pipeline coding, and where the loop fails.
Repo Navigation & Code Context

Code search vs. embeddings, symbol-level indexing, context budgeting over a large tree, and why confidently wrong localization is the costliest failure in code retrieval.
Patch Generation & Test-Driven Loops

Structured diffs and hunk-apply failures, test-driven self-correction, regression guarding, and the three honest liars in the loop: flakes, overfit, and the deleted assertion.
Computer-Use & GUI Agents

Pixel vs. DOM grounding, the action space, the screenshot loop, and the multiplicative latency and reliability tax that makes GUI control a last resort.
Sandboxing & Safe Execution

Containerized execution, network and filesystem isolation, capability scoping, and designing for blast radius when an agent runs untrusted, attacker-influenced code.
Evaluating Coding Agents

The SWE-bench family, pass@k vs. resolve rate, harness sensitivity, documented contamination, and why a private post-cutoff eval set is the only number to trust.