Coding & Computer-Use Agents
Agents that read code, write code, run tools, and drive a computer — patterns, harnesses, and pitfalls.
- Coding Agent Architecture: The localize-edit-verify loop that makes a coding agent more than a code generator, the agent-computer interface, why agentic coding beats pipeline coding, and where the loop fails.
- Repo Navigation & Code Context: Code search vs. embeddings, symbol-level indexing, context budgeting over a large tree, and why confident wrong localization is the expensive failure of code retrieval.
- Patch Generation & Test-Driven Loops: Structured diffs and hunk-apply failures, test-driven self-correction, regression guarding, and the three honest liars in the loop: flakes, overfit, and the deleted assertion.
- Computer-Use & GUI Agents: Pixel vs. DOM grounding, the action space, the screenshot loop, and the multiplicative latency and reliability tax that makes GUI control a last resort.
- Sandboxing & Safe Execution: Containerized execution, network and filesystem isolation, capability scoping, and designing for blast radius when an agent runs untrusted, attacker-influenced code.
- Evaluating Coding Agents: The SWE-bench family, pass@k vs. resolve rate, harness sensitivity, documented contamination, and why a private post-cutoff eval set is the only number to trust.