Coding & Computer-Use Agents

Patterns, harnesses, and pitfalls for agents that read code, write code, run tools, and drive a computer.

  1. Coding Agent Architecture
    The localize-edit-verify loop that makes a coding agent more than a code generator: the agent-computer interface, why an agentic loop beats a fixed pipeline, and where the loop fails.
  2. Repo Navigation & Code Context
    Code search vs. embeddings, symbol-level indexing, context budgeting over a large tree, and why confidently wrong localization is the most expensive failure of code retrieval.
  3. Patch Generation & Test-Driven Loops
    Structured diffs and hunk-apply failures, test-driven self-correction, regression guarding, and the three honest liars in the loop: flakes, overfit, and the deleted assertion.
  4. Computer-Use & GUI Agents
    Pixel vs. DOM grounding, the action space, the screenshot loop, and the multiplicative latency and reliability tax that makes GUI control a last resort.
  5. Sandboxing & Safe Execution
    Containerized execution, network and filesystem isolation, capability scoping, and designing for blast radius when an agent runs untrusted, attacker-influenced code.
  6. Evaluating Coding Agents
    The SWE-bench family, pass@k vs. resolve rate, harness sensitivity, documented contamination, and why scores on a private, post-cutoff eval set are the only numbers to trust.