Repo Navigation & Code Context

U2
Playbook · Coding & Computer-Use Agents

On a large repo the agent's hard problem is not generation — it is finding the twelve lines that matter.

A real codebase is millions of tokens; the context window is not. Every coding agent is therefore a retrieval system first and a generator second: it must navigate from a vague issue to the precise call site, hold just enough of the repo in working memory to be correct, and not poison its own context with the other 99.99%. This essay covers code search vs. embedding retrieval, symbol-level indexing, context budgeting over a large tree, and why retrieval for code is its own discipline.

STEP 1

Code is not prose; lexical search usually beats embeddings here.

Naive RAG on code — chunk files, embed, cosine-retrieve — underperforms on the operations agents actually need: "where is this symbol defined," "who calls this function," "what does this import resolve to." Those are exact queries, and grep/ripgrep plus a symbol index answer them precisely and verifiably, while embeddings answer them fuzzily. The strongest navigation stacks are hybrid: lexical and structural search for identifiers and call graphs, embeddings reserved for the genuinely semantic query ("the code that handles retry backoff") where the agent does not know the name yet.
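One way to picture the hybrid split is a small router that sends identifier-shaped queries to exact search and everything else to embeddings. The sketch below is illustrative only: `route_query` and its regex heuristic are assumptions made for this example, not part of any particular tool.

```python
import re

# Heuristic (an assumption for this sketch): a single token shaped like an
# identifier or dotted path is treated as an exact query.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*(\.[A-Za-z_][A-Za-z0-9_]*)*$")


def route_query(query: str) -> str:
    """Return 'lexical' for identifier-like queries, 'semantic' otherwise."""
    q = query.strip()
    if _IDENTIFIER.match(q):
        return "lexical"   # the agent knows the name: grep or the symbol index
    return "semantic"      # free-text description: embeddings
```

A query such as parse_config goes to grep, while a phrase such as "the code that handles retry backoff" goes to the embedding index, because the agent does not know the name yet.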

STEP 2

Index symbols and the graph, not just text.

A flat text index cannot answer "find references." A symbol index built from a parser (tree-sitter, ctags) or a language server (LSP) gives the agent go-to-definition and find-references as primitive tools, the same affordances a human engineer uses to move through code. This converts navigation from "read files and hope" into graph traversal: from the issue's surface symbol, hop to its definition, then to its callers, narrowing the candidate edit set with each hop instead of flooding context with whole files.

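# navigation as graph traversal, not file dumping
# (repo and budget are illustrative tool handles, not a real library)
def localize(repo, budget, symbol):
    hits = repo.grep(symbol)                  # lexical: exact, cheap, verifiable
    if not hits:
        return []                             # no match: fall back to semantic search
    defn  = repo.goto_def(hits[0])            # structural: one true site
    calls = repo.refs(defn)                   # who depends on this?
    return budget.select(defn, calls, k=12)   # keep ~12 spans, not 12 files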
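For a concrete feel of what goto_def and refs could sit on top of, here is a minimal sketch using Python's standard ast module over in-memory sources. The `SymbolIndex` class and its method names are hypothetical. A real index would come from tree-sitter, ctags, or a language server and would resolve scopes and imports, which this name-based version does not.

```python
import ast
from collections import defaultdict


class SymbolIndex:
    """Name-based index of definitions and call sites (no scope resolution)."""

    def __init__(self):
        self.defs = defaultdict(list)   # name -> [(path, line)]
        self.calls = defaultdict(list)  # name -> [(path, line)]

    def add_source(self, path: str, source: str) -> None:
        tree = ast.parse(source)
        for node in ast.walk(tree):
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
                self.defs[node.name].append((path, node.lineno))
            elif isinstance(node, ast.Call):
                func = node.func
                # foo(...) is an ast.Name; obj.foo(...) is an ast.Attribute
                name = getattr(func, "id", None) or getattr(func, "attr", None)
                if name:
                    self.calls[name].append((path, node.lineno))

    def goto_def(self, name: str):
        """All definition sites for a name, as (path, line) pairs."""
        return list(self.defs.get(name, []))

    def refs(self, name: str):
        """All call sites for a name, as sorted (path, line) pairs."""
        return sorted(self.calls.get(name, []))
```

Because matching is by bare name, two unrelated functions called backoff would be merged. That is the precision gap a scope-aware index closes.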

Prefer windowed views over whole files. An open(path, line, ±40) viewport keeps the agent on the relevant span and leaves budget for the call graph; dumping a 1500-line module spends the window on noise the agent must then re-skim every turn.
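A windowed view is only a clamped slice. The sketch below assumes the file is already split into lines; `view_window` is a hypothetical helper name, and the default radius of 40 simply mirrors the ±40 above.

```python
def view_window(lines, line, radius=40):
    """Return (start, end, text) for a 1-indexed window centred on `line`."""
    if not lines:
        return (0, 0, "")
    line = max(1, min(line, len(lines)))      # clamp the target into the file
    start = max(1, line - radius)
    end = min(len(lines), line + radius)
    # Prefix each row with its line number so the agent can cite exact spans.
    body = "\n".join(f"{n}: {lines[n - 1]}" for n in range(start, end + 1))
    return (start, end, body)
```

On a 1500-line module, a view centred on line 700 returns 81 lines instead of 1500, which leaves the rest of the window for callers and tests.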

STEP 3

Context budgeting is an allocation problem with a hard ceiling.

Treat the window as a fixed budget split across four competing claimants: the task (issue + repro), the code under edit (the spans you will change), the evidence (callers, tests, traces that justify the change), and the scratchpad (the agent's own reasoning and history). Every token spent on an irrelevant file is a token stolen from evidence. The discipline is ruthless eviction: a file read to test a hypothesis that was refuted should be summarized to one line and dropped, not carried forever.
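The allocation can be sketched as a ledger with a per-claimant cap and an eviction pass. Everything here is an assumption made for illustration: the percentage split, the `Item` and `ContextBudget` names, and the word count used as a stand-in for a real tokenizer.

```python
from dataclasses import dataclass, field

# Illustrative split of the window in percent; not tuned values.
SHARES_PCT = {"task": 15, "code": 35, "evidence": 30, "scratchpad": 20}


@dataclass
class Item:
    claimant: str          # one of the SHARES_PCT keys
    text: str
    tokens: int
    refuted: bool = False  # set when the hypothesis this item served was refuted
    summary: str = ""      # one-line replacement used on eviction


@dataclass
class ContextBudget:
    ceiling: int
    items: list = field(default_factory=list)

    def used(self, claimant: str) -> int:
        return sum(i.tokens for i in self.items if i.claimant == claimant)

    def limit(self, claimant: str) -> int:
        return self.ceiling * SHARES_PCT[claimant] // 100

    def add(self, item: Item) -> bool:
        """Admit the item only if its claimant stays within its share."""
        if self.used(item.claimant) + item.tokens > self.limit(item.claimant):
            return False
        self.items.append(item)
        return True

    def evict_refuted(self) -> int:
        """Shrink refuted items to their one-line summary; return tokens freed."""
        freed = 0
        for item in self.items:
            if not (item.refuted and item.summary):
                continue
            new_cost = max(1, len(item.summary.split()))  # crude word-count proxy
            if new_cost < item.tokens:
                freed += item.tokens - new_cost
                item.text, item.tokens = item.summary, new_cost
        return freed
```

The point of the hard per-claimant cap is that an oversized read fails loudly at `add` time, which forces an eviction decision instead of silently crowding out evidence.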

STEP 4

Retrieval for code has a precision problem the agent must police.

A symbol like handler or config matches hundreds of irrelevant sites; semantic retrieval returns plausible-but-wrong neighbors. Unfiltered, this is context poisoning: the model anchors on a confidently retrieved wrong file and edits it. Mitigations: rank by structural proximity to the failing stack frame, not just match count; prefer the file the traceback names; and make the agent state its localization hypothesis and confirm it by reproducing the bug before editing, so a bad retrieval is caught by the oracle (the reproduction), not by the diff.
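Ranking by structural proximity can be approximated with a sort key that puts traceback-named files first and uses match count only as a tie-break. This is a sketch under stated assumptions: `rank_candidates` is a hypothetical helper, paths are POSIX-style strings, and "proximity" is reduced to frame depth plus shared directory prefix. A real ranker would walk the call graph.

```python
from pathlib import PurePosixPath


def shared_prefix_len(a: str, b: str) -> int:
    """Number of leading directory components two paths have in common."""
    pa, pb = PurePosixPath(a).parts[:-1], PurePosixPath(b).parts[:-1]
    n = 0
    for x, y in zip(pa, pb):
        if x != y:
            break
        n += 1
    return n


def rank_candidates(candidates, traceback_paths):
    """Order (path, match_count) pairs by proximity to the failing frame.

    traceback_paths is ordered outermost first, so the last entry is the
    frame where the exception was raised.
    """
    depth = {p: i for i, p in enumerate(traceback_paths)}
    failing = traceback_paths[-1] if traceback_paths else ""

    def key(cand):
        path, matches = cand
        return (
            0 if path in depth else 1,           # files the traceback names come first
            -depth.get(path, 0),                 # deeper frame = closer to the failure
            -shared_prefix_len(path, failing),   # then same package as the failing frame
            -matches,                            # match count only breaks ties
        )

    return [path for path, _ in sorted(candidates, key=key)]
```

With this key, a file with 40 lexical matches in an unrelated package sorts below a file with three matches that sits beside the failing frame.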

The most expensive failure in U2 is a confident wrong localization: the retrieved file looks right, the patch is clean, the targeted test even passes — and an untested call path elsewhere now breaks. High retrieval recall without structural grounding manufactures exactly this.

STEP 5

The repo map: a cheap, persistent skeleton beats re-exploring.

Re-discovering the project layout every task is pure waste. A repo map (the technique Aider popularized) is a compact, ranked outline of top-level packages, key modules, and their public symbols. It gives the agent a persistent mental model for a few hundred tokens, so navigation starts from "I know roughly where the auth code lives" instead of from zero. It is regenerated when the repo changes, pinned in working memory across compaction, and is the single highest-leverage piece of non-task context.
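A toy version of such a map can be built with Python's standard ast module. This is not Aider's algorithm: `build_repo_map` is a hypothetical helper, it only handles Python sources passed in as a dict, and its ranking (files with more public symbols first) is a deliberately crude placeholder for a real relevance ranking.

```python
import ast


def build_repo_map(sources, max_files=20, max_symbols=8):
    """Return a compact outline: one line per file listing public top-level symbols."""
    entries = []
    for path, source in sources.items():
        try:
            tree = ast.parse(source)
        except SyntaxError:
            continue  # skip files that do not parse rather than fail the whole map
        names = [
            node.name
            for node in tree.body
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef))
            and not node.name.startswith("_")
        ]
        if names:
            entries.append((path, names))
    # Crude ranking: files exposing more public symbols first, then by path.
    entries.sort(key=lambda e: (-len(e[1]), e[0]))
    lines = []
    for path, names in entries[:max_files]:
        shown = ", ".join(names[:max_symbols])
        more = f", +{len(names) - max_symbols} more" if len(names) > max_symbols else ""
        lines.append(f"{path}: {shown}{more}")
    return "\n".join(lines)
```

The caps on files and symbols keep the map's cost bounded, so it can stay pinned in context regardless of how large the repo grows.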

STEP 6

When the cheap path is wrong.

Lexical-plus-symbol search dominates for well-structured, statically analyzable code with meaningful names. It degrades on dynamic dispatch, metaprogramming, codegen, monorepos with reused identifiers, and undocumented legacy where the name tells you nothing — there, embeddings and reading the tests as documentation earn their cost. Default to exact, structural navigation and add semantic retrieval only where the code's structure stops telling the truth; the failure mode of code RAG is not missing the answer, it is confidently retrieving the wrong one.
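The "default to exact, escalate when structure stops helping" rule can be sketched as a wrapper around two search callables. The `retrieve` function, its parameters, and the threshold of 25 hits are assumptions for illustration. The two callables stand in for whatever grep-like and embedding-backed tools the agent actually has.

```python
from typing import Callable, List, Tuple


def retrieve(
    query: str,
    lexical: Callable[[str], List[str]],
    semantic: Callable[[str], List[str]],
    max_exact_hits: int = 25,
) -> Tuple[str, List[str]]:
    """Try exact search first; escalate only when it is empty or too ambiguous."""
    hits = lexical(query)
    if 0 < len(hits) <= max_exact_hits:
        return ("lexical", hits)
    # Zero hits: the name may be generated or dispatched dynamically.
    # Too many hits: the identifier is reused and the name alone says nothing.
    return ("semantic", semantic(query))
```

Returning the strategy alongside the hits lets the agent record which path produced a localization, so a semantic result can be held to a stricter confirmation step before any edit.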