Why Interop Matters: The M×N Problem

Deep Dive · Protocols & Interop

Why interop matters: the M×N integration problem.

Agents are only as useful as the systems they can reach. This essay frames the combinatorial cost of connecting M agents to N tools and data sources by hand, and explains why a protocol layer — not more glue code — is the structural fix the ecosystem converged on through 2024–2025.

STEP 1

An agent is a model plus everything it can touch.

A language model on its own is a closed function: text in, text out. It becomes an agent when it can call tools, read external resources, and act on systems it does not itself contain: a database, a ticketing API, a filesystem, another agent. The model's reasoning is the easy part to swap; changing a single API call can replace Claude with GPT or with an open model. The hard part is the integration surface: every tool the agent can invoke, every data source it can read, every downstream service it can delegate to.

That integration surface does not transfer between agents for free. If you build a deep integration between your customer-support agent and your CRM, none of that work helps your data-analysis agent talk to the same CRM. Each agent re-implements connection handling, authentication, schema description, error semantics, and result shaping. The model is fungible; the plumbing is not.

This essay is conceptual scaffolding for the rest of the Protocols track. Later essays go concrete on tool-calling schemas, the Model Context Protocol (MCP), and agent-to-agent communication. Read this one first to understand why those protocols exist.

STEP 2

The combinatorics: M agents × N integrations.

Consider an organization with M distinct agents (support, sales analytics, code review, ops runbook) and N systems each might need (CRM, data warehouse, Git host, observability stack, internal wikis). In the naive world, every agent integrates with every system it needs independently. The integration count grows as the product M × N, not the sum M + N.

With 4 agents and 6 systems, that is potentially 24 bespoke integrations, each with its own auth flow, its own hand-written tool descriptions, its own brittle response parsing. Add a fifth agent and you owe up to 6 more. Upgrade a system's API and you re-test every agent that touched it. This is the same trap that motivated the Language Server Protocol (LSP) in editor tooling: M editors times N languages collapsed into M + N once a shared protocol existed.
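The arithmetic above can be checked in a few lines. This is a minimal sketch that assumes the worst case, where every agent needs every system; the function names `point_to_point` and `via_protocol` are illustrative and do not come from any library.

```python
def point_to_point(agents: int, systems: int) -> int:
    """Worst-case bespoke integrations: one per (agent, system) pair."""
    return agents * systems


def via_protocol(agents: int, systems: int) -> int:
    """With a shared protocol: one client per agent plus one server per system."""
    return agents + systems


if __name__ == "__main__":
    # (4, 6) is the example from the text; the other pairs show how the gap widens.
    for m, n in [(4, 6), (5, 6), (10, 20)]:
        print(f"M={m}, N={n}: bespoke={point_to_point(m, n)}, "
              f"protocol={via_protocol(m, n)}")
```

Going from 4 to 5 agents adds 6 integrations in the bespoke model but only 1 in the protocol model, which is the difference between a product and a sum.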

# Naive coupling: every agent owns every integration
support_agent     -> crm_client_v1, warehouse_client, wiki_scraper
analytics_agent   -> crm_client_v2, warehouse_client, dashboards_api
codereview_agent  -> git_client, ci_client, wiki_scraper
ops_agent         -> observability_client, git_client, runbook_store
# 4 agents, 12 agent-owned client integrations (9 distinct client names), drifting versions (crm v1 vs v2)

The cost is not only the initial build. It is the drift: the support agent's CRM client and the analytics agent's CRM client diverge, encode different assumptions, and fail differently. There is no single place to fix a bug, add a capability, or audit what an agent is allowed to do.

STEP 3

A protocol turns M×N into M+N.

The structural fix is a layer of indirection: a shared protocol that both sides speak. Each system is wrapped once as a protocol-speaking server. Each agent implements the protocol client once. Now any conformant agent can reach any conformant system without bespoke code between them. Integrations grow as M + N: N servers, one wrapping each system, plus M clients, one teaching each agent the protocol.

Before:  agent_i  ──bespoke──>  system_j        cost ≈ M × N
After:   agent_i  ──client──>  PROTOCOL  ──server──>  system_j
                                                      cost ≈ M + N

This is exactly the bet behind the Model Context Protocol (introduced by Anthropic in late 2024) for the agent-to-tool/resource edge, and behind agent-to-agent protocols such as Google's Agent2Agent (announced 2025) for the agent-to-agent edge. Both replace point-to-point glue with a negotiated, versioned interface. The protocol does not make the integration logic disappear — it relocates it to one reusable place per system and standardises how capabilities are described, invoked, and streamed back.
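The indirection can be sketched in plain Python. This is an illustration of the shape only, not the MCP or A2A wire format: the `ToolServer` interface, the `DictServer` and `Agent` classes, and the CRM and wiki tools are all hypothetical names invented for this example, and the "systems" are stand-in lambdas rather than real backends.

```python
from collections.abc import Callable, Mapping
from typing import Any, Protocol


class ToolServer(Protocol):
    """Hypothetical interface that every wrapped system exposes."""

    def list_tools(self) -> list[str]: ...
    def call_tool(self, name: str, arguments: dict[str, Any]) -> Any: ...


class DictServer:
    """Wraps one system once, as a table of named callables."""

    def __init__(self, tools: dict[str, Callable[..., Any]]) -> None:
        self._tools = tools

    def list_tools(self) -> list[str]:
        return sorted(self._tools)

    def call_tool(self, name: str, arguments: dict[str, Any]) -> Any:
        return self._tools[name](**arguments)


class Agent:
    """Implements the client side once; works with any ToolServer."""

    def __init__(self, name: str, servers: Mapping[str, ToolServer]) -> None:
        self.name = name
        self.servers = servers

    def use(self, server: str, tool: str, **arguments: Any) -> Any:
        return self.servers[server].call_tool(tool, arguments)


# Two systems, each wrapped once (N = 2 servers).
crm = DictServer({"lookup_customer": lambda customer_id: {"id": customer_id, "tier": "gold"}})
wiki = DictServer({"search": lambda query: [f"page about {query}"]})
servers = {"crm": crm, "wiki": wiki}

# Two agents reuse the same servers with no agent-specific glue (M = 2 clients).
support = Agent("support", servers)
analytics = Agent("analytics", servers)

print(support.use("crm", "lookup_customer", customer_id=42))
print(analytics.use("crm", "lookup_customer", customer_id=42))
print(support.use("wiki", "search", query="refunds"))
```

Adding a third agent here means constructing one more `Agent`; no server changes. Adding a third system means writing one more server; no agent changes.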

Three properties make a protocol layer worth the indirection:

  • Reuse. A server written for one agent works for every conformant agent — including agents that did not exist when the server was built.
  • Discoverability. Agents can ask a server "what can you do?" at runtime rather than having capabilities hard-coded at build time. Capability discovery gets its own essay later in this track.
  • Uniform semantics. One error model, one streaming model, one authentication story — so the agent's loop does not branch per integration.
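
Discoverability and uniform semantics can be sketched together. The following is a toy under stated assumptions, not any real protocol's API: `Server`, `describe`, `call`, and the `ToolResult` envelope are hypothetical names. It shows a server that answers a runtime "what can you do?" query and reports every failure in one shape, so the calling loop needs no per-integration branches.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class ToolResult:
    """One result shape for every tool: either a value or an error message."""
    ok: bool
    value: Any = None
    error: str = ""


class Server:
    def __init__(self, tools: dict[str, tuple[str, Callable[..., Any]]]) -> None:
        # Maps tool name -> (human-readable description, implementation).
        self._tools = tools

    def describe(self) -> dict[str, str]:
        """Runtime discovery: tool name -> description."""
        return {name: desc for name, (desc, _) in self._tools.items()}

    def call(self, name: str, **arguments: Any) -> ToolResult:
        if name not in self._tools:
            return ToolResult(ok=False, error=f"unknown tool: {name}")
        try:
            return ToolResult(ok=True, value=self._tools[name][1](**arguments))
        except Exception as exc:  # every failure maps to the same error shape
            return ToolResult(ok=False, error=f"{type(exc).__name__}: {exc}")


def divide(a: float, b: float) -> float:
    return a / b


calc = Server({"divide": ("Divide a by b.", divide)})

print(calc.describe())                  # what the server offers, learned at runtime
print(calc.call("divide", a=6, b=3))    # success
print(calc.call("divide", a=1, b=0))    # tool raised: same envelope, ok=False
print(calc.call("multiply", a=2, b=3))  # unknown tool: same envelope, ok=False
```

The caller inspects `ok` and nothing else, whether the failure came from a missing tool or from the tool itself.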

STEP 4

What a protocol does not solve.

Indirection is not free, and a protocol is not magic. Three honest caveats frame the rest of this track:

Semantics still leak. A protocol standardises the envelope — how a tool is described, called, and returned — not the meaning of the tool. Two CRM servers can both be protocol-conformant and still expose wildly different tool vocabularies. Interop at the transport and schema layer does not guarantee interop at the task layer.
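A small sketch makes the leak concrete. Both tool tables below are hypothetical: two imagined CRM servers that describe their tools with the same JSON-Schema-style envelope but pick different names and argument shapes. The `is_conformant` check is an assumption made for this example and validates the envelope only.

```python
# Two hypothetical CRM servers: same envelope, different vocabulary.
crm_a_tools = {
    "get_customer": {
        "type": "object",
        "properties": {"customer_id": {"type": "integer"}},
        "required": ["customer_id"],
    },
}
crm_b_tools = {
    "fetch_account": {
        "type": "object",
        "properties": {"account_ref": {"type": "string"}},
        "required": ["account_ref"],
    },
}


def is_conformant(tools: dict) -> bool:
    """Envelope check only: every tool declares an object schema with properties."""
    return all(
        isinstance(schema, dict)
        and schema.get("type") == "object"
        and isinstance(schema.get("properties"), dict)
        for schema in tools.values()
    )


# Both servers pass the shape check...
print(is_conformant(crm_a_tools), is_conformant(crm_b_tools))
# ...yet they share no tool names, so a task written against one
# does not run unchanged against the other.
shared_tool_names = set(crm_a_tools) & set(crm_b_tools)
print(shared_tool_names)
```

An agent can call either server without new plumbing, but it still has to work out that `get_customer` and `fetch_account` mean the same thing.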

The trust boundary moves, it does not vanish. Wrapping a system as a server means an agent can now drive that system through a uniform channel. That uniformity is a security surface: a connected server can return content that influences the model (a prompt-injection vector), and a compromised server is a confused-deputy risk. This track only cross-links the issue; the dedicated treatment of protocol security, scoping, and trust lives in the Safety & Agentic Security deep-dives.
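One way to picture scoping at the moved trust boundary is a per-agent allowlist in front of a server. This is a deliberately small sketch, not a security design and not part of any protocol: `ScopedServer`, the ticket tools, and the allowlist are hypothetical, and it says nothing about prompt injection or content provenance.

```python
from typing import Any, Callable


class ScopedServer:
    """Exposes only an allowlisted subset of a server's tools to one agent."""

    def __init__(self, tools: dict[str, Callable[..., Any]], allowed: frozenset[str]) -> None:
        self._tools = tools
        self._allowed = allowed

    def list_tools(self) -> list[str]:
        # Tools outside the allowlist are not even discoverable.
        return sorted(name for name in self._tools if name in self._allowed)

    def call_tool(self, name: str, **arguments: Any) -> Any:
        # Authorization is checked here, independently of protocol conformance.
        if name not in self._allowed:
            raise PermissionError(f"tool not permitted for this agent: {name}")
        return self._tools[name](**arguments)


# Stand-in ticketing tools (no real backend).
ticket_tools = {
    "read_ticket": lambda ticket_id: {"id": ticket_id, "status": "open"},
    "delete_ticket": lambda ticket_id: {"id": ticket_id, "deleted": True},
}

# The support agent gets a read-only view of the same server.
support_view = ScopedServer(ticket_tools, allowed=frozenset({"read_ticket"}))

print(support_view.list_tools())
print(support_view.call_tool("read_ticket", ticket_id=7))
try:
    support_view.call_tool("delete_ticket", ticket_id=7)
except PermissionError as exc:
    print(f"blocked: {exc}")
```

The server behind the wrapper is equally well-formed for both tools; what the agent may do with it is decided separately.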

"It speaks the protocol" is a statement about shape, not safety. Never treat protocol conformance as authorization. Capability scoping, least privilege, and content provenance are separate, mandatory concerns covered in the security track.

Versioning is forever. Once M clients and N servers share an interface, that interface must evolve without a flag day. Mature protocols build in version negotiation and capability flags from day one precisely so the M+N graph can upgrade incrementally rather than all at once.
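Version negotiation and capability flags can be sketched as set intersections. This is an assumption-laden toy, not the handshake of any specific protocol: the function names are hypothetical, and versions are assumed to be ISO-date-like strings so that plain string comparison orders them correctly.

```python
def negotiate_version(client_versions: set[str], server_versions: set[str]) -> str:
    """Pick the newest version both sides support; fail loudly if none overlaps."""
    common = client_versions & server_versions
    if not common:
        raise ValueError("no mutually supported protocol version")
    # ISO-style date strings sort lexicographically, so max() is the newest.
    return max(common)


def shared_capabilities(client_caps: set[str], server_caps: set[str]) -> set[str]:
    """Only features that both sides advertise may be used in the session."""
    return client_caps & server_caps


# A server that has upgraded but still speaks the older version.
server_versions = {"2024-01-01", "2025-01-01"}
server_caps = {"tools", "streaming"}

# An old client and a new client can both connect: no flag day.
old_client = negotiate_version({"2024-01-01"}, server_versions)
new_client = negotiate_version({"2024-01-01", "2025-01-01"}, server_versions)
print(old_client, new_client)
print(shared_capabilities({"tools"}, server_caps))
```

Each side upgrades on its own schedule; the session simply settles on the newest version and the largest feature set that both ends understand.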

The throughline for this track

Every essay that follows is an answer to a specific sub-question of "how do we make the protocol layer real": how tools describe themselves with JSON Schema (tool-calling standards), how the agent-to-resource edge is structured (MCP architecture), how agents delegate to other agents (A2A communication), how typed inputs and outputs are validated (structured tool I/O), how a client learns what a server offers (capability discovery), and how all of this is assembled into one agent that integrates cleanly (building interoperable agents). The unifying claim is the one in this essay: prefer a protocol over M × N hand-written bridges.