Building an Interoperable Agent

P7
Deep Dive · Protocols & Interop

Building an interoperable agent: choosing and composing the protocols.

This essay synthesises the track. It compares the approaches — raw provider tool calling, MCP, and agent-to-agent protocols like A2A — on what each is actually for, where they overlap and where they do not, and gives a practical decision rule plus an integration architecture for an agent that has to talk to everything.

STEP 1

Three layers, three jobs — not three competitors.

The most common confusion is treating tool calling, MCP, and A2A as alternatives. They operate at different layers of the same stack:

  • Provider tool calling is the model-to-function contract: how a single model emits a structured call against a JSON Schema and gets a result back. It is intra-process, provider-shaped, and the substrate everything else builds on.
  • MCP is the agent-to-tool/resource contract: how a host reaches reusable, independently-built tool/resource/prompt servers over a negotiated JSON-RPC session. It standardises and externalises the integration so it is not re-authored per agent.
  • A2A is the agent-to-agent contract: how an agent delegates opaque, possibly long-running, multi-turn work to a peer agent it does not own.
user
 └─ Host / Agent
      ├─ provider tool calling  → model ⇄ functions   (in-process)
      ├─ MCP client(s)          → tools/resources       (your systems)
      └─ A2A client             → peer agents           (others' agents)
   the same program is often an MCP host AND an A2A server

They compose vertically. A typical production agent uses provider tool calling as its inner mechanism, MCP to source most of its tools and data without bespoke clients, and A2A when part of the job belongs to an agent someone else operates.

STEP 2

Overlap, gaps, and a decision rule.

There is genuine overlap at one seam: "call another agent" can be modelled as a single provider tool, as an MCP tool, or as a true A2A task. They are not equivalent — they differ in what they can express:

  • As a provider tool / MCP tool: synchronous, single-shot, string-or-JSON in and out. Cheapest. Correct when the peer is effectively a stateless function and the call finishes within a request.
  • As an A2A task: long-running, multi-turn (the peer can ask you questions), typed artifacts, opaque internal state. Necessary when the work outlives one request, needs clarification, or returns rich outputs.

A practical decision rule:

# Pick the thinnest layer that expresses the interaction.
if capability lives in your process and is provider-shaped:
    use provider tool calling directly
elif it wraps a system, is reusable, may serve other agents:
    use MCP (write/adopt a server once)
elif it is another team's agent, or work is long /
     multi-turn / richly-typed:
    use A2A (task, messages, artifacts)
else:
    # do not invent a protocol; reuse one of the above

The honest gaps: none of these standardises semantics. Two MCP servers can both be conformant and expose incompatible tool vocabularies; an Agent Card tells you a peer's skills exist, not that they mean what you assume. Interop at the envelope layer does not buy interop at the task layer — that is still integration work, just integration work you do not have to re-plumb.

STEP 3

A reference integration architecture.

Pulling the track together into one shape. The agent has a single internal tool registry; every source — local functions, MCP servers, A2A peers — is normalised into it behind one interface, so the agent loop never branches per integration.

# One registry; sources normalised behind one shape
class Capability:
    name: str
    description: str          # model-facing; highest leverage
    schema: dict              # interoperable JSON Schema core
    invoke: Callable          # hides local | MCP | A2A

registry = []
registry += [wrap(fn)          for fn in LOCAL_TOOLS]
registry += mcp_client.tools_list()        # discovered
registry += a2a.skills_as_tools(agent_card) # discovered

# Agent loop sees ONE uniform list, renders it per
# provider with the thin adapter from essay P2.
tools = [provider_render(c) for c in registry]

This architecture is the M+N argument made concrete: each system is wrapped once (as a local handler, an MCP server, or an A2A peer), discovered at runtime, and presented to the model through a single normalised registry and the thin provider adapter from the tool-calling-standards essay. Adding a system adds a source, not a branch in the loop. Swapping the model swaps only the renderer.

STEP 4

What you still owe after the protocol.

Protocols deliver shape, discovery, and a uniform loop. They do not deliver correctness, safety, or trust — those remain your responsibility and sit on top of everything above.

  • Validation. Re-validate inputs (structure then semantics/authorization) at the boundary you own, and design outputs for consistency, economy, and actionable errors — regardless of any provider strict mode. See structured tool I/O.
  • Policy over discovery. Discovery lists what is possible; you decide which servers/peers to connect, which capabilities to surface, and which require consent. "It was advertised" is not authorization. See capability discovery.
  • Trust boundaries. Every connected server and every delegated peer can return content that steers your model and acts under granted authority. The threat model — prompt injection via tool output, confused-deputy, scoping, provenance, human-in-the-loop on high-impact actions — is developed in the Safety & Agentic Security deep-dives. This track deliberately defers the depth and only marks the seams.

The track in one sentence: prefer a versioned, discoverable protocol over M × N hand-written bridges; pick the thinnest layer (provider tool calling → MCP → A2A) that expresses the interaction; normalise every source into one registry behind one interface; and remember that the protocol gives you the envelope while validation, policy, and trust remain yours to enforce.

Built this way, an agent gains new tools, data, and peer agents by adding a source rather than rewriting its loop, and changes its underlying model by swapping a renderer rather than its integrations — which was the entire point of having a protocol layer in the first place.