Shared Memory & the Blackboard

Deep Dive · Multi-Agent Systems

Shared memory and the blackboard: a database problem wearing an AI hat.

When agents need to share state without an O(n²) mesh of messages, the answer is a shared workspace: a blackboard. The pattern is old — Hayes-Roth, 1985 — and 2025 brought it back as a token-efficient coordination substrate for LLM agents that can rival explicit orchestration. But the moment multiple agents read and write one store, you have inherited every classic concurrency hazard: write contention, stale reads, and lost updates. This essay covers the architecture and the consistency discipline that keeps it from quietly corrupting your run.

STEP 1

The blackboard replaces an O(n²) message mesh with one shared artifact.

In a blackboard system, agents do not message each other. They read a globally visible workspace, do work, and post results back to it; a control component decides who acts next. This collapses mesh communication to O(n) reads/writes against one store and gives you a single place to observe the whole system's state. 2025 work on blackboard-based LLM multi-agent systems shows this can match or beat both static and dynamically orchestrated designs while staying token-efficient — the blackboard is the coordination, so you stop paying for repeated point-to-point context exchange.

# blackboard: read shared state, act, post back
def step(agent, bb):
    state = bb.read()                 # sees the whole board
    result = agent.run(state)
    bb.post(agent.id, result)        # new entry, not a broadcast
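
The snippet above leaves out the control component that decides who acts next. A minimal sketch of one follows, under loud assumptions: the `Board`, `Agent`, and `control_loop` names, the `can_act` precondition, and the "first ready agent wins" policy are all hypothetical choices made for illustration, not something the pattern prescribes.

```python
class Board:
    """Hypothetical in-memory board: a list of (agent_id, result) entries."""

    def __init__(self) -> None:
        self.entries: list[tuple[str, str]] = []

    def read(self) -> list[tuple[str, str]]:
        # Return a copy so agents cannot mutate the board while reading.
        return list(self.entries)

    def post(self, agent_id: str, result: str) -> None:
        self.entries.append((agent_id, result))


class Agent:
    def __init__(self, agent_id, can_act, run) -> None:
        self.id = agent_id
        self.can_act = can_act   # precondition evaluated on board state
        self.run = run           # produces this agent's contribution


def control_loop(agents, bb, max_steps=10):
    """Schedule the first agent whose precondition holds; stop when none does."""
    for _ in range(max_steps):
        state = bb.read()
        ready = [a for a in agents if a.can_act(state)]
        if not ready:
            break
        agent = ready[0]
        bb.post(agent.id, agent.run(state))


def posted(state, agent_id):
    return any(who == agent_id for who, _ in state)


agents = [
    Agent("writer",
          can_act=lambda s: posted(s, "researcher") and not posted(s, "writer"),
          run=lambda s: "draft based on " + s[-1][1]),
    Agent("researcher",
          can_act=lambda s: not posted(s, "researcher"),
          run=lambda s: "three sources"),
]
bb = Board()
control_loop(agents, bb)
```

The two agents never reference each other: the writer waits on board state, not on a message from the researcher. Because this loop runs one agent at a time, it also sidesteps the concurrency hazards discussed later, which is exactly why a sequential demo can hide them.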

STEP 2

A scratchpad is not a blackboard, and conflating them causes the bugs.

Two distinct constructs get the same name. A scratchpad is usually one agent's private working notes — append-only, single-writer, no contention. A blackboard is multi-reader, multi-writer shared state — and that is a concurrent database, not a notebook. The failures in this essay all come from treating a true blackboard as if it were a private scratchpad: no write policy, no versioning, last-write-wins by accident. Decide explicitly which one you have; only the multi-writer case needs the rest of this essay.
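
One way to make the choice explicit is to encode the single-writer rule in the type itself. This is an illustrative sketch only; the `Scratchpad` class and its owner check are hypothetical and not taken from any particular framework.

```python
class Scratchpad:
    """Private notes: one owner, append-only, so there is no contention."""

    def __init__(self, owner_id: str) -> None:
        self.owner_id = owner_id
        self._notes: list[str] = []

    def append(self, writer_id: str, note: str) -> None:
        if writer_id != self.owner_id:
            # A second writer means this is really a blackboard and needs
            # a write policy, not a notebook.
            raise PermissionError(f"{writer_id} does not own this scratchpad")
        self._notes.append(note)

    def notes(self) -> list[str]:
        return list(self._notes)


pad = Scratchpad("planner")
pad.append("planner", "try the cheaper retrieval path first")

try:
    pad.append("critic", "this plan is wrong")
    second_writer_blocked = False
except PermissionError:
    second_writer_blocked = True
```

The point of the check is not security. It turns an accidental second writer into a loud error at the moment the scratchpad quietly becomes shared state.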

STEP 3

Write contention: concurrent posts silently lose updates.

If two agents read the board, each computes from that snapshot, and each writes back, the second write can clobber the first — a classic lost update. With LLM agents this is worse than in normal systems because the loss is silent and semantic: there is no exception, just an answer that quietly ignored a worker's contribution, and no stack trace to find it. The fixes are the standard database ones: serialize writes through an orchestrator or a lock, use optimistic concurrency (compare-and-set on a version), or make the board append-only so writes never overwrite — they accumulate, and a reader reconciles.
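
A minimal sketch of the optimistic option, assuming a hypothetical `VersionedBoard` with a single version counter. The interleaving is simulated in one thread for clarity; in a real deployment `compare_and_set` would have to be atomic (a lock, or a conditional update in whatever store backs the board).

```python
from dataclasses import dataclass, field
from typing import Any


class StaleWrite(Exception):
    """Raised when a write is based on a version the board has moved past."""


@dataclass
class VersionedBoard:
    # Hypothetical minimal board: one mutable mapping plus a version counter.
    value: dict[str, Any] = field(default_factory=dict)
    version: int = 0

    def read(self) -> tuple[dict[str, Any], int]:
        # Return a copy so callers cannot mutate the board in place.
        return dict(self.value), self.version

    def compare_and_set(self, base_version: int, new_value: dict[str, Any]) -> int:
        # Accept the write only if nobody has written since base_version.
        if base_version != self.version:
            raise StaleWrite(f"based on v{base_version}, board is at v{self.version}")
        self.value = dict(new_value)
        self.version += 1
        return self.version


board = VersionedBoard()

# Two agents read the same snapshot (version 0).
snap_a, ver_a = board.read()
snap_b, ver_b = board.read()

# Agent A posts first and succeeds.
snap_a["plan"] = "from A"
board.compare_and_set(ver_a, snap_a)

# Agent B's snapshot lacks A's entry; a blind write would clobber it.
snap_b["facts"] = "from B"
try:
    board.compare_and_set(ver_b, snap_b)
    b_rejected = False
except StaleWrite:
    b_rejected = True

# B re-reads, reapplies its change on top of A's, and retries.
snap_b, ver_b = board.read()
snap_b["facts"] = "from B"
board.compare_and_set(ver_b, snap_b)
```

The rejection is the feature: the lost update becomes an explicit exception instead of a silently missing contribution. The cost is that B may have to redo work, which for an LLM agent can mean another model call.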

Prefer an append-only, versioned log over a mutable key-value board. Append-only structurally eliminates lost updates (nothing is overwritten), gives you a perfect audit trail for cross-agent debugging, and turns "what did the team know at step k" into a replayable query instead of a forensic guess.
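
As a rough illustration of that recommendation, here is a sketch of an append-only board with replay. The `Entry` fields, the `AppendOnlyBoard` class, and the "latest entry per key wins" reconciliation in `view` are assumptions chosen for the example; a real reader might reconcile differently.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class Entry:
    seq: int          # position in the log, assigned by the board
    agent_id: str     # who posted it (attribution)
    key: str
    value: Any


class AppendOnlyBoard:
    """Hypothetical append-only board: posts accumulate, nothing is overwritten."""

    def __init__(self) -> None:
        self._log: list[Entry] = []

    def post(self, agent_id: str, key: str, value: Any) -> Entry:
        entry = Entry(seq=len(self._log), agent_id=agent_id, key=key, value=value)
        self._log.append(entry)
        return entry

    def entries(self, upto: Optional[int] = None) -> list[Entry]:
        # upto=k returns the first k entries, i.e. the board as of step k.
        return list(self._log if upto is None else self._log[:upto])

    def view(self, upto: Optional[int] = None) -> dict[str, Any]:
        # One possible reconciliation policy: the latest entry per key wins.
        # Earlier entries remain in the log, so the policy can change later.
        state: dict[str, Any] = {}
        for e in self.entries(upto):
            state[e.key] = e.value
        return state


bb = AppendOnlyBoard()
bb.post("researcher", "summary", "draft 1")
bb.post("critic", "issues", ["missing source"])
bb.post("researcher", "summary", "draft 2")

now = bb.view()              # current reconciled state
at_step_2 = bb.view(upto=2)  # what the team knew after two posts
```

Here `bb.view(upto=2)` is the replayable query: it reconstructs the state from the log rather than from anyone's memory of the run, and "draft 1" is still in the log even though "draft 2" superseded it in the view.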

STEP 4

Stale reads: an agent acts on a board that already moved.

Even with safe writes, a reader can act on a snapshot that is already obsolete by the time it posts: its expensive reasoning was conditioned on superseded state. Left unchecked, this produces work that contradicts decisions made while the agent was thinking, and a board that oscillates instead of converging. Mitigations: version every read and reject a post whose base version is stale (forcing a re-read), scope agents to disjoint regions of the board so one agent's write rarely invalidates another's read, or have the control component schedule an agent only when its inputs are settled. Choose the consistency model deliberately: eventual consistency on an agent blackboard is a decision, not a default you back into.
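
The first two mitigations can be combined by versioning each region separately. The sketch below assumes a hypothetical `RegionBoard` and `run_with_retry` helper; the overlapping write is simulated inside the compute function so the example stays single-threaded and deterministic.

```python
from typing import Any


class StaleBase(Exception):
    """The region changed after the agent read it."""


class RegionBoard:
    """Hypothetical board split into named regions, each with its own version."""

    def __init__(self) -> None:
        self._data: dict[str, Any] = {}
        self._versions: dict[str, int] = {}

    def read(self, region: str) -> tuple[Any, int]:
        return self._data.get(region), self._versions.get(region, 0)

    def post(self, region: str, base_version: int, value: Any) -> int:
        current = self._versions.get(region, 0)
        if base_version != current:
            # The agent reasoned over superseded state: force a re-read.
            raise StaleBase(f"{region}: read v{base_version}, now v{current}")
        self._data[region] = value
        self._versions[region] = current + 1
        return current + 1


def run_with_retry(board, region, compute, max_attempts=3):
    """Read, compute, post; on a stale base, re-read and recompute."""
    for _ in range(max_attempts):
        state, version = board.read(region)
        result = compute(state)
        try:
            return board.post(region, version, result)
        except StaleBase:
            continue
    raise RuntimeError(f"gave up on {region} after {max_attempts} attempts")


board = RegionBoard()
calls = []


def slow_planner(state):
    calls.append(state)
    if len(calls) == 1:
        # Simulate another agent posting to the same region mid-computation.
        _, v = board.read("plan")
        board.post("plan", v, "plan from other agent")
    return f"revised: {state}"


final_version = run_with_retry(board, "plan", slow_planner)
```

The planner runs twice: its first result was computed from an empty region and is rejected, and the second is computed from the other agent's plan. Because versions are per region, a write to an unrelated region such as `"facts"` would not have forced that retry. The `max_attempts` cap matters too, since unbounded retries are one way a board oscillates instead of converging.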

"It works in the demo" almost always means the demo ran agents effectively sequentially, so contention and staleness never fired. The first time two agents truly overlap in production, lost updates and stale reads appear together — and because both are silent, you will see a wrong final answer with a clean log and no idea why.

STEP 5

The blackboard is a bottleneck and a blast-radius amplifier.

Centralizing state centralizes risk. Every agent reads the whole board, so the board's size inflates every agent's context and cost as the run grows; the board is a single point of failure and a serialization point; and one agent posting a confidently wrong entry pollutes the shared state that every other agent then reads — a uniquely fast error-propagation path (see multi-agent-failure-modes). Counter it: bound and summarize the board so context stays finite, scope reads to relevant regions, attribute and version every entry so a bad post is traceable and revocable, and checkpoint the board so a corrupt state can be rolled back rather than reasoned around.
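
Several of these countermeasures fit in a few lines. The sketch below is one hedged way to do it: `GuardedBoard`, the `last_n` read bound, `revoke_agent`, and the named checkpoints are all hypothetical. Summarizing old entries is left out because it would need a model call.

```python
import copy
from dataclasses import dataclass
from typing import Any


@dataclass
class Post:
    seq: int
    agent_id: str
    region: str
    value: Any
    revoked: bool = False


class GuardedBoard:
    """Hypothetical board with scoped reads, revocation, and checkpoints."""

    def __init__(self) -> None:
        self._posts: list[Post] = []
        self._checkpoints: dict[str, list[Post]] = {}

    def post(self, agent_id: str, region: str, value: Any) -> int:
        self._posts.append(Post(len(self._posts), agent_id, region, value))
        return self._posts[-1].seq

    def read(self, region: str, last_n: int = 5) -> list[Post]:
        # Scoped and bounded: live posts in one region, newest last_n only.
        live = [p for p in self._posts if p.region == region and not p.revoked]
        return live[-last_n:]

    def revoke_agent(self, agent_id: str) -> int:
        # Attribution makes a bad source revocable without deleting history.
        hit = [p for p in self._posts if p.agent_id == agent_id and not p.revoked]
        for p in hit:
            p.revoked = True
        return len(hit)

    def checkpoint(self, name: str) -> None:
        self._checkpoints[name] = copy.deepcopy(self._posts)

    def rollback(self, name: str) -> None:
        self._posts = copy.deepcopy(self._checkpoints[name])


bb = GuardedBoard()
bb.post("retriever", "facts", "source A says X")
bb.checkpoint("after-retrieval")
bb.post("analyst", "facts", "X implies Y (wrong)")
bb.post("writer", "draft", "Therefore Y ...")

revoked = bb.revoke_agent("analyst")
facts_after_revoke = [p.value for p in bb.read("facts")]
```

Revoking the analyst hides the bad fact from future reads, but the writer's draft was already built on it and is still on the board. That is where the checkpoint comes in: `bb.rollback("after-retrieval")` discards everything downstream of the last known-good state instead of asking agents to reason around the pollution.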

STEP 6

When NOT to use a blackboard.

Skip it when agents do not actually need shared state (a star with scoped workers is simpler and has none of these hazards), when the board would grow unbounded and blow every context budget, or when you are unwilling to specify a concurrency model, because an unspecified model means lost updates and stale reads in production. A blackboard earns its place only with a real write policy, versioned reads, attribution, and bounded size. Without a concurrency discipline it is not shared memory; it is a shared race condition that returns a confident answer.