Embeddings: Meaning as Geometry

Embeddings are the idea that quietly powers semantic search, recommendations, retrieval-augmented generation, and the inside of every LLM. The concept is one sentence: turn things into points in space so that "similar" becomes "close." This entry builds that intuition concretely, shows the famous word-arithmetic example, and explains exactly which problems embeddings solve and which they do not.

STEP 1

The core move: meaning becomes coordinates.

To a computer comparing raw text, "dog" and "puppy" are just different character strings, no more related than "dog" and "xylophone." An embedding fixes this by mapping each item to a list of numbers (a vector) chosen so that things with similar meaning land at nearby points and unrelated things land far apart. The vector is a point in a high-dimensional space, often hundreds or thousands of dimensions; "meaning" becomes a location, and "similarity" becomes geometric distance.

The number list is not designed by hand. A neural network learns it from data, under pressure to make useful predictions, and the by-product is a space where geometry reflects meaning. A 2-D sketch makes the idea visible even though real embeddings have far more dimensions:

      cat ·   · kitten
         ·  dog
            · puppy
                                  · car
                                · truck   · van

  near = similar meaning · far = unrelated · clusters = topics

Words about pets cluster together; words about vehicles form a separate cluster far away. Nobody placed them — training shaped the geometry.
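The sketch can be made concrete with a few lines of Python. This is a minimal illustration, not a real embedding: the coordinates in `toy_embeddings` are hand-picked assumptions that roughly mirror the drawing, and the helper names `distance` and `nearest` are invented for this example.

```python
import math

# Hypothetical hand-picked 2-D coordinates, for illustration only.
# Real embeddings are learned by a model and have far more dimensions.
toy_embeddings = {
    "cat": (1.0, 5.0),
    "kitten": (1.5, 5.2),
    "dog": (2.0, 4.0),
    "puppy": (2.5, 3.6),
    "car": (8.0, 1.0),
    "truck": (7.6, 0.4),
    "van": (9.0, 0.5),
}


def distance(a, b):
    """Euclidean distance between the points of two words."""
    return math.dist(toy_embeddings[a], toy_embeddings[b])


def nearest(word):
    """Return the other word whose point lies closest to `word`."""
    others = (w for w in toy_embeddings if w != word)
    return min(others, key=lambda w: distance(word, w))


print(nearest("dog"))  # closest neighbour of "dog" in the toy space
print(distance("dog", "puppy") < distance("dog", "car"))
```

With these made-up points, the nearest neighbour of each pet word is another pet word and the nearest neighbour of each vehicle word is another vehicle, which is all "clusters = topics" means.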

STEP 2

Why "close" is so useful: similarity becomes arithmetic.

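Once meaning is geometry, hard problems become easy distance calculations. "Find documents about this question" stops being keyword matching (which fails when the document says "automobile" and the query says "car") and becomes "find the stored vectors closest to the query's vector." Similarity is computed with a simple, fast formula, commonly cosine similarity: the cosine of the angle between two vectors, close to 1 when they point in nearly the same direction and close to 0 when they are orthogonal. Because each comparison is cheap, and nearest-neighbour indexes can avoid comparing the query against every stored vector, the approach scales to millions of items.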
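Cosine similarity is short enough to write out in full. The sketch below uses only the standard library; the three-dimensional vectors for "car", "automobile", and "banana" are invented for illustration and do not come from any real model.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical vectors, chosen so the two synonyms point the same way.
car = [0.9, 0.1, 0.3]
automobile = [0.8, 0.2, 0.35]
banana = [0.1, 0.9, 0.0]

print(cosine_similarity(car, automobile))  # high: similar direction
print(cosine_similarity(car, banana))      # low: different direction
```

Because the formula divides by both lengths, only direction matters: doubling every number in a vector leaves its cosine similarity to any other vector unchanged.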

This single primitive underpins a surprising amount of modern AI:

  • Semantic search — match by meaning, not exact words, so "how do I reset my password" finds a doc titled "account recovery steps."
  • Recommendations — items near things you liked are likely things you will like.
  • Clustering and deduplication — group near-identical or related content automatically.
  • Retrieval-augmented generation (RAG) — the most important use here: embed a knowledge base, embed the user's question, retrieve the closest chunks, and paste them into the prompt so the LLM answers from real, current sources instead of memory.
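The retrieval step behind semantic search and RAG can be sketched in a few lines. Everything here is an assumption made for illustration: the `knowledge_base` chunks, their three-number vectors, the `question_vector`, and the helper names `retrieve` and `build_prompt`. In a real system the vectors would come from an embedding model, and the finished prompt would be sent to an LLM.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical knowledge base: each chunk is stored next to a made-up vector.
knowledge_base = [
    ("Account recovery steps: use the 'Forgot password' link.", [0.9, 0.1, 0.0]),
    ("Shipping times for international orders.", [0.1, 0.9, 0.1]),
    ("How to change the email on your account.", [0.7, 0.2, 0.3]),
]


def retrieve(query_vector, k=2):
    """Return the text of the k chunks most similar to the query vector."""
    ranked = sorted(
        knowledge_base,
        key=lambda item: cosine_similarity(query_vector, item[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:k]]


def build_prompt(question, query_vector, k=2):
    """Paste the retrieved chunks into a prompt ahead of the question."""
    context = "\n".join(f"- {chunk}" for chunk in retrieve(query_vector, k))
    return f"Answer using only these sources:\n{context}\n\nQuestion: {question}"


# Made-up vector standing in for the embedded question.
question = "how do I reset my password"
question_vector = [0.85, 0.15, 0.05]

print(build_prompt(question, question_vector))
```

The question shares no keywords with "Account recovery steps", yet that chunk ranks first because its vector points in nearly the same direction. This sketch scores every stored chunk; at larger scale a nearest-neighbour index would typically replace the full sort.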

The same idea works beyond text: images, audio, code, and even users and products can be embedded. That is how "find similar images" works, and how cross-modal "search images with a text query" works when a model is trained to place text and images in one shared space.

STEP 3

The famous example: directions carry meaning.

The result that made embeddings click for many people: in a well-trained word-embedding space, directions can encode relationships. The classic illustration:

vector("king") - vector("man") + vector("woman")  ≈  vector("queen")

Read it as: "start at king, remove the 'maleness' direction, add the 'femaleness' direction, and you arrive near queen." A consistent geometric offset corresponds to a real-world relationship (here, gender), and similar offsets appear for capital-of-country or singular-to-plural. This emerged from training on plain text, with no dictionary of relationships — strong evidence that the geometry genuinely captures structure, not just rough topical proximity.
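The arithmetic can be tried on toy data. In this sketch the two-dimensional vectors are invented so that one axis loosely tracks "royalty" and the other "gender"; learned spaces have no such labelled axes, and `analogy` is a hypothetical helper name. As is common when evaluating analogies, the three input words are excluded from the candidates.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical vectors: axis 0 is roughly "royalty", axis 1 roughly "gender".
# The values are deliberately slightly uneven, like a real learned space.
toy_vectors = {
    "king": [0.95, 0.90],
    "queen": [0.92, -0.88],
    "man": [0.10, 0.95],
    "woman": [0.12, -0.90],
    "apple": [-0.80, 0.10],
}


def analogy(a, b, c):
    """Word nearest to vector(a) - vector(b) + vector(c), excluding the inputs."""
    target = [
        x - y + z
        for x, y, z in zip(toy_vectors[a], toy_vectors[b], toy_vectors[c])
    ]
    candidates = (w for w in toy_vectors if w not in (a, b, c))
    return max(candidates, key=lambda w: cosine_similarity(target, toy_vectors[w]))


print(analogy("king", "man", "woman"))
```

The computed target does not equal the stored "queen" vector exactly; it only lands closer to it than to any other candidate, which is what the "≈" in the formula stands for.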

Treat this as intuition, not a guarantee. Real embedding spaces are messier than the clean analogy suggests, the arithmetic does not always land exactly, and learned vectors inherit biases present in the training text. Embeddings reflect their data — including its prejudices — which matters when they drive search or recommendations.

STEP 4

Connecting it back to LLMs — and the limits.

Embeddings are not a separate gadget bolted onto LLMs; they are the language model's native input. Recall that text is split into tokens, and each token is mapped to an integer ID. The model's very first layer is an embedding layer that turns each token ID into a learned vector. Everything the model does afterward is geometry and arithmetic on those vectors. "Meaning as position in space" is not just an analogy for how LLMs work; it is literally the first thing they do with their input.
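An embedding layer is, at its core, a table lookup: the token ID selects a row. The sketch below assumes a tiny made-up vocabulary, a whitespace tokenizer, and random numbers standing in for learned weights; real models use subword tokenizers and learn the table during training.

```python
import random

# Hypothetical four-entry vocabulary mapping tokens to integer IDs.
vocab = {"the": 0, "dog": 1, "barks": 2, "<unk>": 3}
embedding_dim = 4

# One row per token ID. Random values stand in for learned weights.
random.seed(0)
embedding_table = [
    [random.uniform(-1, 1) for _ in range(embedding_dim)] for _ in vocab
]


def tokenize(text):
    """Toy whitespace tokenizer: words become IDs, unknown words become <unk>."""
    return [vocab.get(word, vocab["<unk>"]) for word in text.lower().split()]


def embed(token_ids):
    """Embedding layer as a lookup: each token ID selects one row of the table."""
    return [embedding_table[i] for i in token_ids]


token_ids = tokenize("The dog barks")
vectors = embed(token_ids)

print(token_ids)
print(len(vectors), "vectors of length", len(vectors[0]))
```

Three tokens go in and three vectors come out, one per token; the rest of the model operates on those vectors rather than on the text.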

Two limits keep expectations calibrated. First, similarity is not truth: an embedding can place a question near a passage that is topically related but factually wrong for it. Retrieval finds relevant text, not correct answers, which is why RAG still needs the LLM to read and reason, and ideally a verification step. Second, embeddings compress: collapsing rich meaning into a few hundred or a few thousand numbers necessarily loses nuance, so two genuinely different sentences can land suspiciously close.

The durable takeaway: embeddings convert the fuzzy human notion of "similar in meaning" into the precise machine notion of "near in space," and that one conversion is what makes search, recommendation, and retrieval-augmented generation possible at scale. When you later see a system "find relevant context," picture points in space and the nearest neighbours being pulled out — that is the whole mechanism.