
Embeddings

Definition

Embeddings are dense numerical vectors that represent text, images, or other data in a high-dimensional space where semantically similar items are positioned closer together, enabling similarity search and retrieval.

Why It Matters

Embeddings are the bridge between human language and machine understanding. They transform text into numbers that preserve meaning, and similar concepts end up near each other in vector space. “Dog” is closer to “puppy” than to “refrigerator.”

This mathematical representation unlocks semantic search. Instead of matching exact keywords, you can find documents that mean similar things. A search for “machine learning salary” finds relevant results even if documents use “ML compensation” or “AI engineer pay.”

For AI engineers, embeddings are everywhere. They power RAG systems, recommendation engines, duplicate detection, and clustering. Understanding how to choose, generate, and work with embeddings is a core skill.

Implementation Basics

Generating Embeddings

Use embedding models like OpenAI’s text-embedding-3-small, Cohere’s embed-v3, or open-source alternatives like BGE and E5. Models differ in dimensionality (typically 512 to 3072 dimensions) and in their trade-offs between quality, cost, and speed.
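As a minimal sketch, here is how batch generation looks with the OpenAI Python SDK (v1+) and text-embedding-3-small; it assumes an OPENAI_API_KEY in the environment, and other providers follow a similar pattern.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

texts = [
    "Machine learning salary trends",
    "ML compensation benchmarks for engineers",
]

# A single call can embed a whole batch of inputs.
response = client.embeddings.create(
    model="text-embedding-3-small",
    input=texts,
)

vectors = [item.embedding for item in response.data]
print(len(vectors), "vectors of dimension", len(vectors[0]))  # defaults to 1536 dims for this model
```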

Key Considerations

  • Dimension size: Higher dimensions capture more nuance but cost more to store and search
  • Model choice: Domain-specific models often outperform general-purpose ones for specialized content
  • Normalization: Most similarity searches use normalized vectors (length = 1)
  • Batching: Generate embeddings in batches for efficiency, since API calls have overhead (see the sketch after this list for both points)
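A small NumPy sketch of normalization and batching; embed_batch is a hypothetical callable standing in for whichever embedding API you use, not a real library function.

```python
import numpy as np

def normalize(vectors: np.ndarray) -> np.ndarray:
    """Scale each row to unit length so dot product equals cosine similarity."""
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    return vectors / np.clip(norms, 1e-12, None)

def embed_in_batches(texts, embed_batch, batch_size=128):
    """Embed texts in chunks to amortize per-call overhead, then normalize."""
    rows = []
    for i in range(0, len(texts), batch_size):
        rows.extend(embed_batch(texts[i : i + batch_size]))  # embed_batch returns one vector per text
    return normalize(np.array(rows, dtype=np.float32))
```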

Similarity Metrics

Cosine similarity is standard: it measures the angle between vectors, ignoring magnitude. Dot product gives identical results on normalized vectors and is often faster. Euclidean distance is another option but less common for text.
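For concreteness, a toy NumPy example computing all three metrics (the vectors here are arbitrary, purely for illustration).

```python
import numpy as np

a = np.array([0.3, 0.8, 0.5])
b = np.array([0.2, 0.9, 0.4])

# Cosine similarity: angle between the vectors, magnitude ignored.
cosine = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# On unit-length vectors the plain dot product gives the same score.
a_unit, b_unit = a / np.linalg.norm(a), b / np.linalg.norm(b)
dot_normalized = np.dot(a_unit, b_unit)

# Euclidean distance: straight-line distance, sensitive to magnitude.
euclidean = np.linalg.norm(a - b)

print(f"cosine={cosine:.4f}  dot(normalized)={dot_normalized:.4f}  euclidean={euclidean:.4f}")
```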

Practical Tips

Store the embedding model version alongside your vectors. If you switch models, all vectors need regeneration, since different models produce incompatible vector spaces. Always embed queries with the same model used for the documents.
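One way to keep that bookkeeping, as a minimal sketch; the record shape and check below are illustrative, not a standard schema.

```python
from dataclasses import dataclass

@dataclass
class StoredEmbedding:
    doc_id: str
    vector: list[float]
    model: str  # e.g. "text-embedding-3-small"

def ensure_same_model(stored: StoredEmbedding, query_model: str) -> None:
    """Refuse to compare vectors produced by different models: their spaces are incompatible."""
    if stored.model != query_model:
        raise ValueError(
            f"Corpus was embedded with {stored.model} but the query uses {query_model}; "
            "re-embed the corpus before searching."
        )
```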

Source

The Word2Vec paper (Mikolov et al., 2013) demonstrated that word embeddings capture semantic relationships through vector arithmetic (king - man + woman ≈ queen).

https://arxiv.org/abs/1301.3781
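As a quick illustration of that vector arithmetic, the sketch below uses pretrained GloVe vectors via gensim's downloader as a stand-in for Word2Vec; the library and model name are assumptions, not part of the cited paper.

```python
import gensim.downloader as api

# Downloads pretrained GloVe vectors on first run.
wv = api.load("glove-wiki-gigaword-50")

# king - man + woman: the nearest remaining word is typically "queen".
print(wv.most_similar(positive=["king", "woman"], negative=["man"], topn=1))
```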