Approximate Nearest Neighbor (ANN)
Definition
ANN algorithms find vectors similar to a query vector without exhaustively comparing every vector, trading a small accuracy loss for dramatically faster search. This is essential for semantic search at scale.
Why It Matters
Exact nearest neighbor search is O(n): you must compare the query against every vector in your database. With 100 million documents, that's 100 million distance computations per query, far too slow for real-time search.
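To make the O(n) cost concrete, here is a minimal pure-Python linear scan (an illustration only, not how production libraries implement it): one distance computation per stored vector, which is exactly the work ANN indexes are built to avoid.

```python
import math

def exact_nearest(query, vectors, k=10):
    """Exact k-NN: compare the query against every stored vector -- O(n)."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    # One distance computation per database vector; with 100M vectors,
    # this loop body runs 100M times per query.
    ranked = sorted(range(len(vectors)), key=lambda i: dist(query, vectors[i]))
    return ranked[:k]

db = [[0.0, 0.0], [1.0, 1.0], [0.1, 0.1], [5.0, 5.0]]
print(exact_nearest([0.0, 0.0], db, k=2))  # -> [0, 2]
```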
ANN algorithms sacrifice a tiny bit of accuracy (finding the “approximately” nearest neighbors) for dramatic speed improvements. You might miss the absolute closest vector occasionally, but you’ll find the top 10 in milliseconds instead of seconds.
For AI engineers, ANN is what makes vector databases work. Whether you use Pinecone, Weaviate, Qdrant, or Chroma, they all use ANN algorithms internally. Understanding ANN helps you tune your index for your latency and accuracy requirements.
Implementation Basics
Common ANN algorithms:
HNSW (Hierarchical Navigable Small World): the most popular choice. It builds a graph where each vector connects to nearby vectors across multiple layers. Search navigates the graph from coarse to fine layers, quickly zeroing in on relevant regions.
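The core move HNSW repeats on each layer is a greedy walk toward the query. The sketch below strips away the hierarchy and shows that walk on a single layer (a simplification for illustration; real HNSW keeps a candidate beam and multiple layers):

```python
def greedy_search(graph, vectors, query, entry):
    """Greedy routing on one layer of an HNSW-style graph: starting from
    an entry point, hop to whichever neighbor is closer to the query,
    and stop at a local minimum. HNSW runs this coarse-to-fine across
    layers, switching to a wider beam search on the bottom layer."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    current = entry
    while True:
        best = min(graph[current], key=lambda n: dist(vectors[n], query),
                   default=current)
        if dist(vectors[best], query) >= dist(vectors[current], query):
            return current  # no neighbor is closer: local minimum
        current = best

# Toy graph: a chain 0-1-2-3 along the x-axis.
vecs = [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [3.0, 0.0]]
links = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(greedy_search(links, vecs, [3.0, 0.0], entry=0))  # -> 3
```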
IVF (Inverted File Index): clusters the vectors, then searches only within the clusters nearest the query. Faster to build than HNSW but often lower quality. Good for very large datasets where HNSW memory is prohibitive.
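A minimal IVF sketch, assuming the cluster centroids have already been trained (production libraries learn them with k-means): build inverted lists by assigning each vector to its nearest centroid, then at query time scan only the `nprobe` closest clusters.

```python
def _dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def build_ivf(vectors, centroids):
    """Assign each vector to its nearest centroid (the 'inverted lists')."""
    lists = {c: [] for c in range(len(centroids))}
    for i, v in enumerate(vectors):
        c = min(range(len(centroids)), key=lambda j: _dist(v, centroids[j]))
        lists[c].append(i)
    return lists

def ivf_search(query, vectors, centroids, lists, nprobe=1, k=2):
    """Search only the nprobe clusters nearest the query, not the whole set."""
    probe = sorted(range(len(centroids)),
                   key=lambda j: _dist(query, centroids[j]))[:nprobe]
    candidates = [i for c in probe for i in lists[c]]
    return sorted(candidates, key=lambda i: _dist(query, vectors[i]))[:k]

centroids = [[0.0, 0.0], [10.0, 10.0]]
vectors = [[0.0, 1.0], [1.0, 0.0], [9.0, 10.0], [10.0, 9.0]]
lists = build_ivf(vectors, centroids)          # {0: [0, 1], 1: [2, 3]}
print(ivf_search([0.5, 0.5], vectors, centroids, lists))  # -> [0, 1]
```

With nprobe=1 only half the database is scanned here; raising nprobe trades speed back for recall.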
Product Quantization (PQ): compresses vectors by splitting them into subvectors and quantizing each against a small codebook. Reduces memory dramatically at some quality cost. Often combined with IVF (IVF-PQ).
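The following sketch shows the encode/decode halves of PQ with hand-picked codebooks (real systems learn each codebook with k-means and typically use 256 codewords per subvector, so each subvector compresses to one byte):

```python
def pq_encode(vector, codebooks):
    """Split the vector into len(codebooks) subvectors and replace each
    subvector with the index of its nearest codeword. E.g. a 128-dim
    float32 vector (512 bytes) stored as 8 one-byte codes."""
    m = len(codebooks)
    sub_len = len(vector) // m
    codes = []
    for i, book in enumerate(codebooks):
        sub = vector[i * sub_len:(i + 1) * sub_len]
        codes.append(min(range(len(book)),
                         key=lambda j: sum((x - y) ** 2
                                           for x, y in zip(sub, book[j]))))
    return codes

def pq_decode(codes, codebooks):
    """Reconstruct an approximate vector by concatenating the codewords."""
    out = []
    for code, book in zip(codes, codebooks):
        out.extend(book[code])
    return out

# Two codebooks of two codewords each, for 4-dim vectors split in half.
books = [[[0.0, 0.0], [1.0, 1.0]], [[0.0, 0.0], [2.0, 2.0]]]
codes = pq_encode([0.9, 1.1, 1.9, 2.1], books)
print(codes)                    # -> [1, 1]
print(pq_decode(codes, books))  # -> [1.0, 1.0, 2.0, 2.0]
```

The decoded vector is only an approximation of the original; that reconstruction error is the "quality cost" mentioned above.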
Key tradeoffs:
- Build time vs search time: HNSW is slow to build but fast to search. IVF is fast to build but slower to search.
- Memory vs quality: PQ uses less memory but loses some precision. HNSW keeps full vectors but uses more RAM.
- Recall vs latency: a more thorough search gives higher recall at the cost of higher latency.
Tuning parameters for HNSW:
- M: Connections per node. Higher = better recall, more memory.
- efConstruction: Build-time search depth. Higher = better index quality.
- efSearch: Query-time search depth. Higher = better recall, more latency.
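What efSearch actually controls can be seen in a simplified bottom-layer search: the algorithm keeps a beam of the ef best candidates found so far and keeps expanding until no unexpanded candidate can beat them. This is a pedagogical sketch, not a library API; larger ef means more nodes visited, hence better recall and higher latency.

```python
import heapq

def beam_search(graph, vectors, query, entry, ef):
    """Bottom-layer HNSW-style search with a candidate beam of width ef
    (efSearch). Returns up to ef node ids, nearest first."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    d0 = dist(vectors[entry], query)
    visited = {entry}
    candidates = [(d0, entry)]   # min-heap of nodes still to expand
    best = [(d0, entry)]         # the ef closest nodes seen so far
    while candidates:
        d, node = heapq.heappop(candidates)
        if d > max(best)[0]:
            break  # no remaining candidate can improve the top-ef
        for n in graph[node]:
            if n not in visited:
                visited.add(n)
                dn = dist(vectors[n], query)
                heapq.heappush(candidates, (dn, n))
                best = sorted(best + [(dn, n)])[:ef]
    return [n for _, n in sorted(best)]

vecs = [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [3.0, 0.0]]
links = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(beam_search(links, vecs, [3.0, 0.0], entry=0, ef=2))  # -> [3, 2]
```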
Choosing an approach:
- Small datasets (< 1M): HNSW with default parameters
- Large datasets (1M-100M): HNSW with tuned parameters, or IVF+PQ
- Very large (> 100M): IVF with PQ, possibly with HNSW on cluster centroids
Practical advice: Start with your vector database's defaults. Measure recall (how many of the exact top 10 appear in the ANN top 10?) and latency. Only tune when you've identified a specific problem. Most applications work well with default HNSW settings.
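The recall measurement described above is a one-liner once you have the ANN results and a brute-force exact baseline for the same queries:

```python
def recall_at_k(ann_results, exact_results, k=10):
    """Fraction of the exact top-k that the ANN index returned.
    1.0 means the ANN search was perfect for this query; average this
    over a few hundred held-out queries before touching any parameters."""
    return len(set(ann_results[:k]) & set(exact_results[:k])) / k

# ANN missed one of the true top 10 (id 10), returning id 11 instead.
print(recall_at_k([1, 2, 3, 4, 5, 6, 7, 8, 9, 11],
                  [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]))  # -> 0.9
```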
Source
Approximate nearest neighbor algorithms like HNSW achieve sub-linear search complexity, enabling millisecond-latency search over billions of vectors.
https://arxiv.org/abs/1603.09320