Retrieval
Definition
Retrieval is the process of finding and ranking relevant documents from a knowledge base in response to a query, typically using embedding similarity, keyword matching, or hybrid approaches.
Why It Matters
Retrieval is the bottleneck in most RAG systems. Your LLM can only reason about what you provide. If retrieval misses relevant documents or returns irrelevant ones, the generated answer suffers, no matter how capable the model.
Retrieval quality comes down to two measures: precision (are the retrieved documents relevant?) and recall (did we find all the relevant documents?). In practice, you optimize for both while balancing latency and cost: retrieving 100 documents might improve recall, but it wastes tokens and slows responses.
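To make the two measures concrete, precision@k and recall@k can be computed per query against a small labeled set of relevant documents. The helper below is a minimal sketch with hypothetical names, not part of any particular library.

```python
def precision_recall_at_k(retrieved_ids, relevant_ids, k):
    """Compute precision@k and recall@k for a single query.

    retrieved_ids: ranked list of document IDs returned by the retriever
    relevant_ids: set of document IDs labeled relevant for the query
    """
    top_k = retrieved_ids[:k]
    hits = sum(1 for doc_id in top_k if doc_id in relevant_ids)
    precision = hits / k if k else 0.0
    recall = hits / len(relevant_ids) if relevant_ids else 0.0
    return precision, recall

# Example: 2 of the top 3 results are relevant, out of 4 relevant docs total
p, r = precision_recall_at_k(["d1", "d7", "d3"], {"d1", "d3", "d9", "d12"}, k=3)
print(round(p, 3), round(r, 3))  # 0.667 0.5
```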
For AI engineers, retrieval engineering is where you spend most debugging time. When answers are wrong or incomplete, the cause is almost always retrieval, not the LLM.
Implementation Basics
Retrieval Approaches
- Dense retrieval: Embed query and documents, find nearest neighbors. Good for semantic similarity.
- Sparse retrieval (BM25): Keyword matching with term frequency weighting. Good for specific terms.
- Hybrid: Combine dense and sparse scores. Often the best of both worlds (a fusion sketch follows this list).
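One widely used way to combine dense and sparse results is reciprocal rank fusion (RRF), which merges ranked lists without having to calibrate their raw scores against each other. The sketch below assumes you already have ranked ID lists from each retriever; k=60 is the commonly used default constant.

```python
from collections import defaultdict

def reciprocal_rank_fusion(ranked_lists, k=60):
    """Merge several ranked lists of document IDs into a single ranking.

    A document's fused score is the sum of 1 / (k + rank) over every list
    it appears in, so documents ranked highly by multiple retrievers rise
    to the top. The constant k dampens the influence of any single list.
    """
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

dense_hits = ["doc_a", "doc_c", "doc_b"]   # from embedding search
sparse_hits = ["doc_c", "doc_d", "doc_a"]  # from BM25
print(reciprocal_rank_fusion([dense_hits, sparse_hits]))
# ['doc_c', 'doc_a', 'doc_d', 'doc_b'] -- docs found by both retrievers win
```

Score-level fusion (a weighted sum of normalized dense and BM25 scores) also works, but RRF avoids the score normalization step entirely.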
Key Parameters
- Top-K: How many documents to retrieve (typically 3-10)
- Similarity threshold: Minimum score to include a result
- Reranking: Use a more expensive model to reorder initial results (see the selection sketch after this list)
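The sketch below shows how these parameters typically interact in a retrieval pipeline: filter by a similarity threshold, sort, optionally rerank a wider pool, then keep the top-k. The function and parameter names are illustrative, not from a specific library.

```python
def select_results(scored_docs, top_k=5, min_score=0.3, reranker=None):
    """Apply a similarity threshold and top-k cutoff, with an optional rerank step.

    scored_docs: list of (doc, score) pairs from the first-stage retriever,
        where score is a similarity such as cosine similarity.
    reranker: optional callable doc -> relevance score, e.g. a cross-encoder
        closed over the query.
    """
    # Drop weak matches first so the threshold applies to raw retriever scores.
    candidates = [(doc, score) for doc, score in scored_docs if score >= min_score]
    candidates.sort(key=lambda pair: pair[1], reverse=True)

    if reranker is not None:
        # Rerank a wider pool than top_k so the reranker can promote
        # documents the first-stage retriever underrated.
        pool = candidates[: top_k * 4]
        candidates = sorted(pool, key=lambda pair: reranker(pair[0]), reverse=True)

    return [doc for doc, _ in candidates[:top_k]]
```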
Improving Retrieval
- Query expansion: Rewrite user queries to match document language
- Metadata filtering: Narrow search by date, category, or source
- Multi-query: Generate variations of the query and merge results
- Reranking: Use cross-encoders to score query-document pairs more accurately (sketched below)
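To make the reranking step concrete, here is a sketch using the CrossEncoder class from the sentence-transformers library; the specific model name is one common public checkpoint and should be swapped for whatever fits your stack.

```python
from sentence_transformers import CrossEncoder

# A cross-encoder reads the query and document together, so it judges
# relevance more accurately than comparing independently computed embeddings.
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

def rerank(query, candidates, keep=5):
    """Reorder first-stage candidate documents by cross-encoder score."""
    pairs = [(query, doc) for doc in candidates]
    scores = reranker.predict(pairs)  # one relevance score per pair
    ranked = sorted(zip(candidates, scores), key=lambda item: item[1], reverse=True)
    return [doc for doc, _ in ranked[:keep]]
```

Because cross-encoders are slower than embedding lookups, they are usually applied only to a few dozen first-stage candidates rather than the whole corpus.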
Debugging Retrieval
When answers are wrong, always inspect the retrieved documents first. Add logging to see what is being retrieved. Common issues: query-document vocabulary mismatch, wrong metadata filters, too few chunks retrieved, or query and document embeddings produced by different models.
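A minimal logging hook makes these failure modes visible; the result shape here (text, score, metadata) is illustrative and should match whatever your retriever actually returns.

```python
import logging

logger = logging.getLogger("rag.retrieval")

def log_retrieval(query, results):
    """Log retrieved documents so bad answers can be traced to bad context.

    results: list of (doc_text, score, metadata) tuples (shape is illustrative).
    """
    logger.info("query=%r returned %d documents", query, len(results))
    for rank, (text, score, metadata) in enumerate(results, start=1):
        logger.info(
            "rank=%d score=%.3f source=%s snippet=%r",
            rank, score, metadata.get("source", "unknown"), text[:120],
        )
```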
Start simple. Basic semantic search often works well. Add complexity (hybrid search, reranking) only when you identify specific failure modes through testing.
Source
Dense retrieval using learned embeddings outperforms traditional sparse methods like BM25 on many benchmarks, while hybrid approaches combining both show further improvements.
https://arxiv.org/abs/2112.09118