Bi-Encoder
Definition
A bi-encoder embeds queries and documents independently into dense vectors, enabling fast similarity comparison through pre-computed document embeddings. It is the architecture behind semantic search.
Why It Matters
Bi-encoders make semantic search practical at scale. Without them, comparing a query to millions of documents would require millions of neural network forward passes, which is infeasible for real-time applications. Bi-encoders solve this by pre-computing document embeddings once and reusing them for every query.
This efficiency-accuracy tradeoff is why semantic search works at all. You encode your corpus once during indexing, then only need to encode each new query. Searching millions of documents becomes a simple vector similarity computation.
For AI engineers, bi-encoders are the core of every embedding-based retrieval system. Whether you’re using OpenAI embeddings, Cohere, or open-source models like BGE, you’re using bi-encoders.
Implementation Basics
How bi-encoders work:
- Two identical (or shared) encoder networks
- Query encoder: converts query to a fixed-size vector
- Document encoder: converts document to a fixed-size vector
- Similarity: cosine similarity or dot product between vectors
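A minimal sketch of this flow, assuming the sentence-transformers library and the all-MiniLM-L6-v2 checkpoint (any bi-encoder model plugs in the same way):

```python
from sentence_transformers import SentenceTransformer, util

# One shared encoder maps both queries and documents to fixed-size vectors.
model = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative 384-dim bi-encoder

query_vec = model.encode("how do I change my car's oil?")
doc_vec = model.encode("Step-by-step guide to replacing engine oil at home.")

# Similarity is a cheap vector comparison, here cosine similarity.
print(float(util.cos_sim(query_vec, doc_vec)))  # higher = more semantically similar
```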
The key insight: Documents are encoded once at index time and stored in a vector database. Only the query needs encoding at search time. This reduces search from O(n × model_complexity) to O(model_complexity) + O(n × vector_comparison).
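In code, that split might look like the sketch below (model and corpus are illustrative, and a plain NumPy matrix stands in for a vector database): the corpus is embedded once at index time, and each query costs one forward pass plus a matrix product.

```python
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative checkpoint

# Index time: encode the whole corpus once and store the embeddings.
corpus = [
    "Step-by-step guide to replacing engine oil.",
    "Best hiking trails near Denver.",
    "How to rotate your tires at home.",
]
doc_matrix = model.encode(corpus, normalize_embeddings=True)  # shape (n_docs, dim)

# Search time: only the query is encoded; scoring is a single matrix product.
def search(query: str, top_k: int = 2):
    q = model.encode(query, normalize_embeddings=True)
    scores = doc_matrix @ q  # cosine similarity, since embeddings are unit-normalized
    best = np.argsort(-scores)[:top_k]
    return [(corpus[i], float(scores[i])) for i in best]

print(search("car maintenance tips"))
```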
Training bi-encoders: Bi-encoders are trained on query-document pairs with a contrastive objective: the model learns to place matching pairs close together in vector space and non-matching pairs far apart. Common training datasets include MS MARCO (web search) and Natural Questions (Wikipedia QA).
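A sketch of the in-batch-negatives contrastive objective often used here, in plain PyTorch with random tensors standing in for encoder outputs (row i of the query batch is paired with row i of the document batch):

```python
import torch
import torch.nn.functional as F

batch, dim = 32, 384  # illustrative batch size and embedding dimension
queries = F.normalize(torch.randn(batch, dim, requires_grad=True), dim=-1)
docs = F.normalize(torch.randn(batch, dim, requires_grad=True), dim=-1)

# Score every query against every document in the batch.
temperature = 0.05
logits = queries @ docs.T / temperature

# The matching document for query i sits at index i; cross-entropy pulls
# matching pairs together and pushes all other (in-batch negative) pairs apart.
labels = torch.arange(batch)
loss = F.cross_entropy(logits, labels)
loss.backward()  # in real training, gradients flow back into the encoder weights
```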
Popular bi-encoder models:
- OpenAI text-embedding-3-small/large: High quality, hosted API
- Cohere embed-v3: Strong multilingual, API-based
- BGE, E5, GTE: Open-source, self-hostable
- Sentence-transformers: Python library with many pre-trained models
Limitations: Because queries and documents are encoded separately, bi-encoders can’t model query-document interaction directly. They might miss that “car maintenance tips” and “how to change oil” are related, because recognizing the connection requires seeing both texts together.
Best practices:
- Use asymmetric models for search, where queries and documents are handled differently (separate query/document encoders or role prefixes; see the sketch after this list)
- Match the embedding model to your domain (general vs code vs multilingual)
- Consider fine-tuning for specialized domains
- Pair with a cross-encoder reranker for improved precision
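One common form of asymmetry is prefix-based: some open-source families (E5, for instance) share encoder weights but expect queries and passages to carry different role prefixes. A sketch, assuming the intfloat/e5-small-v2 checkpoint:

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("intfloat/e5-small-v2")  # example asymmetric-style model

# The "query: " and "passage: " prefixes tell the shared encoder which role
# each text plays, so short queries and long documents are embedded differently.
query_emb = model.encode("query: how to change engine oil", normalize_embeddings=True)
doc_emb = model.encode(
    "passage: Step-by-step guide to replacing engine oil at home.",
    normalize_embeddings=True,
)
print(float(util.cos_sim(query_emb, doc_emb)))
```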
Bi-encoders give you speed; cross-encoders give you accuracy. Production systems use both in a two-stage retrieval pattern.
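A sketch of that two-stage pattern with sentence-transformers, using illustrative checkpoints: the bi-encoder retrieves a candidate set from precomputed embeddings, then a cross-encoder rescores each (query, document) pair jointly.

```python
import numpy as np
from sentence_transformers import SentenceTransformer, CrossEncoder

bi_encoder = SentenceTransformer("all-MiniLM-L6-v2")             # fast first stage
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")  # precise second stage

corpus = [
    "Step-by-step guide to replacing engine oil.",
    "Best hiking trails near Denver.",
    "How to rotate your tires at home.",
    "Choosing the right motor oil for your engine.",
]
doc_matrix = bi_encoder.encode(corpus, normalize_embeddings=True)  # indexed once

query = "car maintenance tips"

# Stage 1: bi-encoder narrows the corpus to a small candidate set.
q = bi_encoder.encode(query, normalize_embeddings=True)
candidates = np.argsort(-(doc_matrix @ q))[:3]

# Stage 2: cross-encoder sees query and document together and reranks candidates.
pairs = [(query, corpus[i]) for i in candidates]
rerank_scores = reranker.predict(pairs)
reranked = [corpus[i] for i in candidates[np.argsort(-rerank_scores)]]
print(reranked)
```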
Source
Sentence-BERT showed that bi-encoder architectures enable efficient semantic similarity search by encoding sentences independently, trading some accuracy for dramatic speed improvements.
https://arxiv.org/abs/1908.10084