Bi-Encoder
Definition
A bi-encoder embeds queries and documents independently into dense vectors, enabling fast similarity comparison through pre-computed document embeddings. It is the architecture behind semantic search.
Why It Matters
Bi-encoders make semantic search practical at scale. Without them, comparing a query to millions of documents would require millions of neural network forward passes, which is infeasible for real-time applications. Bi-encoders solve this by pre-computing document embeddings once and reusing them for every query.
This efficiency-accuracy tradeoff is why semantic search works at all. You encode your corpus once during indexing, then only need to encode each new query. Searching millions of documents becomes a simple vector similarity computation.
For AI engineers, bi-encoders are the core of every embedding-based retrieval system. Whether you’re using OpenAI embeddings, Cohere, or open-source models like BGE, you’re using bi-encoders.
Implementation Basics
How bi-encoders work:
- Two identical (or shared) encoder networks
- Query encoder: converts query to a fixed-size vector
- Document encoder: converts document to a fixed-size vector
- Similarity: cosine similarity or dot product between vectors
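A minimal sketch of this flow, assuming the sentence-transformers library and the all-MiniLM-L6-v2 checkpoint (any bi-encoder model plugs in the same way):

```python
from sentence_transformers import SentenceTransformer, util

# One shared encoder maps both queries and documents to fixed-size vectors.
model = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative 384-dim bi-encoder

query_vec = model.encode("how do I change my car's oil?")
doc_vec = model.encode("Step-by-step guide to replacing engine oil at home.")

# Similarity is a cheap vector comparison, here cosine similarity.
print(float(util.cos_sim(query_vec, doc_vec)))  # higher = more semantically similar
```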
The key insight: Documents are encoded once at index time and stored in a vector database. Only the query needs encoding at search time. This reduces search from O(n × model_complexity) to O(model_complexity) + O(n × vector_comparison).
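In code, that split might look like the sketch below (model and corpus are illustrative, and a plain NumPy matrix stands in for a vector database): the corpus is embedded once at index time, and each query costs one forward pass plus a matrix product.

```python
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative checkpoint

# Index time: encode the whole corpus once and store the embeddings.
corpus = [
    "Step-by-step guide to replacing engine oil.",
    "Best hiking trails near Denver.",
    "How to rotate your tires at home.",
]
doc_matrix = model.encode(corpus, normalize_embeddings=True)  # shape (n_docs, dim)

# Search time: only the query is encoded; scoring is a single matrix product.
def search(query: str, top_k: int = 2):
    q = model.encode(query, normalize_embeddings=True)
    scores = doc_matrix @ q  # cosine similarity, since embeddings are unit-normalized
    best = np.argsort(-scores)[:top_k]
    return [(corpus[i], float(scores[i])) for i in best]

print(search("car maintenance tips"))
```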
Training bi-encoders: Bi-encoders are trained on query-document pairs with a contrastive objective: the model learns to place matching pairs close together in vector space and non-matching pairs far apart. Common training datasets include MS MARCO (web search) and Natural Questions (Wikipedia QA).
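A sketch of the in-batch-negatives contrastive objective often used here, in plain PyTorch with random tensors standing in for encoder outputs (row i of the query batch is paired with row i of the document batch):

```python
import torch
import torch.nn.functional as F

batch, dim = 32, 384  # illustrative batch size and embedding dimension
queries = F.normalize(torch.randn(batch, dim, requires_grad=True), dim=-1)
docs = F.normalize(torch.randn(batch, dim, requires_grad=True), dim=-1)

# Score every query against every document in the batch.
temperature = 0.05
logits = queries @ docs.T / temperature

# The matching document for query i sits at index i; cross-entropy pulls
# matching pairs together and pushes all other (in-batch negative) pairs apart.
labels = torch.arange(batch)
loss = F.cross_entropy(logits, labels)
loss.backward()  # in real training, gradients flow back into the encoder weights
```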
Popular bi-encoder models:
- OpenAI text-embedding-3-small/large: High quality, hosted API
- Cohere embed-v3: Strong multilingual, API-based
- BGE, E5, GTE: Open-source, self-hostable
- Sentence-transformers: Python library with many pre-trained models
Limitations: Because queries and documents are encoded separately, bi-encoders can’t model query-document interaction directly. They might miss that “car maintenance tips” and “how to change oil” are related, because recognizing the connection requires seeing both texts together.
Best practices:
- Use asymmetric models for search, where queries and documents are handled differently (separate query/document encoders or role prefixes; see the sketch after this list)
- Match the embedding model to your domain (general vs code vs multilingual)
- Consider fine-tuning for specialized domains
- Pair with a cross-encoder reranker for improved precision
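One common form of asymmetry is prefix-based: some open-source families (E5, for instance) share encoder weights but expect queries and passages to carry different role prefixes. A sketch, assuming the intfloat/e5-small-v2 checkpoint:

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("intfloat/e5-small-v2")  # example asymmetric-style model

# The "query: " and "passage: " prefixes tell the shared encoder which role
# each text plays, so short queries and long documents are embedded differently.
query_emb = model.encode("query: how to change engine oil", normalize_embeddings=True)
doc_emb = model.encode(
    "passage: Step-by-step guide to replacing engine oil at home.",
    normalize_embeddings=True,
)
print(float(util.cos_sim(query_emb, doc_emb)))
```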
Bi-encoders give you speed; cross-encoders give you accuracy. Production systems use both in a two-stage retrieval pattern.
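A sketch of that two-stage pattern with sentence-transformers, using illustrative checkpoints: the bi-encoder retrieves a candidate set from precomputed embeddings, then a cross-encoder rescores each (query, document) pair jointly.

```python
import numpy as np
from sentence_transformers import SentenceTransformer, CrossEncoder

bi_encoder = SentenceTransformer("all-MiniLM-L6-v2")             # fast first stage
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")  # precise second stage

corpus = [
    "Step-by-step guide to replacing engine oil.",
    "Best hiking trails near Denver.",
    "How to rotate your tires at home.",
    "Choosing the right motor oil for your engine.",
]
doc_matrix = bi_encoder.encode(corpus, normalize_embeddings=True)  # indexed once

query = "car maintenance tips"

# Stage 1: bi-encoder narrows the corpus to a small candidate set.
q = bi_encoder.encode(query, normalize_embeddings=True)
candidates = np.argsort(-(doc_matrix @ q))[:3]

# Stage 2: cross-encoder sees query and document together and reranks candidates.
pairs = [(query, corpus[i]) for i in candidates]
rerank_scores = reranker.predict(pairs)
reranked = [corpus[i] for i in candidates[np.argsort(-rerank_scores)]]
print(reranked)
```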
Source
Sentence-BERT showed that bi-encoder architectures enable efficient semantic similarity search by encoding sentences independently, trading some accuracy for dramatic speed improvements.
https://arxiv.org/abs/1908.10084