Weaviate
Definition
Weaviate is an open-source vector database with built-in hybrid search and optional modules for vectorization and generative (RAG) features. It supports both self-hosted deployments and a managed cloud service.
Why It Matters
Weaviate offers a middle ground between a fully managed service like Pinecone and a raw vector index library like FAISS. Because it is open source, you can self-host it to control costs and data residency. Unlike a bare vector index, it also provides production database features: CRUD operations, filtering, replication, and a GraphQL API.
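Its module system sets Weaviate apart. Vectorizer modules (text2vec-*, img2vec-*) and generative modules are configured inside Weaviate, so you can store raw text and let Weaviate generate embeddings automatically at import and query time. Depending on the module, Weaviate either calls a provider's API (text2vec-openai) or a local inference container (text2vec-transformers). This simplifies the architecture because your application no longer calls a separate embedding service before every write and search.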
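To make that division of labor concrete, here is a toy, in-memory model of what a vectorizer module does: the caller stores and queries raw text, and the store does the embedding. Everything here is hypothetical and self-contained. `ToyCollection` and `toy_vectorizer` are illustrative names, not Weaviate's API, and the vocabulary-count "embedding" merely stands in for a real model such as the one text2vec-openai would call.

```python
import math
from typing import Callable, Optional

Vector = list[float]

# Hypothetical stand-in for an embedding model: one dimension per fixed word.
VOCAB = ["refund", "order", "shipping", "europe", "policy"]


def toy_vectorizer(text: str) -> Vector:
    """Toy embedding that counts vocabulary words (not a real model)."""
    tokens = [t.strip(".,!?").lower() for t in text.split()]
    return [float(tokens.count(word)) for word in VOCAB]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class ToyCollection:
    """Illustrative model of 'the database vectorizes for you'."""

    def __init__(self, vectorizer: Callable[[str], Vector], text_fields: list[str]):
        self.vectorizer = vectorizer
        self.text_fields = text_fields
        self.objects: list[tuple[dict, Vector]] = []

    def insert(self, properties: dict, vector: Optional[Vector] = None) -> None:
        # No vector supplied: embed the text properties at import time.
        if vector is None:
            text = " ".join(str(properties[f]) for f in self.text_fields if f in properties)
            vector = self.vectorizer(text)
        self.objects.append((properties, vector))

    def near_text(self, query: str, limit: int = 3) -> list[dict]:
        # The query goes through the same vectorizer, so both sides share one space.
        q = self.vectorizer(query)
        ranked = sorted(self.objects, key=lambda obj: cosine(q, obj[1]), reverse=True)
        return [props for props, _ in ranked[:limit]]


docs = ToyCollection(toy_vectorizer, text_fields=["title", "body"])
docs.insert({"title": "Refund policy", "body": "How to request a refund for an order."})
docs.insert({"title": "Shipping times", "body": "Shipping to Europe takes five days."})
print(docs.near_text("refund for my order", limit=1)[0]["title"])
```

The application code never calls the embedding function directly; it only passes text to `insert` and `near_text`. Using the same vectorizer for objects and queries is what keeps their vectors comparable.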
For AI engineers, Weaviate is a flexible option that scales from local development to production clusters. You can prototype with Docker Compose, then deploy to Kubernetes or move to Weaviate Cloud for managed infrastructure as your application grows.
Implementation Basics
Weaviate is organized around a few core concepts:
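Classes define your data schema (newer Weaviate documentation calls them collections). Each class describes one type of object, such as Document, Product, or FAQ, along with its properties and configuration (vectorizer, indexing). Unlike a schemaless store, an explicit schema lets Weaviate validate property types and configure how each property is indexed.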
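Modules extend functionality. The text2vec-openai module generates embeddings using OpenAI's API. The generative-openai module runs retrieval-augmented generation (RAG) queries directly in Weaviate by sending retrieved objects and your prompt to an OpenAI model. Enable only the modules you need; in self-hosted deployments, this is controlled by the ENABLE_MODULES environment variable.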
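Hybrid search combines vector and keyword retrieval. Weaviate runs a BM25 keyword search alongside a vector search and fuses the two result sets, with the alpha parameter setting the weighting (alpha = 1 is pure vector search, alpha = 0 is pure keyword search). This often outperforms pure vector search on enterprise data, where exact terms such as product codes, names, and internal jargon matter.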
The GraphQL API provides flexible querying. You can retrieve objects, follow cross-references to related objects, apply filters, and run similarity or hybrid searches, all through one consistent interface.
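The fusion step in hybrid search can be sketched as follows. This is a simplified illustration in the spirit of Weaviate's relative-score fusion (min-max normalize each result list, then weight by alpha); it is not Weaviate's code. Weaviate also offers a rank-based fusion, which is not shown. The function names and the input format (document ID mapped to a score where higher is better) are assumptions for the sketch, and the 0.75 default is chosen to match the default alpha Weaviate documents.

```python
def min_max_normalize(scores: dict[str, float]) -> dict[str, float]:
    """Rescale scores to [0, 1] so the best result is 1.0 and the worst is 0.0."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores equal: treat every result as equally good.
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}


def hybrid_fuse(
    vector_scores: dict[str, float],
    keyword_scores: dict[str, float],
    alpha: float = 0.75,
) -> list[tuple[str, float]]:
    """Weighted fusion: alpha=1.0 is pure vector, alpha=0.0 is pure keyword.

    vector_scores must be similarities (higher is better), not distances.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    vec = min_max_normalize(vector_scores)
    kw = min_max_normalize(keyword_scores)
    fused = {
        # A document missing from one result list gets 0 from that side.
        doc_id: alpha * vec.get(doc_id, 0.0) + (1 - alpha) * kw.get(doc_id, 0.0)
        for doc_id in vec.keys() | kw.keys()
    }
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)


vector_scores = {"a": 0.92, "b": 0.85, "c": 0.60}  # cosine similarities
keyword_scores = {"b": 7.1, "d": 5.0, "a": 2.0}    # raw BM25 scores
for doc_id, score in hybrid_fuse(vector_scores, keyword_scores):
    print(doc_id, round(score, 3))
```

With these made-up inputs, `a` wins the vector search alone, but `b` ranks first after fusion because both searches rate it highly. Normalizing first matters: raw BM25 scores and cosine similarities live on different scales, so adding them directly would let the keyword side dominate.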
Start with the text2vec module that matches your embedding provider. Define your schema with the properties you will filter on. Use hybrid search as your default, since it handles both semantic and keyword queries well. Before scaling out to a multi-node cluster, monitor Weaviate's built-in Prometheus-compatible metrics to confirm where the load is.
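Putting these recommendations together, the sketch below builds a hypothetical `Document` class definition with a filterable `category` property, plus a GraphQL `Get` query that runs hybrid search with an equality filter. It only constructs the JSON and query text, so it runs without a server. The class and property names are illustrative, and the schema options should be checked against the Weaviate version you run. To execute it, you would POST the class definition to the `/v1/schema` REST endpoint and send `{"query": ...}` to `/v1/graphql`.

```python
import json
from typing import Optional


def document_class() -> dict:
    """Hypothetical 'Document' class in the JSON shape of Weaviate's REST schema API."""
    return {
        "class": "Document",
        "vectorizer": "text2vec-openai",  # assumes this module is enabled
        "properties": [
            {"name": "title", "dataType": ["text"]},
            {"name": "body", "dataType": ["text"]},
            {
                "name": "category",
                "dataType": ["text"],
                "indexFilterable": True,  # we filter on this property
                "tokenization": "field",  # whole value is one token, for exact matches
            },
        ],
    }


def hybrid_query(
    class_name: str,
    query: str,
    alpha: float = 0.75,
    category: Optional[str] = None,
    limit: int = 5,
    fields: tuple[str, ...] = ("title", "body"),
) -> str:
    """Build a GraphQL Get query: hybrid search plus an optional category filter."""
    # json.dumps produces a quoted, escaped string literal for the query text.
    args = [f"hybrid: {{ query: {json.dumps(query)}, alpha: {alpha} }}"]
    if category is not None:
        args.append(
            f'where: {{ path: ["category"], operator: Equal, valueText: {json.dumps(category)} }}'
        )
    args.append(f"limit: {limit}")
    selection = " ".join(fields)
    return (
        "{ Get { "
        f"{class_name}({', '.join(args)}) "
        f"{{ {selection} _additional {{ score }} }} "
        "} }"
    )


payload = json.dumps({"query": hybrid_query("Document", "refund policy", category="faq")})
print(payload)
```

Keeping `category` filterable with field tokenization means the `Equal` filter compares whole values such as `faq`, while `title` and `body` remain available to both the vectorizer and BM25.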
Source
Weaviate is an open-source vector database that stores both objects and their vectors, supports hybrid search combining vector and keyword approaches, and integrates ML models through built-in modules.
https://weaviate.io/developers/weaviate