// Glossary · technical

Vector Database

Also: vector store · vector search engine · embedding database

A database built to store and query high-dimensional vector embeddings by semantic similarity, foundational for RAG and semantic search systems.

A vector database is the storage layer that makes semantic search possible. Traditional databases index rows by exact values and string matches. A vector database indexes rows by their position in a high-dimensional embedding space, then answers queries with the rows whose vectors sit closest to the query vector. Closeness is usually measured with cosine similarity, dot product, or Euclidean distance. The result is a system that finds documents by meaning rather than by keyword. A user asks about "reducing churn" and the database returns chunks that talk about retention, win-back, and account expansion, even when none of those exact words appear in the query.
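
The nearest-vector lookup described above can be sketched in a few lines. This is a brute-force illustration, not how a production database does it (those use approximate indexes to avoid scanning every row). The three-dimensional vectors, the row names, and the `top_k` helper are made up for the example.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)

def top_k(query, rows, k=2):
    """Return the k (row_id, score) pairs closest to the query, best first."""
    scored = [(row_id, cosine_similarity(query, vec)) for row_id, vec in rows.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Toy 3-dimensional "embeddings" (hypothetical values).
# Real embeddings have hundreds or thousands of dimensions.
rows = {
    "retention-playbook": [0.9, 0.1, 0.0],
    "win-back-emails": [0.8, 0.3, 0.1],
    "office-parking": [0.0, 0.1, 0.9],
}
query = [1.0, 0.2, 0.0]  # stands in for the embedding of "reducing churn"

results = top_k(query, rows, k=2)
for row_id, score in results:
    print(f"{row_id}: {score:.3f}")
```

The two churn-adjacent rows rank first because their vectors point in nearly the same direction as the query, while the unrelated row scores close to zero.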

Vector databases sit at the core of most production RAG systems. The flow looks the same across deployments. Documents get chunked, each chunk gets converted to an embedding by a model like text-embedding-3-large or bge-large, and the embedding gets written to the vector database along with metadata pointing back to the source. At query time the user question gets embedded with the same model, the database returns the top-k most similar chunks, and those chunks get passed to the language model as grounded context. Common production choices include Pinecone, Weaviate, Qdrant, Milvus, and pgvector inside Postgres. The choice depends on document volume, query throughput, and whether you need self-hosted infrastructure.
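
The chunk, embed, store, query flow can be sketched end to end with an in-memory stand-in. Everything named here is hypothetical: `embed` is a toy bag-of-words function that only matches shared words (a real embedding model captures meaning), `InMemoryVectorStore` is not any vendor's client, and the two documents are invented.

```python
import math
from collections import Counter

def embed(text):
    """Toy stand-in for an embedding model: unit-length bag-of-words vector.
    Assumption for illustration only; it matches shared words, not meaning."""
    counts = Counter(text.lower().split())
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {word: c / norm for word, c in counts.items()} if norm else {}

def chunk(text, size=8):
    """Split text into chunks of at most `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

class InMemoryVectorStore:
    """Hypothetical minimal store: keeps (vector, text, metadata) records."""

    def __init__(self):
        self.records = []

    def upsert(self, vector, text, metadata):
        self.records.append((vector, text, metadata))

    def query(self, vector, top_k=3):
        # Vectors are unit length, so the dot product equals cosine similarity.
        scored = [
            (sum(w * vec.get(word, 0.0) for word, w in vector.items()), text, meta)
            for vec, text, meta in self.records
        ]
        scored.sort(key=lambda record: record[0], reverse=True)
        return scored[:top_k]

documents = {
    "billing.md": "Refunds are issued within five business days after a cancellation request is approved",
    "security.md": "Password resets require a verified email address and expire after one hour",
}

# Ingest: chunk each document, embed each chunk, store it with source metadata.
store = InMemoryVectorStore()
for source, text in documents.items():
    for position, piece in enumerate(chunk(text)):
        store.upsert(embed(piece), piece, {"source": source, "chunk": position})

# Query: embed the question with the same function, fetch top-k, build the prompt.
question = "how long do refunds take after a cancellation request"
hits = store.query(embed(question), top_k=2)
context = "\n".join(text for _, text, _ in hits)
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
print(prompt)
```

Swapping the toy pieces for a real embedding model and a real database changes the calls but not the shape of the pipeline. The metadata stored with each chunk is what lets the answer cite its source.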

For sensitive workloads the vector database stays inside your perimeter. Healthcare clients run pgvector on their own Postgres or self-hosted Qdrant on-prem so PHI never enters a third-party SaaS. Funded teams running the AI Ops Department and AI Support Department get the vector database deployed against their knowledge base as part of standard delivery. The database is not the hard part. The hard part is choosing chunk sizes, embedding models, and reranking strategies that produce retrieval quality good enough for production traffic instead of demo-grade results.
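
One way to compare chunk sizes, embedding models, or rerankers is to score each configuration against a small labeled set of queries. The sketch below uses recall@k, a common retrieval metric chosen here for illustration. The chunk IDs and the `eval_set` entries are invented, and a real evaluation would fill `retrieved` from the actual retrieval pipeline.

```python
def recall_at_k(retrieved_ids, relevant_ids, k):
    """Fraction of the relevant chunks that appear in the top-k retrieved chunks."""
    if not relevant_ids:
        return 0.0
    top = set(retrieved_ids[:k])
    return len(top & set(relevant_ids)) / len(relevant_ids)

def mean_recall_at_k(eval_set, k):
    """Average recall@k across all labeled queries."""
    scores = [recall_at_k(q["retrieved"], q["relevant"], k) for q in eval_set]
    return sum(scores) / len(scores)

# Hypothetical labeled queries: ranked retrieval output plus hand-labeled relevant chunks.
eval_set = [
    {"retrieved": ["c7", "c2", "c9", "c4"], "relevant": ["c2", "c4"]},
    {"retrieved": ["c1", "c5", "c3", "c8"], "relevant": ["c8"]},
]

for k in (2, 4):
    print(f"recall@{k} = {mean_recall_at_k(eval_set, k):.2f}")
```

Running the same labeled set against two configurations gives a number to compare instead of a demo impression.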

// Examples
  • A support deflection system indexes 12,000 help-center chunks in Pinecone and serves 800 retrieval queries a day at sub-200ms latency.
  • A regulated SaaS team runs pgvector inside their existing Postgres so internal copilot search stays within their compliance perimeter without adding a new vendor.
  • A sales engineering copilot uses Qdrant to index 4,800 past RFP responses and returns the top 8 most similar chunks per query in under 50ms.
// Common questions
Do I need a dedicated vector database for small projects?
Below roughly 10,000 chunks you can run vector search in memory with a library like FAISS, or inside Postgres with the pgvector extension. Above roughly 100,000 chunks a dedicated database starts paying off in query latency and operational simplicity. In between, the line moves with query volume and the cost of hosting your own infrastructure versus paying a managed service.
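
A rough size estimate shows why small collections fit in memory. The sketch assumes float32 storage (4 bytes per value) and embedding widths of 1024 and 3072 dimensions, which are assumptions about the models named earlier. It ignores index overhead and metadata, so real usage will be higher.

```python
def raw_vector_bytes(num_chunks, dims, bytes_per_value=4):
    """Raw storage for the vectors alone, assuming float32 values by default."""
    return num_chunks * dims * bytes_per_value

def to_mib(num_bytes):
    """Convert bytes to mebibytes."""
    return num_bytes / (1024 * 1024)

small = raw_vector_bytes(10_000, 1024)    # 10,000 chunks at an assumed 1024 dims
large = raw_vector_bytes(100_000, 3072)   # 100,000 chunks at an assumed 3072 dims

print(f"10,000 x 1024:  {to_mib(small):.1f} MiB")
print(f"100,000 x 3072: {to_mib(large):.1f} MiB")
```

Under these assumptions the small case is about 40 MB of raw vectors, while the large case passes 1 GB before any index is built.
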
Which vector database should I pick?
Pinecone is the easiest managed option for teams that want zero ops. Weaviate and Qdrant give you more control and can run self-hosted. pgvector sits inside an existing Postgres deployment, which makes it the right call when you already run Postgres and want one fewer system to operate. Milvus scales to billions of vectors for the rare cases that need it.
How does a vector database compare to Elasticsearch?
Elasticsearch's default text search ranks documents by keyword match using BM25 scoring. A vector database ranks by semantic similarity between embeddings. The categories overlap, since recent Elasticsearch versions also support dense vector kNN search. Production systems often combine both in a hybrid search pattern, with keyword search catching exact matches like product SKUs and vector search catching paraphrases and conceptual queries.
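
Hybrid search needs a way to merge the keyword ranking and the vector ranking into one list. Reciprocal rank fusion is one common approach, shown here as a sketch rather than as what any particular engine does. The document IDs are invented, and `k=60` is a conventional default, not a tuned value.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of IDs. Each list contributes 1 / (k + rank) per item."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)

# Hypothetical results for one query from the two retrievers, best first.
keyword_hits = ["sku-4471-spec", "returns-policy", "pricing-faq"]
vector_hits = ["returns-policy", "refund-howto", "sku-4471-spec"]

fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
print(fused)
```

Documents that appear in both lists rise to the top, and the fusion works on ranks alone, so BM25 scores and similarity scores never have to share a scale.
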
What does it cost to run one in production?
Pinecone serverless starts around $50 a month for small workloads and scales with stored vectors and query volume. Self-hosted Qdrant or pgvector costs you the underlying compute, typically $100 to $400 a month for mid-sized RAG deployments. Embedding generation itself usually costs more than database hosting once volume picks up.