// Glossary · technical

Embedding

Also: vector embedding · text embedding · dense vector

A numerical vector that represents text, images, or other data so AI models can compare meaning, with similar items ending up near each other in vector space.

An embedding is a mathematical representation that turns words, sentences, images, or other input into a list of numbers a model can reason over. A sentence like "the quarterly board meeting got rescheduled" becomes a vector of 1,536 floating-point numbers when passed through OpenAI's text-embedding-3-small, or 1,024 numbers through bge-large-en. The individual numbers are meaningless on their own. What matters is the geometry: two sentences that mean similar things end up close together in the high-dimensional space, and two sentences with unrelated meanings end up far apart. That property is the whole reason embeddings exist.
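
"Close together" is usually measured with cosine similarity. The sketch below is a minimal illustration, assuming hand-made 4-dimensional toy vectors rather than real model output; the variable names are invented for the example.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

# Hand-made toy vectors, NOT real embeddings. Real ones have hundreds
# or thousands of dimensions and come from a model.
meeting_moved = [0.9, 0.1, 0.0, 0.2]
meeting_rescheduled = [0.8, 0.2, 0.1, 0.3]
pizza_recipe = [0.0, 0.1, 0.9, 0.1]

print("similar pair:  ", cosine_similarity(meeting_moved, meeting_rescheduled))
print("unrelated pair:", cosine_similarity(meeting_moved, pizza_recipe))
```

The first pair scores far higher than the second, which is the geometric property that search and clustering rely on.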

Embeddings power semantic search, recommendation systems, clustering, classification, and RAG pipelines. The standard flow is to convert source documents into embeddings, store them in a vector database, then convert each user query into an embedding with the same model and return the closest matches. The choice of embedding model affects retrieval quality more than most teams realize. text-embedding-3-large produces better semantic matches than text-embedding-3-small at roughly 6.5x the per-token price ($0.13 versus $0.02 per million tokens). bge-large-en runs locally with no API fee and can outperform OpenAI's models on some technical document retrieval tasks. Open models from the BAAI and intfloat teams, which load through the sentence-transformers library, have ranked at or near the top of the MTEB leaderboard for English retrieval tasks.
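
The index-then-query flow can be sketched in a few lines. This is an illustration under loud assumptions: `make_toy_embedder` is a hypothetical stand-in that only counts shared words, and a plain Python list plays the role of the vector database. A real pipeline would swap in an actual embedding model and store.

```python
import math

def make_toy_embedder(corpus):
    """Stand-in for a real embedding model: bag-of-words over the corpus
    vocabulary, scaled to unit length. It captures word overlap, not meaning."""
    vocab = sorted({word for doc in corpus for word in doc.lower().split()})
    position = {word: i for i, word in enumerate(vocab)}

    def embed(text):
        vec = [0.0] * len(position)
        for word in text.lower().split():
            if word in position:  # words outside the vocabulary are ignored
                vec[position[word]] += 1.0
        norm = math.sqrt(sum(x * x for x in vec))
        return [x / norm for x in vec] if norm else vec

    return embed

def build_index(documents, embed):
    """Embed every document once and keep (text, vector) pairs in memory."""
    return [(doc, embed(doc)) for doc in documents]

def search(index, embed, query, top_k=2):
    """Embed the query with the SAME function used for indexing, then rank by
    dot product (equal to cosine similarity for unit-length vectors)."""
    q = embed(query)
    scored = [(sum(a * b for a, b in zip(q, vec)), doc) for doc, vec in index]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]

docs = [
    "how to reset your account password",
    "shipping times for international orders",
    "refund policy for annual plans",
]
embed = make_toy_embedder(docs)
index = build_index(docs, embed)
for score, doc in search(index, embed, "reset password"):
    print(round(score, 3), doc)
```

The point to carry over is structural: documents are embedded once at index time, queries are embedded at request time, and both go through the same function.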

In production an embedding model gets pinned to a version and never changes without a full reindex. Mixing vectors generated by different models inside the same database produces incoherent results because the geometries are not comparable. The AI Ops Department standard stack uses one embedding model end to end, with a documented upgrade path that includes regenerating the entire vector index before traffic switches over. Embedding generation cost itself is small for one-time indexing but adds up for high-volume systems that re-embed user queries thousands of times an hour.
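
One way to enforce the pinning rule is to record the model identifier on the index and reject anything else at write time. The class below is a hypothetical in-memory sketch, not the API of any particular vector database; `PinnedVectorIndex` and its parameters are invented for illustration.

```python
class PinnedVectorIndex:
    """In-memory index that accepts vectors from exactly one model version."""

    def __init__(self, model_id, dimensions):
        self.model_id = model_id
        self.dimensions = dimensions
        self._rows = []  # list of (doc_id, vector)

    def add(self, doc_id, vector, model_id):
        # Refuse vectors from any other model: their geometry is not comparable.
        if model_id != self.model_id:
            raise ValueError(
                f"index is pinned to {self.model_id!r}, got a vector from "
                f"{model_id!r}; build a new index instead of mixing models"
            )
        if len(vector) != self.dimensions:
            raise ValueError(
                f"expected {self.dimensions} dimensions, got {len(vector)}"
            )
        self._rows.append((doc_id, list(vector)))

    def __len__(self):
        return len(self._rows)

index = PinnedVectorIndex(model_id="text-embedding-3-small", dimensions=1536)
index.add("faq-1", [0.0] * 1536, model_id="text-embedding-3-small")

try:
    index.add("faq-2", [0.0] * 1024, model_id="bge-large-en")
except ValueError as err:
    print("rejected:", err)
```

An upgrade then means building a second index under the new model identifier, filling it completely, and switching traffic only once it is ready.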

// Examples
  • A 200-word product FAQ chunk becomes a 1,536-number vector after passing through text-embedding-3-small, and lives in Pinecone next to 11,000 similar vectors.
  • A knowledge base copilot uses bge-large-en running on a single GPU to embed 40,000 internal Notion pages without paying any external API fee.
  • A semantic search prototype indexes 8,000 support tickets and finds duplicates by clustering ticket embeddings, surfacing patterns the team never noticed in keyword search.

// Common questions
What dimension size should I use?
Most production models produce vectors between 384 and 3,072 dimensions. Higher dimensions can capture more nuance at the cost of storage, memory, and query latency. text-embedding-3-small at 1,536 dimensions is a strong default for English RAG. text-embedding-3-large at 3,072 dimensions gives better quality when retrieval accuracy matters more than infrastructure cost.
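The storage side of that trade-off is simple arithmetic. The sketch below assumes vectors are stored as 32-bit floats (4 bytes per dimension) and ignores index overhead and metadata, so real databases will use more; the function name is illustrative.

```python
BYTES_PER_FLOAT32 = 4  # assumption: vectors stored as 32-bit floats

def raw_vector_bytes(num_vectors, dimensions):
    """Raw storage for the vectors alone, without index structures or metadata."""
    return num_vectors * dimensions * BYTES_PER_FLOAT32

for dims in (384, 1024, 1536, 3072):
    mib = raw_vector_bytes(1_000_000, dims) / (1024 ** 2)
    print(f"{dims:>5} dims x 1M vectors ~ {mib:,.0f} MiB")
```

Doubling the dimension doubles raw storage, which is why the jump from 1,536 to 3,072 is an infrastructure decision as much as a quality one.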
Can I mix embedding models in one database?
No. Vectors from different models live in different geometric spaces and cannot be compared meaningfully. If you change embedding models you have to regenerate every stored vector. Production systems pin the model version and treat changes as full reindex events with planning around them.
How are image embeddings different from text embeddings?
The math is the same. The model is different. CLIP and SigLIP produce embeddings where an image of a dog and the text "a dog" end up close together, which enables cross-modal search. Pure text models like BGE only embed text. Multimodal applications pick CLIP-family models so they can index pictures and queries in the same vector space.
What does an embedding cost to generate?
OpenAI's text-embedding-3-small runs around $0.02 per million tokens. text-embedding-3-large runs around $0.13 per million tokens. Self-hosted bge-large-en on a single GPU costs only the compute, often $50 to $150 a month for steady production load. Indexing a 50,000-page corpus through OpenAI typically lands under $20 as a one-time cost.
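A back-of-the-envelope estimate follows directly from those per-token prices. The 500 tokens per page used below is an assumption for illustration, not a measured figure; substitute a real token count for your corpus.

```python
# Approximate list prices in USD per million tokens, as quoted above.
PRICE_PER_MILLION_TOKENS = {
    "text-embedding-3-small": 0.02,
    "text-embedding-3-large": 0.13,
}

def embedding_cost_usd(total_tokens, model):
    """Estimated API cost for embedding total_tokens with the given model."""
    return total_tokens / 1_000_000 * PRICE_PER_MILLION_TOKENS[model]

pages = 50_000
tokens_per_page = 500  # assumption: average page length in tokens
total_tokens = pages * tokens_per_page

for model in PRICE_PER_MILLION_TOKENS:
    print(f"{model}: ${embedding_cost_usd(total_tokens, model):.2f}")
```

Under that assumption both models come in well below the $20 ceiling, and the larger cost driver in a busy system is repeated query embedding rather than the one-time index build.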