Embedding
A numerical vector that represents text, images, or other data so AI models can compare meaning, with similar items ending up near each other in vector space.
An embedding is a mathematical representation of an input: an embedding model turns words, sentences, images, or other data into a list of numbers that downstream systems can compare. A sentence like "the quarterly board meeting got rescheduled" becomes a vector of 1,536 floating-point numbers when passed through OpenAI's text-embedding-3-small at its default output size, or 1,024 numbers through bge-large. The individual numbers are meaningless on their own. What matters is the geometry. Two sentences that mean similar things end up close together in the high-dimensional space, and two sentences with unrelated meanings end up far apart, with closeness usually measured by cosine similarity. That property is the whole reason embeddings exist.
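To make the geometry concrete, here is a minimal sketch of cosine similarity, one common way to measure how close two embeddings are. The three 4-dimensional vectors are invented for illustration and are not output from any real model; real embeddings have hundreds or thousands of dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

# Hypothetical toy vectors, made up for this example.
meeting_moved = [0.9, 0.1, 0.0, 0.2]        # "the board meeting got moved"
meeting_rescheduled = [0.8, 0.2, 0.1, 0.3]  # "the board meeting got rescheduled"
pasta_recipe = [0.0, 0.1, 0.9, 0.1]         # "a quick weeknight pasta recipe"

sim_close = cosine_similarity(meeting_moved, meeting_rescheduled)
sim_far = cosine_similarity(meeting_moved, pasta_recipe)
print(f"similar meaning:   {sim_close:.3f}")
print(f"different meaning: {sim_far:.3f}")
```

The two meeting vectors score close to 1, while the unrelated pair scores close to 0. That gap is what a retrieval system ranks on.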
Embeddings power semantic search, recommendation systems, clustering, classification, and RAG pipelines. The standard flow is to convert source documents into embeddings, store them in a vector database, then convert each user query into an embedding with the same model and return the closest matches. The choice of embedding model affects retrieval quality more than most teams realize. text-embedding-3-large produces better semantic matches than text-embedding-3-small at roughly 6x the per-token price. bge-large-en runs locally with no API fee and can match or outperform OpenAI's models on technical document retrieval, depending on the corpus. Open-weight models from BAAI (the bge family) and intfloat (the e5 family), both loadable through the sentence-transformers library, have ranked near the top of the MTEB leaderboard for English retrieval tasks.
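The index-then-query flow described above can be sketched in a few lines. Everything here is a stand-in: `toy_embed` is a hypothetical word-count "model" over a fixed vocabulary, and `InMemoryIndex` is a plain Python list playing the role of a vector database. A real system would call an actual embedding model and a real vector store, but the shape of the flow is the same.

```python
import math

# Hypothetical fixed vocabulary so the toy "model" is deterministic.
VOCAB = ["password", "reset", "account", "shipping", "international",
         "orders", "refund", "annual", "plans"]

def toy_embed(text):
    """Stand-in for a real embedding model: L2-normalized vocabulary word counts."""
    words = [w.strip(".,!?\"'") for w in text.lower().split()]
    vec = [float(words.count(term)) for term in VOCAB]
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec

class InMemoryIndex:
    """Stand-in for a vector database: stores (document, vector) pairs."""

    def __init__(self, embed):
        self.embed = embed  # the same function is used for documents and queries
        self.rows = []

    def add(self, document):
        self.rows.append((document, self.embed(document)))

    def search(self, query, k=3):
        q = self.embed(query)
        # Vectors are normalized, so the dot product equals cosine similarity.
        scored = [(sum(a * b for a, b in zip(q, vec)), doc) for doc, vec in self.rows]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:k]

index = InMemoryIndex(toy_embed)
for doc in [
    "How to reset your account password",
    "Shipping times for international orders",
    "Refund policy for annual plans",
]:
    index.add(doc)

results = index.search("reset password", k=1)
print(results[0][1])
```

The one design point that carries over to production is that documents and queries pass through the same `embed` function. Swapping the model on only one side breaks the comparison.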
In production, an embedding model is pinned to a specific version and never changes without a full reindex. Mixing vectors generated by different models inside the same database produces incoherent results because the geometries are not comparable, and the vectors often do not even share a dimension. The AI Ops Department standard stack uses one embedding model end to end, with a documented upgrade path that includes regenerating the entire vector index before traffic switches over. Embedding generation cost is small for one-time indexing but adds up for high-volume systems that re-embed user queries thousands of times an hour.
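One way to enforce the single-model rule is to record the model identifier alongside the index and reject any write that does not match it. The sketch below is illustrative only: `PinnedVectorStore` is a hypothetical class, not the API of any particular vector database, and the placeholder vectors are all zeros.

```python
from dataclasses import dataclass, field

@dataclass
class PinnedVectorStore:
    """Hypothetical store that refuses vectors from any model but the pinned one."""
    model_id: str
    dimension: int
    vectors: list = field(default_factory=list)

    def add(self, vector, model_id):
        if model_id != self.model_id:
            raise ValueError(
                f"index is pinned to {self.model_id!r}, got vector from {model_id!r}"
            )
        if len(vector) != self.dimension:
            raise ValueError(
                f"expected {self.dimension} dimensions, got {len(vector)}"
            )
        self.vectors.append(list(vector))

store = PinnedVectorStore(model_id="text-embedding-3-small", dimension=1536)
store.add([0.0] * 1536, model_id="text-embedding-3-small")  # accepted

try:
    # A vector from a different model is rejected before it can pollute the index.
    store.add([0.0] * 1024, model_id="bge-large-en")
    mixed_write_rejected = False
except ValueError as err:
    mixed_write_rejected = True
    print(err)
```

With a guard like this, an upgrade means building a second store pinned to the new model, filling it completely, and only then switching traffic over.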
- A 200-word product FAQ chunk becomes a 1,536-number vector after passing through text-embedding-3-small, and lives in Pinecone next to 11,000 similar vectors.
- A knowledge base copilot uses bge-large-en running on a single GPU to embed 40,000 internal Notion pages without paying any external API fee.
- A semantic search prototype indexes 8,000 support tickets and finds duplicates by clustering ticket embeddings, surfacing patterns the team never noticed in keyword search.
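The duplicate-ticket example can be approximated with a simple similarity threshold. The sketch below uses a greedy single-pass grouping rather than a full clustering algorithm, and the 3-dimensional ticket vectors and the 0.9 threshold are assumptions chosen for illustration, not values from a real system.

```python
import math

def cosine(a, b):
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def group_near_duplicates(vectors, threshold=0.9):
    """Greedy grouping: each vector joins the first group whose first member
    is at least `threshold` similar to it, otherwise it starts a new group."""
    groups = []  # each group is a list of indices into `vectors`
    for i, vec in enumerate(vectors):
        for group in groups:
            if cosine(vectors[group[0]], vec) >= threshold:
                group.append(i)
                break
        else:
            groups.append([i])
    return groups

# Hypothetical ticket embeddings, made up for this example.
ticket_vectors = [
    [0.90, 0.10, 0.00],  # 0: "can't log in"
    [0.88, 0.12, 0.05],  # 1: "login fails"
    [0.00, 0.20, 0.95],  # 2: "invoice is wrong"
    [0.05, 0.15, 0.90],  # 3: "billing error"
]

groups = group_near_duplicates(ticket_vectors)
print(groups)
```

Here the two login tickets land in one group and the two billing tickets in another. On thousands of real tickets the threshold would need tuning against a labeled sample.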
What dimension size should I use?
Can I mix embedding models in one database?
How are image embeddings different from text embeddings?
What does an embedding cost to generate?