Question 1

What dimension size should I use?

Accepted Answer

Most production models produce vectors between 384 and 3,072 dimensions. Higher dimensions capture more nuance, but storage grows linearly with the dimension count and query latency rises with it. text-embedding-3-small at 1,536 dimensions is a strong default for English RAG. text-embedding-3-large at 3,072 dimensions gives better quality when retrieval accuracy matters more than infrastructure cost.
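
To put a number on the storage side of that trade-off, here is a rough sketch. It assumes vectors are stored as uncompressed float32 values (4 bytes each) and ignores index overhead and metadata, both of which vary by database. The function name and the one-million-vector corpus are illustrative, not taken from any particular system.

```python
def index_size_bytes(num_vectors: int, dims: int, bytes_per_value: int = 4) -> int:
    """Raw storage for the vectors alone.

    Assumes float32 values (4 bytes each); real databases add index
    structures and metadata on top of this figure.
    """
    return num_vectors * dims * bytes_per_value


if __name__ == "__main__":
    # Compare the dimension sizes mentioned in the answer for a
    # hypothetical corpus of one million vectors.
    for dims in (384, 1536, 3072):
        gib = index_size_bytes(1_000_000, dims) / 2**30
        print(f"{dims:>5} dims: {gib:.2f} GiB per million vectors")
```

Doubling the dimension count doubles the raw footprint, so the jump from 1,536 to 3,072 dimensions is worth checking against your storage budget before committing.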

Question 2

Can I mix embedding models in one database?

Accepted Answer

No. Vectors from different models live in different geometric spaces and cannot be compared meaningfully. If you change embedding models, you have to regenerate every stored vector. Production systems pin the model version and treat any change as a full reindex event that has to be planned for.
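
One way to enforce that pinning is to store the model identifier next to every vector and refuse to compare across models. The sketch below is a minimal illustration of the idea, not how any particular vector database does it; `StoredVector` and `is_comparable` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class StoredVector:
    """A vector plus the identifier of the model that produced it (hypothetical schema)."""
    model: str                  # model name/version recorded at index time
    values: Tuple[float, ...]   # the embedding itself


def is_comparable(a: StoredVector, b: StoredVector) -> bool:
    """Return True only if both vectors come from the same model and have equal length.

    Equal length alone is not enough: two different models can emit the
    same number of dimensions and still use unrelated vector spaces.
    """
    return a.model == b.model and len(a.values) == len(b.values)


if __name__ == "__main__":
    old = StoredVector("text-embedding-3-small", (0.1, 0.2, 0.3))
    new = StoredVector("text-embedding-3-large", (0.1, 0.2, 0.3))
    print(is_comparable(old, old))  # same model: safe to compare
    print(is_comparable(old, new))  # different models: reindex first
```

Running a check like this at query time turns a silent quality problem into an explicit failure, which is usually easier to debug.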

Question 3

How are image embeddings different from text embeddings?

Accepted Answer

The math is the same. The model is different. CLIP and SigLIP produce embeddings where an image of a dog and the text "a dog" end up close together, which enables cross-modal search. Pure text models like BGE only embed text. Multimodal applications pick CLIP-family models so they can index pictures and queries in the same vector space.
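
To illustrate "the math is the same": cosine similarity does not care whether a vector came from an image or from text. The sketch below uses made-up three-dimensional vectors purely for illustration. Real CLIP or SigLIP embeddings have hundreds of dimensions and come from the model, not from hand-written numbers.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    # Toy values standing in for embeddings from one shared multimodal space.
    image_of_dog = [0.9, 0.1, 0.0]
    text_a_dog = [0.8, 0.2, 0.1]
    text_a_car = [0.0, 0.1, 0.9]

    print(cosine_similarity(image_of_dog, text_a_dog))  # high: same concept
    print(cosine_similarity(image_of_dog, text_a_car))  # low: unrelated concept
```

The same function works for text-to-text, image-to-image, and text-to-image comparisons, as long as both vectors come from the same model.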

Question 4

What does an embedding cost to generate?

Accepted Answer

OpenAI text-embedding-3-small runs around $0.02 per million tokens. text-embedding-3-large runs around $0.13 per million tokens. Self-hosted bge-large on a single GPU has no per-token fee, so you pay only for compute, often $50 to $150 a month for steady production load. Indexing a 50,000-page corpus through OpenAI typically costs under $20 as a one-time expense.
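
Here is the back-of-the-envelope arithmetic behind that last figure. The per-token prices are the approximate ones quoted above. The 500 tokens per page is an assumption for illustration; measure your own corpus with a tokenizer before budgeting.

```python
# Approximate prices in USD per million tokens, as quoted in the answer.
PRICE_PER_MILLION_TOKENS = {
    "text-embedding-3-small": 0.02,
    "text-embedding-3-large": 0.13,
}


def embedding_cost_usd(pages: int, tokens_per_page: int, model: str) -> float:
    """Estimate the one-time cost of embedding a corpus once."""
    total_tokens = pages * tokens_per_page
    return total_tokens / 1_000_000 * PRICE_PER_MILLION_TOKENS[model]


if __name__ == "__main__":
    PAGES = 50_000
    TOKENS_PER_PAGE = 500  # assumed average; real pages vary widely

    for model in PRICE_PER_MILLION_TOKENS:
        cost = embedding_cost_usd(PAGES, TOKENS_PER_PAGE, model)
        print(f"{model}: ${cost:.2f}")
```

Under that assumption the corpus is 25 million tokens, which works out to roughly $0.50 with the small model and $3.25 with the large one. Even with several times more tokens per page, the total stays under the $20 mark.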