What is an embedding?

An embedding is a way to change words or pictures into lists of numbers so a computer can tell which ones are similar or related.

7 min read

AI & LLMs · gneurone encyclopedia

definition

In AI and machine learning, an embedding is a dense vector of floating-point numbers that represents an item such as a word, sentence, or image in a continuous space.

The vector is learned so that items with similar meaning or features have vectors that are close together, as measured by a similarity or distance function such as cosine similarity or Euclidean distance.
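A minimal sketch, in plain Python, of the two most common ways to compare embeddings: cosine similarity (higher means more similar; 1.0 means the vectors point in the same direction) and Euclidean distance (lower means more similar). The three-dimensional vectors and the words attached to them are invented for illustration; real embeddings come from a trained model and have far more dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between a and b: 1.0 = same direction, 0.0 = orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def euclidean_distance(a, b):
    """Straight-line distance between a and b: 0.0 = identical vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Invented toy vectors (real embeddings have hundreds of dimensions).
cat = [0.9, 0.8, 0.1]
kitten = [0.85, 0.75, 0.2]
car = [0.1, 0.2, 0.95]

# "cat" vs "kitten" should score much higher than "cat" vs "car".
print(round(cosine_similarity(cat, kitten), 3), round(cosine_similarity(cat, car), 3))
print(round(euclidean_distance(cat, kitten), 3), round(euclidean_distance(cat, car), 3))
```

Cosine similarity ignores vector length and only looks at direction, which is why it is a popular default for text embeddings; if the vectors are normalized to length 1, it is equal to their dot product.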

Embeddings map high-dimensional or sparse data, such as a one-hot vector with one entry per vocabulary word, to a much lower-dimensional dense form while preserving semantic relationships, which makes downstream tasks in neural networks more efficient.
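To see how an embedding replaces a sparse representation, the sketch below assumes a hypothetical five-word vocabulary and a made-up 5 × 3 embedding matrix. Multiplying a word's one-hot vector by the matrix simply selects that word's row, which is why embedding layers are usually implemented as a table lookup rather than a full matrix multiplication.

```python
# Hypothetical vocabulary and embedding matrix (one row per word, values invented).
vocab = ["the", "cat", "sat", "on", "mat"]
embedding_matrix = [
    [0.1, 0.3, -0.2],   # the
    [0.7, -0.1, 0.4],   # cat
    [-0.3, 0.5, 0.2],   # sat
    [0.0, 0.2, -0.6],   # on
    [0.6, -0.2, 0.5],   # mat
]

def one_hot(index, size):
    """Sparse representation: all zeros except a 1.0 at the word's position."""
    return [1.0 if i == index else 0.0 for i in range(size)]

def vector_times_matrix(vec, matrix):
    """Row vector of length V times a V x d matrix gives a vector of length d."""
    d = len(matrix[0])
    return [sum(vec[i] * matrix[i][j] for i in range(len(vec))) for j in range(d)]

def embed(word):
    """Dense representation: read the word's row directly (a table lookup)."""
    return embedding_matrix[vocab.index(word)]

sparse = one_hot(vocab.index("cat"), len(vocab))   # 5 numbers, four of them zero
dense_via_matmul = vector_times_matrix(sparse, embedding_matrix)
dense_via_lookup = embed("cat")                   # 3 numbers, all informative

print(sparse)
print(dense_via_lookup)
print(dense_via_matmul == dense_via_lookup)  # True
```

With a realistic vocabulary of, say, 50,000 words and 768-dimensional embeddings, the one-hot input would have 50,000 entries while the embedding has only 768.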

Imagine every word in a dictionary placed on a giant map where words with nearly the same meaning sit close together and unrelated words sit far apart; the coordinates of each word on that map are its embedding. A real embedding space works the same way, but with hundreds or thousands of dimensions instead of two.

key takeaways

  • Embeddings convert discrete data into continuous numerical vectors that models can process mathematically.
  • They capture semantic similarity so related items receive nearby vectors in the embedding space.
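  • Training usually relies on objectives such as predicting context words (as in word2vec), reconstructing the input data (as in autoencoders), or contrastive learning that pulls related pairs together and pushes unrelated ones apart.
  • Typical embedding sizes range from about 128 to 4096 dimensions, depending on the model size and task requirements.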
  • Embeddings power applications including semantic search, recommendation engines, and retrieval-augmented generation.
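Semantic search, the first application in the list above, comes down to ranking stored vectors by their similarity to a query vector. This brute-force sketch assumes the document and query embeddings have already been computed by some embedding model; the titles, vector values, and the `semantic_search` helper are hypothetical, not part of any library.

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

# Hypothetical pre-computed document embeddings (in practice, produced by a model).
documents = {
    "How to adopt a kitten": [0.9, 0.7, 0.1, 0.0],
    "Caring for an older cat": [0.8, 0.8, 0.2, 0.1],
    "Changing a car tyre": [0.1, 0.0, 0.9, 0.8],
    "Best road trips in Europe": [0.0, 0.2, 0.7, 0.9],
}

def semantic_search(query_vector, docs, top_k=2):
    """Exact (brute-force) search: score every document, return the top_k best."""
    scored = [(title, cosine_similarity(query_vector, vec)) for title, vec in docs.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]

# Hypothetical embedding of the query "my cat is getting old".
query = [0.8, 0.85, 0.2, 0.1]
results = semantic_search(query, documents)
for title, score in results:
    print(f"{score:.3f}  {title}")
```

No keyword overlap is needed: the query matches the cat documents because their vectors point in a similar direction. Brute force compares the query with every stored vector, which is fine for small collections; vector databases typically switch to approximate nearest-neighbor indexes to stay fast at millions of items.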

the 2026 job market

By 2026, vector search and retrieval-augmented generation dominate production AI systems, creating steady demand for engineers who can design, fine-tune, and productionize embedding models in roles such as AI Engineer and ML Infrastructure Specialist.

AI Engineer · $145k-$210k (US) / $115k-$165k (Canada) / £85k-£125k (UK)
ML Engineer · $140k-$200k (US) / $110k-$160k (Canada) / £80k-£120k (UK)
Data Scientist · $125k-$185k (US) / $100k-$150k (Canada) / £75k-£110k (UK)

frequently asked questions

How do you create an embedding for new text?

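You tokenize the text, pass it through a trained encoder model such as BERT or a sentence-transformer model, and take the output of a chosen layer, usually the last one. Because the encoder produces one vector per token, these vectors are typically pooled into a single vector, for example by averaging them or by taking the vector of the special [CLS] token. The resulting vector is the embedding for that text.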
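An encoder returns one vector per token, so a single text embedding needs a pooling step. A minimal sketch, assuming the per-token vectors have already been produced by the model: it averages the vectors of real tokens and skips padding, which is the mean-pooling strategy many sentence-transformer models use. The token vectors are invented and only 4-dimensional, and `mean_pool` is a hypothetical helper.

```python
def mean_pool(token_vectors, attention_mask):
    """Average the vectors of real tokens (mask 1) and ignore padding (mask 0)."""
    dim = len(token_vectors[0])
    total = [0.0] * dim
    count = 0
    for vec, keep in zip(token_vectors, attention_mask):
        if keep:
            for j in range(dim):
                total[j] += vec[j]
            count += 1
    if count == 0:
        raise ValueError("attention_mask selects no tokens")
    return [value / count for value in total]

# Hypothetical last-layer outputs for "[CLS] hello world [SEP] [PAD]"
# (4 dimensions here instead of, for example, 768).
token_vectors = [
    [0.2, 0.4, 0.0, 1.0],   # [CLS]
    [1.0, 0.0, 0.5, 0.5],   # hello
    [0.0, 1.0, 0.5, 0.0],   # world
    [0.4, 0.2, 0.2, 0.6],   # [SEP]
    [9.0, 9.0, 9.0, 9.0],   # [PAD]: must not influence the result
]
attention_mask = [1, 1, 1, 1, 0]

sentence_embedding = mean_pool(token_vectors, attention_mask)
cls_embedding = token_vectors[0]   # the alternative: use the [CLS] position as-is
print(sentence_embedding)
```

Which pooling works better depends on how the model was trained, so it is usually best to use the pooling the model's authors recommend.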

What is the difference between an embedding and a vector?

Every embedding is a vector, but not every vector is an embedding. An embedding is a vector that has been trained specifically to encode the semantic or structural meaning of the original data.

Why are embeddings useful inside large language models?

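They turn each token ID into a learned vector of numbers that the network can compute with. Tokens with related meanings tend to get similar input vectors, and the attention layers then mix these vectors across the sequence so that each position's representation also reflects its context.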
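A toy sketch of those two steps: token IDs are first looked up in an embedding table, then a simplified scaled dot-product self-attention, softmax(Q·K^T / sqrt(d))·V, mixes the vectors across the sequence. To keep it short, it assumes Q, K, and V are the raw embeddings themselves; real transformers apply learned projection matrices, use several attention heads, and add positional information. The vocabulary, token IDs, and values are invented.

```python
import math

# Hypothetical vocabulary: token ID -> 3-dimensional embedding (values invented).
embedding_table = {
    0: [1.0, 0.0, 0.0],  # "the"
    1: [0.0, 1.0, 0.5],  # "cat"
    2: [0.5, 0.0, 1.0],  # "sleeps"
}

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(vectors):
    """Simplified single-head scaled dot-product self-attention with Q = K = V = vectors.
    Real transformers first apply learned Q/K/V projections; omitted here."""
    d = len(vectors[0])
    outputs, all_weights = [], []
    for q in vectors:
        # Score this position's query against every key, scaled by sqrt(d).
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in vectors]
        weights = softmax(scores)
        # The output is the attention-weighted average of all value vectors.
        mixed = [sum(w * v[j] for w, v in zip(weights, vectors)) for j in range(d)]
        outputs.append(mixed)
        all_weights.append(weights)
    return outputs, all_weights

token_ids = [0, 1, 2]                              # "the cat sleeps"
inputs = [embedding_table[t] for t in token_ids]   # step 1: embedding lookup
contextual, weights = self_attention(inputs)       # step 2: mix across the sequence

for tid, row in zip(token_ids, weights):
    print(tid, [round(w, 3) for w in row])
```

The input vector for "cat" is the same in every sentence; only after attention does its representation depend on the surrounding tokens.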

Can embeddings be updated after the model is trained?

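Yes. You can fine-tune the embedding model on new data, or use techniques such as incremental learning to adjust the vectors without retraining from scratch. If you update the model, re-embed any stored vectors as well, because vectors produced by the old and new versions are generally not directly comparable.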
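One way to adjust vectors without retraining from scratch is to keep taking gradient steps on an existing embedding table as new data arrives. The sketch below assumes a skip-gram-with-negative-sampling objective, the word2vec-style "predict the context words" training, and applies repeated SGD steps for a single observed (word, context) pair. The starting vectors, learning rate, step count, and the `sgns_step` helper are all hypothetical.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def sgns_step(center, context, negatives, lr=0.1):
    """One SGD step on the loss -log sigmoid(context.center) - sum(log sigmoid(-neg.center)).
    All vectors are lists updated in place."""
    grad_center = [0.0] * len(center)
    # Positive pair: increase the score of the observed (word, context) pair.
    g = sigmoid(dot(context, center)) - 1.0
    for j in range(len(center)):
        grad_center[j] += g * context[j]
        context[j] -= lr * g * center[j]
    # Negative samples: decrease the scores of randomly drawn words.
    for neg in negatives:
        g = sigmoid(dot(neg, center))
        for j in range(len(center)):
            grad_center[j] += g * neg[j]
            neg[j] -= lr * g * center[j]
    # Update the word's own embedding last, using the accumulated gradient.
    for j in range(len(center)):
        center[j] -= lr * grad_center[j]

# Hypothetical vectors taken from an already-trained model (values invented).
center = [0.1, -0.2, 0.3]
context = [0.0, 0.1, 0.2]
negatives = [[0.3, 0.1, -0.1], [-0.2, 0.2, 0.1]]

before = dot(center, context)
for _ in range(50):
    sgns_step(center, context, negatives)
after = dot(center, context)
print(round(before, 3), "->", round(after, 3))
```

The existing vectors are the starting point, so each new example only nudges them; in practice the same idea is applied in mini-batches over many new pairs.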

courses to go further

Multimodal RAG AI Assistant: the 9 key steps to go from zero to operational (full guide)

Author(s)

REHOUMA Haythem

Haythem Rehouma is an AI and cloud engineer and architect, a trainer, and a technical instructor, with a focus on medical AI, AWS, MLOps, LLMs/RAG, and computer vision.