Embeddings are the input representation of tokens in a Transformer: each word or subword is mapped to a dense, high-dimensional vector whose values are learned during training. Tokens with related meanings end up close together in this space, which lets the model work with text numerically rather than symbolically. In the article, the embedding layer is the starting point of the pipeline, producing the vectors that the self-attention layers then operate on. Because the embedding lookup itself carries no notion of word order, it works in tandem with positional encodings so that each token's position in the sequence is preserved.
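The following is a minimal sketch of how such a layer might look in PyTorch, pairing a learned embedding table with the fixed sinusoidal positional encodings used in the original Transformer. The class name `TokenEmbedding` and the hyperparameters (`vocab_size`, `d_model`, `max_len`) are illustrative choices, not taken from the article.

```python
import math
import torch
import torch.nn as nn

class TokenEmbedding(nn.Module):
    """Learned token embeddings plus fixed sinusoidal positional encodings (illustrative sketch)."""

    def __init__(self, vocab_size: int, d_model: int, max_len: int = 512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)  # learned lookup table: token id -> dense vector
        self.d_model = d_model

        # Precompute sinusoidal positional encodings for positions 0 .. max_len-1
        position = torch.arange(max_len).unsqueeze(1)                                   # (max_len, 1)
        div_term = torch.exp(torch.arange(0, d_model, 2) * (-math.log(10000.0) / d_model))
        pe = torch.zeros(max_len, d_model)
        pe[:, 0::2] = torch.sin(position * div_term)   # even dimensions
        pe[:, 1::2] = torch.cos(position * div_term)   # odd dimensions
        self.register_buffer("pe", pe)                  # stored with the module, not trained

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        # token_ids: (batch, seq_len) integer indices
        x = self.embed(token_ids) * math.sqrt(self.d_model)  # scale embeddings as in the original paper
        return x + self.pe[: token_ids.size(1)]               # add position information per time step


# Usage: embed a batch of two 5-token sequences into 64-dimensional vectors
emb = TokenEmbedding(vocab_size=10_000, d_model=64)
vectors = emb(torch.randint(0, 10_000, (2, 5)))
print(vectors.shape)  # torch.Size([2, 5, 64])
```

The resulting `(batch, seq_len, d_model)` tensor is what the first self-attention layer receives as input; many implementations add learned rather than sinusoidal positional encodings, but the role of the layer is the same.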