Cross-attention is used in Transformer decoders to relate two different sequences, typically aligning the encoder's representation of the input with the output being generated. Whereas self-attention looks within a single sequence, cross-attention lets the decoder consult the encoder outputs at every generation step: the queries come from the decoder's current states, while the keys and values come from the encoder outputs. This is crucial in tasks like translation, where the decoder must stay grounded in the source sentence. In the article's context, cross-attention is presented as a complementary mechanism to self-attention: it follows the same attention principles but bridges two sequences, and understanding it clarifies how the encoder and decoder interact and how Transformers map inputs to outputs.
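To make the query/key/value roles concrete, here is a minimal single-head sketch in PyTorch. It is an illustration under assumed names, not code from the article: the module name `CrossAttention` and the tensor names `dec_states` and `enc_out` are hypothetical, and a real Transformer would add multiple heads, dropout, masking, and output projection.

```python
# Minimal single-head cross-attention sketch (illustrative, not the article's code).
import math
import torch
import torch.nn as nn

class CrossAttention(nn.Module):
    def __init__(self, d_model: int):
        super().__init__()
        # Queries are projected from decoder states;
        # keys and values are projected from encoder outputs.
        self.w_q = nn.Linear(d_model, d_model)
        self.w_k = nn.Linear(d_model, d_model)
        self.w_v = nn.Linear(d_model, d_model)

    def forward(self, dec_states: torch.Tensor, enc_out: torch.Tensor) -> torch.Tensor:
        # dec_states: (batch, tgt_len, d_model) -- decoder's current representations
        # enc_out:    (batch, src_len, d_model) -- encoder outputs for the source sequence
        q = self.w_q(dec_states)
        k = self.w_k(enc_out)
        v = self.w_v(enc_out)
        # Scaled dot-product attention: each target position attends over all source positions.
        scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))
        weights = scores.softmax(dim=-1)   # (batch, tgt_len, src_len) alignment weights
        return weights @ v                 # (batch, tgt_len, d_model)

# Usage: batch of 2, source length 7, target length 5, model dimension 16.
attn = CrossAttention(d_model=16)
enc_out = torch.randn(2, 7, 16)
dec_states = torch.randn(2, 5, 16)
out = attn(dec_states, enc_out)  # shape (2, 5, 16)
```

The only structural difference from self-attention is where the keys and values come from: swapping `enc_out` for `dec_states` in the key/value projections would turn this back into ordinary self-attention over the decoder sequence.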