Attention Mechanisms Explained

Multi-Head Attention: Multiple Perspectives

When you use multi-head attention, you are essentially allowing a model to look at the same set of queries, keys, and values from several different perspectives at once. Instead of processing all information in a single, fixed way, multi-head attention splits the data into multiple subspaces by projecting the queries, keys, and values through separate, learned linear transformations. Each projection creates a different head, and each head performs its own attention calculation.
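To make the projection-and-split step concrete, here is a minimal sketch in PyTorch. All sizes, variable names, and the bias-free projections are illustrative assumptions for this lesson, not any particular library's implementation:

```python
import torch
import torch.nn.functional as F

# Hypothetical sizes chosen for illustration.
batch, seq_len, d_model, num_heads = 2, 5, 64, 8
d_head = d_model // num_heads  # dimensionality of each head's subspace

x = torch.randn(batch, seq_len, d_model)  # toy input sequence

# Separate learned linear projections for queries, keys, and values.
W_q = torch.nn.Linear(d_model, d_model, bias=False)
W_k = torch.nn.Linear(d_model, d_model, bias=False)
W_v = torch.nn.Linear(d_model, d_model, bias=False)

def split_heads(t):
    # (batch, seq_len, d_model) -> (batch, num_heads, seq_len, d_head)
    return t.view(batch, seq_len, num_heads, d_head).transpose(1, 2)

q, k, v = split_heads(W_q(x)), split_heads(W_k(x)), split_heads(W_v(x))

# Each head performs its own scaled dot-product attention in parallel.
scores = q @ k.transpose(-2, -1) / d_head**0.5
weights = F.softmax(scores, dim=-1)  # attention weights per head
head_outputs = weights @ v           # (batch, num_heads, seq_len, d_head)
```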

Note

Multiple heads in attention mechanisms allow the model to capture a wide range of relationships and patterns in the data. Each head can focus on different types of information, such as local details or global context, and this diversity makes the overall representation richer and more flexible.

Think of each attention head as a camera, each one looking at the data from a different angle. When you project the queries, keys, and values into separate subspaces, each head focuses on unique features. Some patterns that are hard to see from one perspective become clear from another. This is like shining different colored lights on an object to reveal new textures or shapes.

Each head produces its own attention output. These outputs are then joined together by stacking them side by side, creating a longer vector that holds information from every head. This combined vector goes through one more linear transformation. The final result is a rich, integrated view that brings together the varied insights of all the heads.
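Continuing the sketch above, the concatenation and final linear transformation might look like this (again, names and sizes are assumed for illustration):

```python
# Stack the per-head outputs side by side along the feature dimension:
# (batch, num_heads, seq_len, d_head) -> (batch, seq_len, num_heads * d_head)
concat = head_outputs.transpose(1, 2).reshape(batch, seq_len, d_model)

# One more learned linear transformation integrates the heads' views.
W_o = torch.nn.Linear(d_model, d_model, bias=False)
output = W_o(concat)  # (batch, seq_len, d_model)
```

In practice, frameworks such as PyTorch bundle all of these steps into a single module like torch.nn.MultiheadAttention, so you rarely write them out by hand.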

Question: what is the main purpose of using multiple heads in multi-head attention?

