[Google Cloud Skills Boost(Qwiklabs)] Introduction to Generative AI Learning Path

IT/Cloud

Diana Kang 2023. 9. 9. 15:51

Generative AI Learning Path

https://goo.gle/LearnGenAI

www.youtube.com

Attention mechanism is behind all the transformer models and which is core to the LLM models.

Encoder-Decoder is a popular model that is used to translate sentences.
The encoder-decoder takes one word at a time and translates it at each time step.

However, sometimes the words in the source language do not align with the words in the target languages.
- ex) The first English word is black, however, in the translation, the first French word is chat which means cat in English.

Then how can you train a model to focus more on the word cat instead of the word black?
To improve the translation, you can add the attention mechanism to the encoder-decoder.
Attention mechanism is a technique that allows the neural network to focus on specific parts of an input sequence.
This is done by assigning weights to different parts of the input sequence with the most important parts receiving the highest weights.

In the traditional RNN encoder-decoder, the model takes one word at a time as input, updates the hidden state, and passes it on to the next time step.
- Only the final hidden state is passed on the decoder.
- The decoder works with the final hidden state for processing and translates it to the target language.

An attention model differs from the traditional sequence-to-sequence model in two ways.

1. The encoder passes a lot more data to the decoder.

So instead of just passing the final hidden state number #3 to the decoder, the encoder passes all the hidden states from each time step.
This gives the decoder more context beyond just the final hidden state.
The decoder uses all the hidden state information to translate the sentence.

2. The attention mechanism is adding an extra step to the decoder before producing its output.

To focus on the most relevant parts of the input, the decoder does the following:
Step1. It looks at the set of encoder states that it has received.

Step3. It multiplies each hidden state by its soft-max score.
- Thus amplifying hidden states with the highest scores(초록색) and downsizing hidden states with low scores(회색).

This is how you can use an attention mechanism to improve the performance
of a traditional encoder-decoder architecture.