
[Google Cloud Skills Boost(Qwiklabs)] Introduction to Generative AI Learning Path - 9. Create Image Captioning Models

Diana Kang 2023. 9. 30. 15:31

Generative AI Learning Path (YouTube playlist): https://www.youtube.com/playlist?list=PLIivdWyY5sqIlLF9JHbyiqzZbib9pFt4x


 

 

Introduction

1. Pass images to the encoder.

2. Extract information from the images.

3. Create feature vectors.

4. Pass the vectors to the decoder.

5. Build captions by generating words, one by one.

Extract features by using backbones (e.g. ResNet, Inception, EfficientNet, ...).
Using Keras Applications -> the case where InceptionResNet is used as the image backbone, as in the sketch below.
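A minimal sketch of such an encoder, assuming InceptionResNetV2 from Keras Applications as the backbone. IMG_SIZE and ATTENTION_DIM are illustrative values, not necessarily the ones used in the course notebook.

```python
import tensorflow as tf

IMG_SIZE = 299        # InceptionResNetV2's default input size
ATTENTION_DIM = 512   # feature dimension shared with the decoder (assumption)

image_input = tf.keras.Input(shape=(IMG_SIZE, IMG_SIZE, 3))
backbone = tf.keras.applications.InceptionResNetV2(
    include_top=False,      # drop the classifier head; keep the feature maps
    weights="imagenet",
)
backbone.trainable = False  # use the backbone as a frozen feature extractor

x = backbone(image_input)   # (batch, 8, 8, 1536) feature maps
# Flatten the 8x8 spatial grid into 64 feature vectors the decoder can attend to.
x = tf.keras.layers.Reshape((-1, x.shape[-1]))(x)
# Project the features to the dimension shared with the decoder's attention.
encoder_output = tf.keras.layers.Dense(ATTENTION_DIM, activation="relu")(x)

encoder = tf.keras.Model(inputs=image_input, outputs=encoder_output)
```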

 

Decoder

The entire architecture of the decoder:

  • It receives words one by one and combines the word information with the image information coming from the encoder output.
  • It then tries to predict the next word.

 

Attention layer

1. The first embedding layer creates word embeddings.

2. The embeddings are passed to a GRU layer.

 

FYI.

  • GRU is a variation of the Recurrent Neural Network (RNN).
  • An RNN takes inputs, updates its internal state, and generates output.
    • By passing sequential data (i.e. text data), it keeps the sequential dependencies from previous inputs (i.e. previous words); see the toy snippet below.
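A toy illustration of this behavior, assuming TensorFlow; the shapes are arbitrary.

```python
import tensorflow as tf

# A GRU consumes a sequence step by step and keeps an internal state
# that summarizes the previous inputs (e.g. the previous words).
gru = tf.keras.layers.GRU(8, return_sequences=True, return_state=True)

sequence = tf.random.normal((1, 5, 16))  # batch of 1, 5 timesteps, 16 features
outputs, final_state = gru(sequence)     # outputs: (1, 5, 8), state: (1, 8)
```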

3. The GRU output goes to the attention layer, which mixes the text and image information.

  • In TensorFlow Keras, we can use the predefined attention layers in the same way as other layers.

  • Inside the attention layer, the text features pay attention to the image features.
    • By mixing both sources of information, it can calculate attention scores.

 

  • The attention layer takes two inputs -- ([gru_output, encoder_output]); see the decoder sketch after this list.
    • gru_output => used as the attention query and key
    • encoder_output => used as the attention value

  • The Add layer adds two same-shaped vectors.
    • gru_output is passed to the attention layer, and also directly to this Add layer.
  • These two flows are eventually merged in the Add layer.
  • This architecture is called a skip connection, which has been a very popular deep neural network design pattern since ResNet.
    • It is also called a residual connection.
  • A skip connection is very useful, especially when you want to design a very deep neural network.
    • It is also used in the Transformer.
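A sketch of the decoder wiring just described (embedding -> GRU -> attention -> skip connection). All names and sizes here are illustrative assumptions; the course notebook linked at the end of this post has the full implementation. Note that when tf.keras.layers.Attention receives only [query, value], the key defaults to the value.

```python
import tensorflow as tf

VOCAB_SIZE = 10000       # illustrative vocabulary size
ATTENTION_DIM = 512      # shared feature dimension (assumption)
MAX_CAPTION_LENGTH = 64

word_input = tf.keras.Input(shape=(MAX_CAPTION_LENGTH,))
features_input = tf.keras.Input(shape=(64, ATTENTION_DIM))  # encoder output

# 1-2. Word embedding, then a GRU over the caption sequence.
embeddings = tf.keras.layers.Embedding(VOCAB_SIZE, ATTENTION_DIM)(word_input)
gru_output, gru_state = tf.keras.layers.GRU(
    ATTENTION_DIM, return_sequences=True, return_state=True
)(embeddings)

# 3. Attention over the image features, queried by the GRU output.
attention_output = tf.keras.layers.Attention()([gru_output, features_input])

# Skip (residual) connection: merge the attention output with the raw
# GRU output, normalize, and predict logits over the vocabulary.
x = tf.keras.layers.Add()([gru_output, attention_output])
x = tf.keras.layers.LayerNormalization()(x)
logits = tf.keras.layers.Dense(VOCAB_SIZE)(x)

decoder = tf.keras.Model(inputs=[word_input, features_input], outputs=logits)
```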

 

Inference Phase

 

01. 

  • In the training phase, TensorFlow Keras can automatically handle gru_state for each sequence.
    • But in the inference phase, since we designed our own custom function, we need to write logic to deal with it explicitly.
    • So at the beginning of each captioning run, we explicitly initialize gru_state with some value.

03.

  • When our decoder generates the end token, we can finish this for loop.
    • Alternatively, we can exit the loop when the length of the caption reaches some number, MAX_CAPTION_LENGTH.

 

Code

  • We initialize two things -- gru_state and the <start> token.

  • We preprocess the input image and pass it to the encoder we trained (sketched below).
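A hedged sketch of that inference loop. Here encoder, decoder_step, word_to_index, and index_to_word are hypothetical helpers standing in for the trained models and the tokenizer lookups; none of these names are prescribed by the course.

```python
import tensorflow as tf

ATTENTION_DIM = 512      # must match the trained decoder (assumption)
MAX_CAPTION_LENGTH = 64

def generate_caption(image, encoder, decoder_step, word_to_index, index_to_word):
    # During training, Keras handles gru_state per sequence automatically,
    # but this custom loop must initialize and carry the state itself.
    gru_state = tf.zeros((1, ATTENTION_DIM))

    # Preprocess/encode the image once; reuse the features at every step.
    features = encoder(tf.expand_dims(image, axis=0))

    # Start decoding from the <start> token.
    word = tf.constant([[word_to_index("<start>")]])
    caption = []

    for _ in range(MAX_CAPTION_LENGTH):   # hard cap on the caption length
        logits, gru_state = decoder_step([word, gru_state, features])
        next_id = int(tf.argmax(logits[0, -1]))
        next_word = index_to_word(next_id)
        if next_word == "<end>":          # stop once the end token appears
            break
        caption.append(next_word)
        word = tf.constant([[next_id]])   # feed the prediction back in

    return " ".join(caption)
```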

 

GitHub repo for Image Captioning with Visual Attention:

https://github.com/GoogleCloudPlatform/asl-ml-immersion/blob/master/notebooks/multi_modal/solutions/image_captioning.ipynb