  • Bidirectional and Multi-layer RNNs
    Research/NLP_Stanford 2024. 6. 22. 10:16

    ※ Notes written while taking the course 「Stanford CS224N NLP with Deep Learning」

     https://www.youtube.com/watch?v=0LixFSa7yts&list=PLoROMvodv4rMFqRtEuo6SGjY4XbRIVRd4&index=6&t=2101s

    Lecture 6 - Simple and LSTM RNNs


     

    We can regard the hidden states as a representation of a word in context. Rather than just having a word vector for "terribly", we have looked at the context and created a hidden state representation for the word "terribly" in the context of "the movie was". That proves to be a really useful idea, because words have different meanings in different contexts. But there seems to be a defect in what we've done here, because our context only contains information from the left. What about right context? Surely it would also be useful to have the meaning of "terribly" depend on "exciting", because often words mean different things based on what follows them.
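
    A minimal sketch of that idea in PyTorch (toy sizes and random embeddings, purely illustrative, not the lecture's code): each timestep's output of a left-to-right LSTM can be read off as a context-dependent representation of the token at that position.

    import torch
    import torch.nn as nn

    # Toy setup: one sentence of 5 tokens with 16-dim embeddings
    # (a stand-in for "the movie was terribly exciting").
    emb_dim, hid_dim, seq_len = 16, 32, 5
    embeddings = torch.randn(1, seq_len, emb_dim)

    lstm = nn.LSTM(input_size=emb_dim, hidden_size=hid_dim, batch_first=True)
    outputs, (h_n, c_n) = lstm(embeddings)

    # outputs[:, t, :] is the hidden state at timestep t: a representation of
    # token t that has absorbed only the left context (tokens 0..t).
    print(outputs.shape)                     # torch.Size([1, 5, 32])
    contextual_terribly = outputs[:, 3, :]   # state for "terribly", given "the movie was"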

     

    How could we deal with that? An easy way, if we just want to come up with a neural encoding of a sentence, is to have a second RNN with completely separate learned parameters and run it backwards through the sentence to get a backward representation of each word. Then we can get an overall representation of each word in context by concatenating those two representations. Now we've got a representation of "terribly" that has both left and right context.
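
    As a sketch of that recipe (hypothetical PyTorch code, variable names are mine): one LSTM runs left-to-right, a second LSTM with completely separate parameters runs over the reversed sequence, and the two states for each position are concatenated.

    import torch
    import torch.nn as nn

    emb_dim, hid_dim = 16, 32
    fwd_rnn = nn.LSTM(emb_dim, hid_dim, batch_first=True)   # forward RNN, its own parameters
    bwd_rnn = nn.LSTM(emb_dim, hid_dim, batch_first=True)   # backward RNN, separate parameters

    x = torch.randn(1, 5, emb_dim)                 # (batch, seq_len, emb_dim)

    fwd_out, _ = fwd_rnn(x)                        # left-to-right states
    bwd_out, _ = bwd_rnn(torch.flip(x, dims=[1]))  # run over the reversed sequence
    bwd_out = torch.flip(bwd_out, dims=[1])        # flip back so positions line up

    # Each position now carries left context (forward) and right context (backward).
    bi_states = torch.cat([fwd_out, bwd_out], dim=-1)   # (1, 5, 2 * hid_dim)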

     

    So we're simply running a forward RNN (commonly it'll be an LSTM) and a backward one, each with separate weights. Then at each timestep we just concatenate their representations. We regard this concatenated vector as the hidden state, the contextual representation of a token at a particular time, that we pass forward.
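
    In PyTorch, this per-timestep concatenation is what the built-in bidirectional flag gives you (again a sketch, not from the lecture): the output at each timestep is the forward and backward hidden states joined along the feature dimension, so the feature size doubles.

    import torch
    import torch.nn as nn

    emb_dim, hid_dim = 16, 32
    bi_lstm = nn.LSTM(emb_dim, hid_dim, batch_first=True, bidirectional=True)

    x = torch.randn(1, 5, emb_dim)
    outputs, _ = bi_lstm(x)

    # Each timestep's output is [forward_state ; backward_state].
    print(outputs.shape)   # torch.Size([1, 5, 64]) == (batch, seq_len, 2 * hid_dim)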

     

    Note: bidirectional RNNs are only applicable if you have access to the entire input sequence. They are not applicable to Language Modeling, because in LM you only have left context available.

     

    If you do have the entire input sequence (e.g., any kind of encoding), bidirectionality is powerful (you should use it by default).

     

    For example, BERT (Bidirectional Encoder Representations from Transformers) is a powerful pretrained contextual representation system built on bidirectionality. 
