I understand word embeddings and word2vec.
In this paper: https://arxiv.org/pdf/1603.01547.pdf
they describe what seems to be a new type of word embedding:
Our model uses one word embedding function and two encoder functions. The word embedding function e translates words into vector representations. The first encoder function is a document encoder f that encodes *every word from the document* d *in the context of the whole document*. We call this the **contextual embedding**.
Is this some new way of encoding? How can I implement it? Thanks.
Best Answer
The contextual embedding of a word is just the corresponding hidden state of a bi-GRU:
In our model the document encoder $f$ is implemented as a bidirectional Gated Recurrent Unit (GRU) network whose hidden states form the contextual word embeddings, that is $f_i(d) = \overrightarrow{f_i}(d) \,\Vert\, \overleftarrow{f_i}(d)$, where $\Vert$ denotes vector concatenation and $\overrightarrow{f_i}$ and $\overleftarrow{f_i}$ denote forward and backward contextual embeddings from the respective recurrent networks.
In the paper's figure, the contextual embedding of the first word is the part highlighted in red.
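As a concrete illustration, here is a minimal sketch of how such a contextual encoder could be implemented. This is not the authors' code: the framework (PyTorch), the class name `ContextualEncoder`, and the dimensions are my own assumptions; it only shows the idea that the contextual embedding of word $i$ is the concatenated forward/backward GRU hidden state at position $i$.

```python
# Sketch of contextual word embeddings via a bidirectional GRU.
# All names and hyperparameters here are illustrative assumptions,
# not taken from the paper.
import torch
import torch.nn as nn

class ContextualEncoder(nn.Module):
    def __init__(self, vocab_size, embed_dim=128, hidden_dim=256):
        super().__init__()
        # e: word embedding function (plain lookup table)
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        # f: document encoder; bidirectional=True yields a forward and a
        # backward hidden state at every word position
        self.gru = nn.GRU(embed_dim, hidden_dim,
                          batch_first=True, bidirectional=True)

    def forward(self, token_ids):
        # token_ids: (batch, doc_len) integer word indices
        embedded = self.embedding(token_ids)   # (batch, doc_len, embed_dim)
        # outputs[:, i, :] is the concatenation of the forward and backward
        # hidden states at position i, i.e. f_i(d) = f_fwd_i(d) || f_bwd_i(d)
        outputs, _ = self.gru(embedded)        # (batch, doc_len, 2*hidden_dim)
        return outputs

# Usage: the contextual embedding of word i in document d is outputs[:, i, :]
encoder = ContextualEncoder(vocab_size=10000)
doc = torch.randint(0, 10000, (1, 20))   # a toy document of 20 word ids
contextual = encoder(doc)
print(contextual.shape)                   # torch.Size([1, 20, 512])
```

So nothing exotic is needed: run an ordinary bi-GRU over the word embeddings of the document and read off the hidden state at each position.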