I understand word embeddings and word2vec.

In this paper: https://arxiv.org/pdf/1603.01547.pdf

they describe a new type of word embedding:

`Our model uses one word embedding function and two encoder functions. The word embedding function e translates words into vector representations. The first encoder function is a document encoder f that encodes *every word from the document* d *in the context of the whole document*. We call this the **contextual embedding**.`

Is this some new way of encoding? How can I implement it? Thanks.


#### Best Answer

The contextual embedding of a word is just the corresponding hidden state of a bi-GRU:

`In our model the document encoder $f$ is implemented as a bidirectional Gated Recurrent Unit (GRU) network whose hidden states form the contextual word embeddings, that is $f_i(d) = \overrightarrow{f_i}(d) \,\|\, \overleftarrow{f_i}(d)$, where $\|$ denotes vector concatenation and $\overrightarrow{f_i}$ and $\overleftarrow{f_i}$ denote forward and backward contextual embeddings from the respective recurrent networks.`

[Figure from the paper: the bi-GRU unrolled over the document, with the contextual embedding of the first word highlighted in red.]
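A minimal sketch of this idea in PyTorch (the vocabulary size, dimensions, and toy document below are made-up illustration values, not the paper's hyperparameters):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

vocab_size, embed_dim, hidden_dim = 100, 16, 32

# e: the word embedding function, mapping word ids to vectors
embedding = nn.Embedding(vocab_size, embed_dim)

# f: the document encoder, a bidirectional GRU
document_encoder = nn.GRU(embed_dim, hidden_dim,
                          bidirectional=True, batch_first=True)

# A toy "document" of 5 word ids (batch of 1)
doc = torch.tensor([[4, 12, 7, 0, 25]])

word_vectors = embedding(doc)                  # shape: (1, 5, embed_dim)
contextual, _ = document_encoder(word_vectors)

# Position i now holds the concatenation of the forward hidden state
# and the backward hidden state for word i, i.e. the contextual embedding
print(contextual.shape)                        # (1, 5, 2 * hidden_dim)
```

Because the GRU is bidirectional, the hidden state at each position depends on the whole document, not just the word itself; the last dimension is `2 * hidden_dim` because the forward and backward states are concatenated.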
