I was watching CS224n and came across this equation for the word2vec loss function.
As the blue box says, "for each training example t we are calculating the probability of context words given the current word". I want to know why the probabilities are multiplied, as in the red boxes. I might be missing some math; it would be great if someone could help me. Thanks.
Best Answer
The probabilities are multiplied because you want the probability of two (or more) events all happening together, that is, their joint probability. Under the assumption that the events are independent, the joint probability equals the product of the probabilities of the individual events. Here the events are the individual context words appearing around the current word, treated as independent of each other given that word. I highly recommend reading an introductory article on maximum likelihood estimation (the Wikipedia one is a fine start) before continuing, so that you understand the general mechanism.
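To make the multiplication concrete, here is a minimal sketch. The center word, the context words, and every probability in it are made-up values for illustration, not output from a real word2vec model. It shows that the joint probability of the context words is the product of the individual probabilities, and that taking logs turns that product into a sum, which is why loss functions of this kind are usually written as a sum of log-probabilities.

```python
import math

# Hypothetical values of P(context word | center word) for a single center
# word with a window of 2 words on each side. All numbers are invented.
context_probs = {
    "problems": 0.10,
    "turning": 0.05,
    "into": 0.20,
    "crises": 0.08,
}

# Independence assumption: the probability of observing all four context
# words together is the product of the individual probabilities.
likelihood = math.prod(context_probs.values())

# The log of a product is the sum of the logs, so the same quantity can be
# computed as a sum, which avoids multiplying many tiny numbers together.
log_likelihood = sum(math.log(p) for p in context_probs.values())

# Maximizing the likelihood is equivalent to minimizing its negative log.
negative_log_likelihood = -log_likelihood

print("likelihood:", likelihood)
print("log-likelihood:", log_likelihood)
print("negative log-likelihood:", negative_log_likelihood)
```

With real text the product runs over every position in the corpus as well as over every context word in the window, so it becomes vanishingly small; working with the sum of logs keeps the numbers manageable.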