I am trying to understand how python-glove computes the most similar terms.
Is it using cosine similarity?
Example from the python-glove GitHub repository: https://github.com/maciejkula/glove-python/tree/master/glove
I know that in gensim's word2vec, the most_similar method computes similarity using the cosine distance.
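For reference, a minimal sketch of that gensim call on a toy corpus (the corpus and hyperparameters below are made up for illustration, and vector_size assumes gensim 4.x):

```python
# Sketch of gensim's most_similar on a toy corpus; the corpus and
# hyperparameters are illustrative only (vector_size is the gensim 4.x name).
from gensim.models import Word2Vec

sentences = [
    ["king", "queen", "royal", "palace"],
    ["man", "woman", "child", "family"],
    ["king", "man", "queen", "woman"],
]

model = Word2Vec(sentences, vector_size=10, window=2, min_count=1, seed=1)

# most_similar ranks vocabulary words by cosine similarity to the query vector
print(model.wv.most_similar("king", topn=3))
```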
Best Answer
Looking at the code, python-glove also computes the cosine similarity. In _similarity_query
it performs these operations:
dst = (np.dot(self.word_vectors, word_vec) / np.linalg.norm(self.word_vectors, axis=1) / np.linalg.norm(word_vec))
You can find the code in the repository linked above, provided it has not been updated since; otherwise, search the source for _similarity_query.
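For completeness, here is a minimal sketch of reaching that code path through the public API; the toy corpus and hyperparameters are placeholders, and the exact signatures may have changed since this answer was written:

```python
# Sketch of querying most-similar terms through glove-python's public API;
# the toy corpus and hyperparameters are illustrative only.
from glove import Corpus, Glove

sentences = [
    ["king", "queen", "royal", "palace"],
    ["man", "woman", "child", "family"],
    ["king", "man", "queen", "woman"],
]

corpus = Corpus()
corpus.fit(sentences, window=2)          # build the co-occurrence matrix

glove = Glove(no_components=10, learning_rate=0.05)
glove.fit(corpus.matrix, epochs=10, no_threads=1)
glove.add_dictionary(corpus.dictionary)  # map words to rows of word_vectors

# most_similar delegates to _similarity_query internally
print(glove.most_similar("king", number=3))
```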
As you can see, the function first computes the dot product between all word vectors and the current word's embedding, and then divides by the two norms (lengths), which is exactly the definition of cosine similarity:
$$\text{CosineSimilarity}(u, v) = \frac{u \cdot v}{\|u\|_2 \, \|v\|_2} = \cos(\theta) \tag{1}$$
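To make the equivalence concrete, here is a self-contained NumPy sketch that reproduces the same ranking on a made-up vector matrix (the vocabulary and vectors are illustrative, not taken from the library):

```python
# Standalone NumPy sketch of the cosine-similarity ranking in equation (1);
# the vocabulary and vectors below are made up for illustration.
import numpy as np

words = ["king", "queen", "man", "woman"]
word_vectors = np.array([
    [0.90, 0.80, 0.10],
    [0.85, 0.90, 0.15],
    [0.20, 0.10, 0.90],
    [0.25, 0.15, 0.85],
])

word_vec = word_vectors[words.index("king")]

# Dot products divided by both norms, as in _similarity_query
dst = (np.dot(word_vectors, word_vec)
       / np.linalg.norm(word_vectors, axis=1)
       / np.linalg.norm(word_vec))

# Highest cosine similarity first; the query word itself comes out on top
for idx in np.argsort(-dst):
    print(words[idx], round(float(dst[idx]), 3))
```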