When looking at the eigenvectors of the covariance matrix, we get the directions of maximum variance (the first eigenvector is the direction in which the data varies the most, etc.); this is called principal component analysis (PCA).

I was wondering what it would mean to look at the eigenvectors/values of the mutual information matrix, would they point in the direction of maximum entropy?

**Contents**hide

#### Best Answer

While it is not a direct answer (as it is about *pointwise* mutual information), look at paper relating **word2vec** to a **singular value decomposition** of PMI matrix:

- O. Levy, Y. Goldberg, Neural Word Embedding as Implicit Matrix Factorization

We analyze skip-gram with negative-sampling (SGNS), a word embedding method introduced by Mikolov et al., and show that it is implicitly factorizing a word-context matrix, whose cells are the pointwise mutual information (PMI) of the respective word and context pairs, shifted by a global constant. We find that another embedding method, NCE, is implicitly factorizing a similar matrix, where each cell is the (shifted) log conditional probability of a word given its context. We show that using a sparse Shifted Positive PMI word-context matrix to represent words improves results on two word similarity tasks and one of two analogy tasks. When dense low-dimensional vectors are preferred, exact factorization with SVD can achieve solutions that are at least as good as SGNS’s solutions for word similarity tasks. On analogy questions SGNS remains superior to SVD. We conjecture that this stems from the weighted nature of SGNS’s factorization.