Suppose I have two vectors, both of which are probability distributions (non-negative entries that sum to 1). Under what circumstances will correlation (say, Pearson correlation) and similarity (say, cosine similarity) differ substantially?
I found one question on this, but it does not address this specific question.
Best Answer
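They can differ substantially. Pearson correlation is the cosine similarity of the mean-centered vectors, and every $n$-dimensional probability vector has mean $1/n$, so the two measures coincide only after $1/n$ has been subtracted from every entry.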
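Let $v = (1/n, 1/n, \dots, 1/n)$ be the uniform $n$-dimensional probability vector.
Then the cosine similarity of $v$ with itself is 1 (the raw dot product $v \cdot v$ is only $1/n$, but dividing by $\|v\|^2 = 1/n$ gives 1), whereas its Pearson correlation with any vector, including itself, is undefined because $v$ has zero variance.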
Another example:
Let $v = (.1, .9)$ and $w = (.9, .1)$. Then the correlation between $v$ and $w$ is $-1$, but the cosine similarity between them is $.18 / .82 \approx .22$ (the dot product is $.18$ and each vector has squared norm $.82$).
Any two probability vectors have a non-negative cosine similarity, because all of their entries are non-negative, but their Pearson correlation can be negative.
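As a rough check, here is a minimal Python sketch that computes both measures for $(.1, .9)$ and $(.9, .1)$ and for a uniform vector. The helper names `cosine_similarity` and `pearson_correlation` are my own, not from any library, and returning `None` for a constant vector is just one way to signal that the correlation is undefined.

```python
import math

def cosine_similarity(v, w):
    """Cosine of the angle between v and w, with no centering."""
    dot = sum(a * b for a, b in zip(v, w))
    return dot / (math.hypot(*v) * math.hypot(*w))

def pearson_correlation(v, w):
    """Cosine similarity of the mean-centered vectors.

    Returns None when either vector is constant (zero variance),
    since the correlation is undefined in that case.
    """
    mean_v = sum(v) / len(v)
    mean_w = sum(w) / len(w)
    cv = [a - mean_v for a in v]
    cw = [b - mean_w for b in w]
    norm = math.hypot(*cv) * math.hypot(*cw)
    if norm == 0.0:  # exact check; adequate for this illustration only
        return None
    return sum(a * b for a, b in zip(cv, cw)) / norm

v = [0.1, 0.9]
w = [0.9, 0.1]
print(cosine_similarity(v, w))    # approximately 0.22 (dot product 0.18 over 0.82)
print(pearson_correlation(v, w))  # approximately -1

u = [0.25] * 4                    # uniform probability vector, n = 4
print(cosine_similarity(u, u))    # 1.0
print(pearson_correlation(u, u))  # None: zero variance, correlation undefined
```

Note that the raw dot product of $(.1, .9)$ and $(.9, .1)$ is $.18$; the cosine similarity divides that by the product of the norms, which is why it comes out near $.22$.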