Suppose I have two vectors, both of which are probabilities of something (sum to 1). Under what circumstances will correlation (say Pearson corr.) and similarity (say cosine sim.) differ largely?

I found one question on this, but it does not address this specific question.

#### Best Answer

In general, the two can differ considerably in magnitude, and even in sign.

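Let $v = (1/n, 1/n, \dots, 1/n)$ be an $n$-dimensional probability vector.

Then the cosine similarity of $v$ with itself is 1, as it is for any nonzero vector, while its Pearson correlation with itself is undefined because $v$ has zero variance. (The unnormalized dot product $v \cdot v$ is only $1/n$, but that is not the cosine similarity.)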

Another example:

Let $v = (0.1, 0.9)$ and $w = (0.9, 0.1)$. Then the correlation between $v$ and $w$ is $-1$, but the cosine similarity between them is $0.18 / 0.82 \approx 0.22$ (the unnormalized dot product is $0.18$).
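A quick numerical check of these examples, as a minimal sketch in plain Python. The helper names `cosine_similarity` and `pearson_correlation` are made up for illustration, and the choice to return `None` for a zero-variance vector is an assumption of this sketch, not a standard convention.

```python
import math

def dot(x, y):
    """Unnormalized dot product of two equal-length sequences."""
    return sum(a * b for a, b in zip(x, y))

def cosine_similarity(x, y):
    """Dot product divided by the product of the Euclidean norms."""
    return dot(x, y) / (math.sqrt(dot(x, x)) * math.sqrt(dot(y, y)))

def pearson_correlation(x, y):
    """Cosine similarity of the mean-centered vectors.

    Returns None when either vector is constant (zero variance),
    in which case the correlation is undefined. The exact-zero test
    is only adequate for simple inputs like the ones below.
    """
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cx = [a - mean_x for a in x]
    cy = [b - mean_y for b in y]
    if all(a == 0 for a in cx) or all(b == 0 for b in cy):
        return None
    return cosine_similarity(cx, cy)

# Two probability vectors that put their mass on opposite outcomes.
v = [0.1, 0.9]
w = [0.9, 0.1]
dot_vw = dot(v, w)
cos_vw = cosine_similarity(v, w)
corr_vw = pearson_correlation(v, w)
print(round(dot_vw, 4), round(cos_vw, 4), round(corr_vw, 4))

# Uniform probability vector with n = 4.
u = [0.25, 0.25, 0.25, 0.25]
cos_uu = cosine_similarity(u, u)
corr_uu = pearson_correlation(u, u)
print(cos_uu, corr_uu)
```

The first line of output shows a small positive dot product and cosine similarity alongside a correlation of about $-1$. The second shows that the uniform vector has cosine similarity 1 with itself, while its correlation is reported as undefined.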

Any two probability vectors have a non-negative cosine similarity, because all of their entries are non-negative, but they may have a negative Pearson correlation.
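One way to see where the sign difference comes from is to write the two measures side by side. This is a sketch using the standard definitions, with $n$ denoting the common length of $v$ and $w$. Since each probability vector sums to 1, its mean entry is $1/n$, and Pearson correlation is just cosine similarity applied after subtracting that mean:

$$
\cos(v, w) = \frac{\sum_i v_i w_i}{\sqrt{\sum_i v_i^2}\,\sqrt{\sum_i w_i^2}},
\qquad
\operatorname{corr}(v, w) = \frac{\sum_i \left(v_i - \tfrac{1}{n}\right)\left(w_i - \tfrac{1}{n}\right)}{\sqrt{\sum_i \left(v_i - \tfrac{1}{n}\right)^2}\,\sqrt{\sum_i \left(w_i - \tfrac{1}{n}\right)^2}}
$$

Every term $v_i w_i$ in the first numerator is non-negative, whereas the centered terms in the second can be negative. The two measures therefore tend to diverge most when the vectors are close to uniform, because the shared offset $1/n$ then dominates the cosine similarity but is removed entirely from the correlation.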