Solved – Why correlation is not “transitive”

First, let me explain what I mean by "transitive".
Suppose that the prices of product A and product B have a correlation of 0.5.
Suppose also that the prices of product B and product C have a correlation of 0.5.

One might think that if correlation(A,B) = 0.5 and correlation(B,C) = 0.5, then correlation(A,C) > 0, but an intuitive example shows why this need not be the case:

Suppose that A is a fruit salad made of papaya and banana, that B is a fruit salad made of banana and strawberry and, finally, that C is strawberries and cream. Clearly, when the price of banana goes up, the prices of both A and B go up; when the price of strawberry goes up, the prices of B and C go up; but since A and C have no ingredients in common, their prices are not correlated.
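The fruit salad example can be checked with a small simulation. This is only a sketch under assumptions that are not in the question: the four ingredient prices are modelled as independent standard normal draws with equal variance, each product's price is the plain sum of its two ingredients, and the names `pearson` and `simulate` are made up for illustration.

```python
import random
from math import sqrt


def pearson(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def simulate(n=100_000, seed=0):
    """Return (corr(A,B), corr(B,C), corr(A,C)) for simulated prices."""
    rng = random.Random(seed)
    # Assumption: ingredient prices are independent with equal variance.
    papaya = [rng.gauss(0, 1) for _ in range(n)]
    banana = [rng.gauss(0, 1) for _ in range(n)]
    strawberry = [rng.gauss(0, 1) for _ in range(n)]
    cream = [rng.gauss(0, 1) for _ in range(n)]
    # Assumption: a product's price is the sum of its ingredient prices.
    A = [p + b for p, b in zip(papaya, banana)]
    B = [b + s for b, s in zip(banana, strawberry)]
    C = [s + c for s, c in zip(strawberry, cream)]
    return pearson(A, B), pearson(B, C), pearson(A, C)


r_ab, r_bc, r_ac = simulate()
print(f"corr(A,B) = {r_ab:.3f}")
print(f"corr(B,C) = {r_bc:.3f}")
print(f"corr(A,C) = {r_ac:.3f}")
```

Under these assumptions each neighbouring pair shares one of two equal-variance ingredients, so the sample correlations of (A,B) and (B,C) should come out close to 0.5, while the sample correlation of (A,C) should be close to 0.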

My question is which statistical concept captures this intuitive idea. Is it the concept of dimensions?

What do I mean by transitive?
Let me define the binary operator X corr? Y as 1 if correlation(X,Y) != 0 and 0 otherwise, so:

  • A corr? B = 1,
  • B corr? C = 1, but
  • A corr? C could be = 0

If corr? were transitive, then A corr? C would have to equal 1.

If the correlations of $A$ to $B$ and $B$ to $C$ are specified, then any value rho_AC such that the $3$ by $3$ matrix of correlations is positive semi-definite is a possible value of rho_AC. Given the correlations of $A$ to $B$ and $B$ to $C$, there will be a minimum and maximum such possible value of rho_AC.
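For the $3$ by $3$ case the minimum and maximum can also be written in closed form. As a sketch, writing $\rho_{AB}$, $\rho_{BC}$, $\rho_{AC}$ for rho_AB, rho_BC, rho_AC (notation introduced here for readability) and taking $|\rho_{AB}| \le 1$ and $|\rho_{BC}| \le 1$ as given, positive semi-definiteness reduces to the determinant of the correlation matrix being non-negative:

$$1 + 2\,\rho_{AB}\,\rho_{BC}\,\rho_{AC} - \rho_{AB}^2 - \rho_{BC}^2 - \rho_{AC}^2 \ge 0$$

This is a quadratic inequality in $\rho_{AC}$, and solving it gives the interval

$$\rho_{AB}\,\rho_{BC} - \sqrt{(1-\rho_{AB}^2)(1-\rho_{BC}^2)} \;\le\; \rho_{AC} \;\le\; \rho_{AB}\,\rho_{BC} + \sqrt{(1-\rho_{AB}^2)(1-\rho_{BC}^2)}$$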

Here is code in CVX under MATLAB which finds the minimum value of rho_AC. The maximum such value is achieved with the identical program, but using maximize rather than minimize. Although it is trivial to find the minimum and maximum possible values of rho_AC in this example, this approach still works quite well as the dimension of the problem increases.

    % rho_AB and rho_BC are MATLAB variable values containing input correlations
    cvx_begin
        variable rho_AC
        minimize(rho_AC)
        % Next line constrains 3 by 3 correlation matrix to be positive semidefinite
        [1 rho_AB rho_AC; rho_AB 1 rho_BC; rho_AC rho_BC 1] == semidefinite(3)
    cvx_end

Running this program for rho_AB = 0.5 and rho_BC = 0.5, the minimum possible value for rho_AC is found to be -0.5 and the maximum possible value for rho_AC is found to be 1. Note that although this example is trivially solved without this fancy apparatus, this solution method also works readily and accurately (in a numerically stable way) on higher dimensional problems and other more complicated variants.
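For readers without MATLAB and CVX, the same two numbers can be cross-checked in plain Python. This sketch does not solve a semidefinite program. It assumes only the $3$ by $3$ case and uses the closed form that follows from requiring the determinant of the correlation matrix to be non-negative, namely rho_AB*rho_BC plus or minus sqrt((1 - rho_AB^2)(1 - rho_BC^2)). The function name `rho_ac_bounds` is hypothetical.

```python
from math import sqrt


def rho_ac_bounds(rho_ab, rho_bc):
    """Return (min, max) of rho_AC for which the 3x3 correlation matrix
    with the given rho_AB and rho_BC is positive semi-definite.

    Closed form for the 3x3 case only; it does not extend to larger
    matrices, where an SDP solver such as CVX is the appropriate tool.
    """
    center = rho_ab * rho_bc
    half_width = sqrt((1 - rho_ab ** 2) * (1 - rho_bc ** 2))
    return center - half_width, center + half_width


lo, hi = rho_ac_bounds(0.5, 0.5)
print(lo, hi)

# Unequal non-negative inputs: the upper bound drops below 1.
lo2, hi2 = rho_ac_bounds(0.5, 0.8)
print(lo2, hi2)
```

For rho_AB = rho_BC = 0.5 this reproduces the interval from -0.5 to 1 reported above. The second call illustrates that with unequal non-negative inputs the upper end of the interval falls below 1.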

Let the positive semi-definite cone be your guide to possible combinations of correlation. The bottom line is that the 3 dimensional positive semi-definite cone does not satisfy transitivity for its 3rd parameter with respect to the other 2 parameters. Note that when rho_AB and rho_BC are both non-negative but unequal to each other, the maximum value of rho_AC will be less than $1$.

Edit: With regard to your edit after my answer: Consider, for example, your example of corr(A,B) = corr(B,C) = 0.5, which I analyzed above. As shown above, per the mathematics of the positive semi-definite cone, $-0.5 \le$ corr(A,C) $\le 1$. In particular, corr(A,C) might be equal to $0$ or not.
