# Solved – why is the denominator of the correlation coefficient the SD of X multiplied by SD of Y

I don't quite understand what is going on in the correlation coefficient formula. In the numerator we have the covariance, and in the denominator we have the standard deviation of variable x multiplied by the standard deviation of variable y.

So ultimately it is a ratio of covariance to the product of the two standard deviations.

What does dividing by the product of the two standard deviations do to help us determine the correlation?

I have tried to draw it out to help me visually understand it as I find this helps, but am stuck as to what I should be looking for.


The covariance and the correlation coefficient measure essentially the same effect: how 'linked'[1,2] two variables are, i.e. if \$X\$ increases, does \$Y\$ tend to increase as well, and how consistently?

The problem with covariance is that its value depends on the scales of the two variables[3]: the covariance doesn't tell you much if you don't also know over what range \$X\$ and \$Y\$ vary. So you can't tell whether \$X\$ is more 'linked' with \$Y\$ or with \$Z\$ if you only know that, say, \$cov(X,Y)=1000\$ and \$cov(X,Z)=0.1\$. Maybe \$Y\$ is income in dollars (with big spreads) and \$Z\$ is the percentage of time spent brushing teeth (very small spreads). Then \$X\$ (the amount spent on toothbrushes) may be more linked to \$Z\$, even though that covariance is much lower.
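To make the scale dependence concrete, here is a standard property of covariance written out. The constants \$a\$ and \$b\$ are not part of the original example; they are introduced here only to stand for a change of units.

```latex
\operatorname{cov}(aX,\, bY) = ab\,\operatorname{cov}(X, Y)
\quad\Longrightarrow\quad
\operatorname{cov}(X,\, 100\,Y) = 100\,\operatorname{cov}(X, Y)
```

So merely switching \$Y\$ from dollars to cents multiplies the covariance by 100, although nothing about the relationship between \$X\$ and \$Y\$ has changed.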

To account for that, the correlation coefficient normalizes the covariance: we divide the covariance by the spreads (measured as standard deviations) of \$X\$ and \$Y\$. If you do the math (or run some simulations), you'll see that the correlation coefficient ranges from -1 (complete negative dependence) through 0 (no [linear] dependence) to 1 (complete positive dependence). This makes it possible to compare the degree of 'linkage' between different pairs of variables. And after you've used correlation coefficients for a while, you get a feeling for how much 'linkage' a given value represents.
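If you'd rather run a simulation than do the math, something like the following might help. It is only a minimal sketch with made-up data: the variables `x`, `y`, `y_cents` and the helper functions `cov` and `corr` are assumptions for illustration, not anything from the question. It rescales one variable and shows that the covariance scales with it while the ratio of covariance to the product of the standard deviations stays put.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible

# Hypothetical data: y depends linearly on x, plus independent noise.
n = 10_000
x = rng.normal(size=n)
y = 2.0 * x + rng.normal(size=n)


def cov(a, b):
    """Sample covariance of two 1-D arrays (n - 1 in the denominator)."""
    return np.cov(a, b)[0, 1]


def corr(a, b):
    """Covariance divided by the product of the two sample standard deviations."""
    return cov(a, b) / (np.std(a, ddof=1) * np.std(b, ddof=1))


# Change the units of y, e.g. as if going from dollars to cents.
y_cents = 100.0 * y

print("cov(x, y)        =", cov(x, y))
print("cov(x, y_cents)  =", cov(x, y_cents))   # 100 times the value above
print("corr(x, y)       =", corr(x, y))
print("corr(x, y_cents) =", corr(x, y_cents))  # same as corr(x, y)
```

For this particular setup the theoretical correlation is \$2/\sqrt{5} \approx 0.894\$, so the two printed correlations should land close to that, while the two covariances differ by a factor of 100. The factor of 100 in the numerator is cancelled by the factor of 100 that the rescaling also puts into the standard deviation of `y_cents`.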

[1] I'd normally use 'correlated' instead of 'linked', but that might be confused with the correlation coefficient in this answer.

[2] To be more precise: 'linearly linked'. There are lots of examples where there is a clear relationship between \$X\$ and \$Y\$, but their correlation (and thus covariance) is zero, e.g. if the scatter plot of the two variables looks like a circle or a cross.

[3] The covariance values will also change if you change the units of the variables, e.g. if you express \$Z\$ in milliseconds or \$Y\$ as a fraction of average income.
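The circle case mentioned in the note on 'linearly linked' is easy to check numerically. The following is a small sketch under assumed, made-up data: points spaced evenly on the unit circle, with hypothetical names `theta`, `x`, `y` and `r_circle`. The two variables are tied together exactly by \$x^2 + y^2 = 1\$, yet the correlation coefficient comes out as zero up to floating-point error.

```python
import numpy as np

# Hypothetical data: 1000 points spaced evenly around the unit circle.
theta = np.linspace(0.0, 2.0 * np.pi, 1000, endpoint=False)
x = np.cos(theta)
y = np.sin(theta)

# x and y are perfectly related (every point satisfies x^2 + y^2 = 1) ...
print("max |x^2 + y^2 - 1| =", np.max(np.abs(x**2 + y**2 - 1.0)))

# ... but the relationship is not linear, so the correlation is essentially zero.
r_circle = np.corrcoef(x, y)[0, 1]
print("correlation on the circle =", r_circle)
```

A correlation near zero therefore only rules out a linear relationship, not a relationship in general.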
