I have been examining the use of the point-biserial correlation as a statistic to measure the relationship between a dichotomous variable and a continuous one. Wikipedia et al. seem to concur that the point-biserial correlation is a special case of the Pearson correlation, but I cannot find a proof of this, algebraic or otherwise, and that makes me wary of using it in my research (I need to do some statistical confidence testing afterwards). I have tried deriving the result myself, but have only gone round in circles.
Any advice would be greatly appreciated.
Best Answer
Let the $n$ data consist of $n_0 \gt 0$ pairs of the form $(x, 0)$ and $n_1 \gt 0$ pairs of the form $(x, 1)$, so that $n = n_0 + n_1$. Their Pearson correlation coefficient is the same as that of the reversed data, consisting of the corresponding $(0,x)$ and $(1,x)$ pairs, because the coefficient is symmetric in its two arguments. Write $M_0$ and $M_1$ for the means of the $x$ values in the two groups. Because the first coordinates take exactly two distinct values, the least-squares regression line of the reversed data must pass through the mean points $(0,M_0)$ and $(1,M_1)$, whence it has slope $(M_1-M_0)/(1-0) = M_1-M_0$. The correlation coefficient is obtained by standardizing this slope: it must be multiplied by the standard deviation of the first coordinates and divided by the standard deviation of the second coordinates (the original $x$ values), written $s_n$. Both standard deviations must use the same divisor; here it is $n$. The standard deviation of the first coordinates is readily computed from the fact that they consist of $n_0$ zeros and $n_1$ ones; it equals
$$\sqrt{\frac{n_1}{n}\left(1-\frac{n_1}{n}\right)} = \sqrt{\frac{n_0 n_1}{n^2}}.$$
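As a brief check of that value (writing $y_i$, a label not used in the original, for the 0/1 coordinates): since $y_i^2 = y_i$, both the mean and the mean square of the $y_i$ equal $n_1/n$, so the variance with divisor $n$ is

$$\frac{1}{n}\sum_i y_i^2 - \left(\frac{1}{n}\sum_i y_i\right)^2 = \frac{n_1}{n} - \left(\frac{n_1}{n}\right)^2 = \frac{n_1}{n}\left(1-\frac{n_1}{n}\right) = \frac{n_0 n_1}{n^2},$$

where the last step uses $n_0 = n - n_1$.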
Consequently the Pearson correlation coefficient is
$$r = \frac{M_1-M_0}{s_n}\sqrt{\frac{n_0 n_1}{n^2}},$$
which is precisely the Wikipedia formula for the point-biserial coefficient.
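For anyone who wants a numerical sanity check alongside the algebra, here is a minimal sketch in Python with NumPy. The function name `point_biserial` and the simulated data (group sizes, mean shift, seed) are my own illustrative choices, not part of the derivation. It evaluates the formula above and compares it with the ordinary Pearson coefficient from `np.corrcoef`.

```python
import numpy as np

def point_biserial(x, y):
    """Evaluate (M1 - M0) / s_n * sqrt(n0 * n1 / n^2); y must contain only 0s and 1s."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y)
    n = x.size
    n1 = int(np.sum(y == 1))
    n0 = n - n1
    m0 = x[y == 0].mean()   # M_0: mean of x in the group coded 0
    m1 = x[y == 1].mean()   # M_1: mean of x in the group coded 1
    s_n = x.std(ddof=0)     # standard deviation of all x values, divisor n
    return (m1 - m0) / s_n * np.sqrt(n0 * n1 / n**2)

# Illustrative data (assumed): 80 zeros, 120 ones, group means shifted by 1.5.
rng = np.random.default_rng(0)
y = np.repeat([0, 1], [80, 120])
x = rng.normal(loc=1.5 * y, scale=1.0)

r_formula = point_biserial(x, y)
r_pearson = np.corrcoef(x, y)[0, 1]   # plain Pearson correlation of x with the 0/1 codes
print(r_formula, r_pearson)           # the two values should agree up to rounding error
```

Note that `s_n` must use the divisor $n$ (`ddof=0`) to match the formula as written. If SciPy is available, `scipy.stats.pointbiserialr(y, x)` returns the same coefficient together with a p-value, which may help with the significance testing mentioned in the question.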
Figure: the heights of the red dots depict the mean values $M_0$ and $M_1$ of each vertical strip of points; the dashed gray line is the regression line.