Solved – name or reference in a published journal/book for the following variance formula

Variance can be combined as

$$v=\frac{1}{n-1}\left(\sum_{i = 1}^{numGroups}n_{i}(m_{i}-m)^2+ \sum_{i = 1}^{numGroups}(n_{i}-1)v_{i}\right)$$

where $v$ is the combined variance, $n$ is the total sample size, $n_i$ is the number of points in group $i$, $numGroups$ is the total number of groups, $m_i$ is the mean of group $i$, $m$ is the combined mean, and $v_i$ is the variance of the $i^{th}$ group.

Is there a name for this formula or any reference to it?

Let $x_{i,j}$ denote the $j$-th data point in the $i$-th group, which has $n_i$ data points. There are $N$ such groups (the question's $numGroups$) and thus a total of $\sum_{i=1}^N n_i = n$ data points.

If the sample mean and sample variance of the $i$-th group are $m_i$ and $v_i$ respectively, then we have $$n_i\cdot m_i = \sum_{j=1}^{n_i} x_{i,j}\quad \text{and} \quad (n_i-1)v_i = \sum_{j=1}^{n_i} \left(x_{i,j} - m_i\right)^2.$$ It follows that $\displaystyle \sum_{i=1}^N \sum_{j=1}^{n_i} x_{i,j} = \sum_{i=1}^N n_i\cdot m_i = n\cdot m$ where $m$ is the overall mean of the $n$ data points. Similarly, the sum $\displaystyle \sum_{i=1}^N (n_i-1)v_i = \sum_{i=1}^N \sum_{j=1}^{n_i}\left(x_{i,j} - m_i\right)^2$ can be recognized as the sum of the squared deviations of the data points from the means of their respective groups.

This is not quite what we want for calculating the variance of the $n$ data points: we need to know the sum of the squared deviations from $m$. Fortunately, all that is needed is a little algebra. We have that $$\begin{align} \sum_{i=1}^N\sum_{j=1}^{n_i} \left(x_{i,j} - m\right)^2 &= \sum_{i=1}^N \left[\sum_{j=1}^{n_i}\left(x_{i,j}^2 -2x_{i,j}m + m^2\right)\right]\\ &= \sum_{i=1}^N \left[\left(\sum_{j=1}^{n_i}x_{i,j}^2\right) -2n_im_im + n_im^2\right]\\ &= \sum_{i=1}^N \left[\left(\sum_{j=1}^{n_i}x_{i,j}^2\right) + n_i(m^2 -2m_im + m_i^2) - n_im_i^2\right]\\ &=\sum_{i=1}^N \left[n_i(m_i-m)^2 + \sum_{j=1}^{n_i}\left(x_{i,j}^2-m_i^2\right) \right]\\ &= \sum_{i=1}^N \left[n_i(m_i-m)^2 + \sum_{j=1}^{n_i}\left(x_{i,j}^2-2x_{i,j}m_i + m_i^2\right) \right]\\ &= \sum_{i=1}^N \left[n_i(m_i-m)^2 + \sum_{j=1}^{n_i}\left(x_{i,j}-m_i\right)^2 \right]\\ &= \sum_{i=1}^N \left[n_i(m_i-m)^2 + (n_i-1)v_i \right]. \end{align}$$ All that remains is to divide both sides by $n-1$ and we are done.
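As a quick numerical sanity check of the identity, here is a minimal Python sketch that combines per-group summaries and compares the result with the variance computed directly from the pooled data. The function name `combine_variance` and the sample data are made up for illustration; only the standard-library `statistics` module is used, and every group is assumed to have at least two points so that its sample variance is defined.

```python
from math import isclose
from statistics import mean, variance


def combine_variance(stats):
    """Combine per-group summaries into the overall sample variance.

    `stats` is a list of (n_i, m_i, v_i) tuples: group size, group mean,
    and group sample variance (the n_i - 1 denominator version).
    """
    n = sum(n_i for n_i, _, _ in stats)
    # Overall mean m, from n * m = sum_i n_i * m_i.
    m = sum(n_i * m_i for n_i, m_i, _ in stats) / n
    # Squared deviations of the group means from the overall mean.
    between = sum(n_i * (m_i - m) ** 2 for n_i, m_i, _ in stats)
    # Squared deviations of the points from their own group means.
    within = sum((n_i - 1) * v_i for n_i, _, v_i in stats)
    return (between + within) / (n - 1)


# Illustrative data only: three groups of unequal size.
groups = [[1.0, 2.0, 3.0], [4.0, 6.0], [10.0, 12.0, 14.0, 16.0]]
stats = [(len(g), mean(g), variance(g)) for g in groups]

combined = combine_variance(stats)
direct = variance([x for g in groups for x in g])

print(combined, direct)
print(isclose(combined, direct))
```

The two printed values should agree up to floating-point rounding, which is the content of the last line of the derivation after dividing by $n-1$.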
