# How to combine subsets consisting of mean, variance, confidence, and number of sampled points used

I have a data set that has been divided into \$n\$ data subsets.

I am sampling from each of these subsets and getting a tuple consisting of mean, variance, confidence and number of sampled points used.

How can I combine these results?

I do not know how to do this other than with a simple function of the number of points and their averages. That won't take into account either the variance or the confidence of the score.


Let \$n_i, m_i, v_i\$ be the number of observations, observed mean, and sample variance (with divisor \$n_i - 1\$) in sample \$i\$. Let \$n, m, v\$ be the corresponding figures for the combined data, so \$n = \sum_i n_i\$ (sorry, I redefined \$n\$ here).

\$\$m = \frac{1}{n}\sum_i n_i m_i\$\$.

Now for the variance:

\$\$v = \frac{1}{n-1}\sum_{i,j} (x_{i,j} - m)^2\$\$

with \$x_{i,j}\$ the \$j^{th}\$ observation of sample \$i\$ and \$j=1,2,\ldots, n_i\$.

Play around a little:

\$\$(x_{i,j} - m)^2 = (x_{i,j} - m_i + m_i - m)^2 = (x_{i,j} - m_i)^2 + (m_i - m)^2 + 2(x_{i,j} - m_i)(m_i - m)\$\$.

Terms \$(m_i-m)\$ can be factored out of the summation over \$j\$:

\$\$v = \frac{1}{n-1}\left[\sum_i n_i(m_i - m)^2 + 2\sum_i (m_i - m)\sum_j (x_{i,j} - m_i) + \sum_{i,j} (x_{i,j} - m_i)^2\right]\$\$.

Since \$\sum_j (x_{i,j} - m_i) = 0\$, the middle term cancels out. So you're left with:

\$\$v = \frac{1}{n-1}\left[\sum_i n_i(m_i - m)^2 + \sum_i (n_i - 1)v_i\right]\$\$
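The combined mean and variance can be sketched in Python roughly as follows. This is only an illustration, not a reference implementation: the function name `combine_subsets` and the toy data are made up here, and the code assumes each \$v_i\$ is the unbiased sample variance (divisor \$n_i - 1\$), as in the derivation.

```python
from statistics import mean, variance


def combine_subsets(stats):
    """Combine (n_i, m_i, v_i) tuples into (n, m, v) for the pooled data.

    Assumption: every v_i is a sample variance with divisor n_i - 1.
    """
    n = sum(n_i for n_i, _, _ in stats)
    # Weighted mean of the subset means.
    m = sum(n_i * m_i for n_i, m_i, _ in stats) / n
    # Spread of the subset means around the combined mean.
    between = sum(n_i * (m_i - m) ** 2 for n_i, m_i, _ in stats)
    # Spread of the observations around their own subset mean.
    within = sum((n_i - 1) * v_i for n_i, _, v_i in stats)
    v = (between + within) / (n - 1)
    return n, m, v


# Toy data (hypothetical), used to compare against a direct computation.
subsets = [[1.0, 2.0, 3.0], [10.0, 12.0], [4.0, 4.0, 5.0, 7.0]]
stats = [(len(s), mean(s), variance(s)) for s in subsets]
n, m, v = combine_subsets(stats)

pooled = [x for s in subsets for x in s]
print(n, m, v)
print(len(pooled), mean(pooled), variance(pooled))
```

Both printed lines should agree up to floating-point rounding, since the formula is an exact identity rather than an approximation.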

Confidence intervals are obtained from \$m\$ and \$v\$: for large \$n\$, an approximate 95% interval for the mean is \$m \pm 1.96\sqrt{v/n}\$. Is that what you were looking for?
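For the confidence interval itself, a minimal sketch under a normal approximation might look like this. The helper name `mean_confidence_interval` and the example numbers are hypothetical, and the approximation assumes independent observations and a reasonably large \$n\$; for small \$n\$ a Student t quantile would be more appropriate.

```python
from math import sqrt
from statistics import NormalDist


def mean_confidence_interval(n, m, v, level=0.95):
    """Approximate confidence interval for the mean from (n, m, v).

    Assumption: normal approximation, i.e. n is large enough that the
    sample mean is roughly Gaussian with standard error sqrt(v / n).
    """
    # Two-sided standard normal quantile, about 1.96 for level = 0.95.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * sqrt(v / n)
    return m - half_width, m + half_width


# Example with made-up combined figures: n = 100, m = 5.0, v = 4.0.
low, high = mean_confidence_interval(100, 5.0, 4.0)
print(low, high)
```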
