I have 3 groups, split by highest degree of education. For each individual in a group, I calculate the variance of her income over a five-year window. For simplicity, let's assume that I only do this once, for one base year, and that the group size does not vary over time in that window.

I then average the individual variances over all individuals in a group, to compare the mean income variance of the groups in that time window.

My second education group amounts to two thirds of the whole sample. I suspect that the average variance in that group may be overestimated (or even underestimated?) compared to the others, because more individual variances go into it.

How would I go about confirming or refuting my suspicion that I need comparable sample sizes to compare average variances?

The smaller groups are about 400 – 600 individuals large, the big group holds about 2000 individuals.

#### Best Answer

We want to know the extent to which the sample size of the groups affects the inter-group comparison of the mean income variabilities (accepting the CLT approximation).

Consider just one group containing $n$ individuals, and let $X_1, \dots, X_n$ be their observed income variabilities over the specified period. We assume that they are independent and identically distributed according to some arbitrary probability distribution that has mean $\mu$ and variance $\sigma^2$. We want to estimate $\mu$ using the sample mean $\bar X$.

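Using the Central Limit Theorem, we find that $|\bar X - \mu| < \frac{\sigma \, t_v(\alpha)}{\sqrt n}$ with probability approximately $1-\alpha$, where $t_v(\alpha)$ is the two-sided percentage point of the standard normal distribution, i.e. the value satisfying $P(|Z| < t_v(\alpha)) = 1-\alpha$ for standard normal $Z$. Plugging in $\alpha = 0.05$ gives $t_v(\alpha) \approx 1.96$, which we round up to 2: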

**Your estimate $\bar X$, for sample size $n$, is within $\frac{2 \sigma}{\sqrt n}$ of the true mean $\mu$ just over 95% of the time.**

To break this down, if you have a larger sample size for the second group, your estimate of the mean 'income variability' in that group is likely to be nearer to the true mean.
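To get a feel for the size of this effect, here is a minimal Python sketch that evaluates the half-width $\frac{\sigma \, t_v(\alpha)}{\sqrt n}$ for group sizes like those in the question. The helper `half_width` and the value `SIGMA = 1.0` are assumptions made purely for illustration (the true $\sigma$ is unknown and may differ between groups), so only the relative sizes are meaningful.

```python
from math import sqrt
from statistics import NormalDist


def half_width(sigma, n, alpha=0.05):
    """Approximate CLT half-width of the interval around the sample mean."""
    # Two-sided percentage point of the standard normal (about 1.96 for alpha = 0.05).
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return z * sigma / sqrt(n)


SIGMA = 1.0  # hypothetical value; replace with an estimate from your own data

# Group sizes taken from the question: two small groups and one large group.
for n in (400, 600, 2000):
    print(f"n = {n:4d}: half-width = {half_width(SIGMA, n):.4f}")
```

Because the half-width scales with $1/\sqrt n$, a group five times larger has an interval that is only about $\sqrt 5 \approx 2.2$ times narrower, provided $\sigma$ is the same in both groups.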

However, the expected value of $\bar X$ will always be $\mu$ regardless of the group size ($\bar X$ is an unbiased estimator).
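One way to check this for yourself is a small simulation. The sketch below is only an illustration under stated assumptions: individual income variances are drawn from a gamma distribution (an arbitrary choice of skewed, positive distribution, not something derived from your data), and the function name `simulate_group_means` and all parameter values are hypothetical. Both groups share the same true mean, so any systematic difference between their averages would indicate a bias due to sample size.

```python
import random
from statistics import mean, stdev


def simulate_group_means(n, reps, rng, shape=2.0, scale=1.5):
    """Return `reps` group averages, each over n simulated individual variances."""
    # Assumption: individual variances follow Gamma(shape, scale), true mean shape * scale.
    return [
        sum(rng.gammavariate(shape, scale) for _ in range(n)) / n
        for _ in range(reps)
    ]


rng = random.Random(0)  # fixed seed so the run is reproducible
TRUE_MEAN = 2.0 * 1.5   # mean of the assumed gamma distribution

small = simulate_group_means(n=400, reps=300, rng=rng)
large = simulate_group_means(n=2000, reps=300, rng=rng)

for name, means in (("n = 400", small), ("n = 2000", large)):
    print(f"{name}: average of group means = {mean(means):.3f}, "
          f"spread of group means = {stdev(means):.3f}")
```

Under these assumptions both group sizes should average out close to `TRUE_MEAN`, while the spread of the group means should be visibly smaller for the larger group. In other words, group size affects precision, not the expected value.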

Note also that $\sigma^2$ may vary from one group to another (the variance of the variability!), which will also affect how closely your estimates cluster about the true mean for that group. For example, if the groups were defined by income and group A contained those on \$100k to \$110k, but group B contained those from \$10k all the way to \$100k, you'd expect $\sigma^2$ to be higher in group B because it encompasses a wider variety of people.

(Another approach would be to regress income variability against mean income.)