Solved – Are two sets of 1-D data equal if they have the same mean, variance and size?

I have two datasets of size n. They have the same mean and variance.
Is it possible that they have the same entries as their data?
What if I considered variance using L1 norm?

EDIT: By L1 norm I mean the mean absolute deviation of the data from their mean.

Is it possible that they have the same entries as their data?

Certainly it's possible that they have the same entries, but you can't assume they will be the same. Two data sets with the same mean and variance are not necessarily identical.

Indeed it's easy for it not to be the case, most notably with discrete variables.

Consider these samples of size 3:

    Sample A:   1, 1, 4
    Sample B:   0, 3, 3

They have the same mean, variance and sample size but their entries have no values in common.
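The claim is quick to check numerically. The following is a minimal sketch using Python's standard `statistics` module, where `statistics.variance` is the sample variance with the $n-1$ denominator; the variable names are just for illustration.

```python
import statistics

sample_a = [1, 1, 4]
sample_b = [0, 3, 3]

# Print size, mean and sample variance (n - 1 denominator) for each sample.
for name, data in (("A", sample_a), ("B", sample_b)):
    print(name, len(data), statistics.mean(data), statistics.variance(data))

# Values that occur in both samples (there are none).
shared_values = set(sample_a) & set(sample_b)
print(shared_values)
```

Both samples give mean 2 and sample variance 3, and the set of shared values is empty.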

What if I considered variance using L1 norm?

Replacing the variance with the mean deviation from the median (i.e. the L1 norm about the minimizer of the L1 norm) won't change the answer.

Edit: the same holds for the mean deviation about the mean. [The example above already shows this: the two samples have the same mean and the same mean deviation (4/3), yet the observations differ.]

It's possible for two such samples of size $n$ to have the same entries, but some measure of center and spread being the same is not enough on its own to imply that the entries are the same. It doesn't matter which standard measure of center and spread you choose: specifying both will effectively reduce the degrees of freedom in the sample by 2, so as long as there are more than two observations it should be possible to make those values the same without the observations being the same.
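One concrete way to see this slack, offered as an illustrative sketch rather than as part of the original argument: reflecting every observation about the sample mean, $y_i = 2\bar{x} - x_i$, leaves the mean, the variance and the mean deviation unchanged, yet produces a different set of values whenever the sample is not symmetric about its mean. The helper names below (`reflect_about_mean` and so on) are made up for this example, and exact fractions are used to avoid rounding issues.

```python
from fractions import Fraction

def mean(xs):
    return Fraction(sum(xs), len(xs))

def sample_variance(xs):
    # Sample variance with the n - 1 denominator.
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def mean_deviation(xs):
    # Mean absolute deviation about the mean.
    m = mean(xs)
    return sum(abs(x - m) for x in xs) / len(xs)

def reflect_about_mean(xs):
    # Hypothetical helper: map each x to 2 * mean - x.
    m = mean(xs)
    return [2 * m - x for x in xs]

x = [1, 1, 4]
y = reflect_about_mean(x)

print(sorted(y))
print(mean(x) == mean(y))
print(sample_variance(x) == sample_variance(y))
print(mean_deviation(x) == mean_deviation(y))
```

Applied to 1, 1, 4 this reflection gives 0, 3, 3, which is exactly the second sample in the size-3 example.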

Indeed you could make the means the same and the medians the same, and the variances and the mean deviations the same across samples … but if you had enough observations, the sample values could still be different. Here's an example:

     A   B
     2   0
     4   6
     7   7
     7   7
     8   9
     8  10
    13  10

For both these samples the mean is 7, the variance is 12 (using the $n-1$ denominator), and the mean deviation is 16/7 (and both medians are 7).
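These four summaries can be verified with a short script. This is a sketch that uses exact fractions, the $n-1$ denominator for the variance, and the mean absolute deviation about the mean; `summarize` is a hypothetical helper written for this check.

```python
from fractions import Fraction
from statistics import median

sample_a = [2, 4, 7, 7, 8, 8, 13]
sample_b = [0, 6, 7, 7, 9, 10, 10]

def summarize(xs):
    # Exact mean, sample variance, mean deviation about the mean, and median.
    n = len(xs)
    m = Fraction(sum(xs), n)
    variance = sum((x - m) ** 2 for x in xs) / (n - 1)
    mean_dev = sum(abs(x - m) for x in xs) / n
    return {
        "mean": m,
        "variance": variance,
        "mean_deviation": mean_dev,
        "median": median(xs),
    }

summary_a = summarize(sample_a)
summary_b = summarize(sample_b)

print(summary_a)
print(summary_b)
print(summary_a == summary_b, sorted(sample_a) == sorted(sample_b))
```

The two summaries agree on every entry even though the sorted samples differ.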

If you had a variable consisting only of 0's and 1's, knowing $n$ and the mean is enough to know the exact number of 0's and 1's (so you also know the variance and the mean deviation) … but it won't tell you anything about their order, so even then you couldn't tell that they were the same (i.e. that $x_i=y_i$ for all $i$) from the mean and variance (or mean deviation).
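A small sketch of the 0/1 case: with $n$ and the mean known, the number of 1's is $n$ times the mean, which fixes the counts but says nothing about the order. The function `counts_from_mean` and the two sequences are invented for illustration.

```python
from fractions import Fraction

def counts_from_mean(n, mean):
    # For 0/1 data the number of ones is n * mean; it must be a whole number.
    ones = Fraction(mean) * n
    if ones.denominator != 1 or not 0 <= ones <= n:
        raise ValueError("mean is not attainable with n binary observations")
    ones = int(ones)
    return n - ones, ones  # (number of zeros, number of ones)

x = [1, 1, 0, 0, 0]
y = [0, 0, 1, 0, 1]

mean_x = Fraction(sum(x), len(x))
mean_y = Fraction(sum(y), len(y))

print(counts_from_mean(len(x), mean_x))
print(counts_from_mean(len(y), mean_y))
print(x == y)  # same counts, different order
```

Both sequences have three 0's and two 1's, so every summary based on the counts matches, yet $x_i \neq y_i$ at several positions.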

If you had continuous measurements that were recorded to some fairly large number of significant figures, and they had identical mean and variance (to that number of figures) … that wouldn't guarantee that the entries were all the same, but it would be a minor miracle for it to happen otherwise by chance (unless the variables had been standardized, in which case it would be quite unremarkable).
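To illustrate the remark about standardization: any sample with nonzero spread, once standardized, has mean 0 and variance 1 up to floating-point rounding, so agreement on those two numbers says nothing about the entries. The data values below are arbitrary, chosen only for this sketch.

```python
import math
import statistics

def standardize(xs):
    # Subtract the mean and divide by the sample standard deviation.
    m = statistics.mean(xs)
    s = statistics.stdev(xs)
    return [(x - m) / s for x in xs]

u = standardize([1.2, 3.4, 5.9, 8.1])
v = standardize([10.0, 10.5, 30.0, 31.5])

for name, z in (("u", u), ("v", v)):
    print(name, round(statistics.mean(z), 12), round(statistics.variance(z), 12))

# The standardized entries themselves are still different.
entries_differ = any(not math.isclose(a, b, abs_tol=1e-6) for a, b in zip(u, v))
print(entries_differ)
```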
