How should one define the sample variance for scalar input?

I was horrified to find recently that Matlab returns $0$ for the sample variance of a scalar input:

    >> var(randn(1),0)   %the '0' here tells var to give sample variance
    ans =
         0
    >> var(randn(1),1)   %the '1' here tells var to give population variance
    ans =
         0

Somehow, the sample variance is not dividing by $0 = n-1$ in this case. R returns a NaN for a scalar:

    > var(rnorm(1,1))
    [1] NA
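For reference, a short sketch of why $n = 1$ is awkward, assuming the usual definition of the sample variance and writing $\bar{x}$ for the sample mean (notation that is not in the original post):

$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2, \qquad n = 1 \;\Rightarrow\; \bar{x} = x_1, \quad s^2 = \frac{(x_1 - x_1)^2}{1 - 1} = \frac{0}{0}.$$

Under that definition the single-observation case is indeterminate rather than zero, so a result of $0$ can only come from switching to a different denominator.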

What do you think is a sensible way to define the ~~population~~ sample variance for a scalar? What consequences might there be of returning a zero instead of a NaN?

edit: from the help for Matlab's var:

    VAR normalizes Y by N-1 if N>1, where N is the sample size.  This is
    an unbiased estimator of the variance of the population from which X is
    drawn, as long as X consists of independent, identically distributed
    samples. For N=1, Y is normalized by N.

    Y = VAR(X,1) normalizes by N and produces the second moment of the
    sample about its mean.  VAR(X,0) is the same as VAR(X).

A cryptic comment in the m-code for `var` states:

    if w == 0 && n > 1
        % The unbiased estimator: divide by (n-1).  Can't do this
        % when n == 0 or 1.
        denom = n - 1;
    else
        % The biased estimator: divide by n.
        denom = n; % n==0 => return NaNs, n==1 => return zeros
    end

That is, they explicitly choose not to return a NaN even when the user requests a sample variance on a scalar. My question is why they should choose to do this, not how.
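To make the two conventions concrete, here is a minimal Python sketch of the denominator rule quoted above next to a stricter alternative. The function name `variance` and the `matlab_style` flag are hypothetical, chosen only for illustration. This is not MATLAB's actual implementation, just the same branching logic restated.

```python
import math


def variance(xs, unbiased=True, matlab_style=True):
    """Variance of a list of floats (illustrative sketch, hypothetical API).

    matlab_style=True mimics the quoted rule: fall back to dividing by n
    when n <= 1. matlab_style=False always uses n - 1 for the unbiased
    estimator and reports NaN when that denominator is zero.
    """
    n = len(xs)
    if n == 0:
        return math.nan  # nothing to average, in either convention
    mean = sum(xs) / n
    ss = sum((x - mean) ** 2 for x in xs)  # sum of squared deviations
    if unbiased and (n > 1 or not matlab_style):
        denom = n - 1
    else:
        denom = n
    return ss / denom if denom != 0 else math.nan


print(variance([2.5]))                       # 0.0 (fallback to dividing by n)
print(variance([2.5], matlab_style=False))   # nan (0/0 left undefined)
print(variance([1.0, 2.0, 3.0]))             # 1.0 (both conventions agree)
```

The two conventions differ only at `n == 1`. For any sample with at least two points they give the same answer.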

edit: I see that I had erroneously asked about how one should define the population variance of a scalar (see strike through line above). This probably caused a lot of confusion.

Scalars can't 'have' a population variance, although they can be single samples from a population that has a (population) variance. If you want to estimate that variance, you need at least one of: more than one data point in the sample, another sample from the same distribution, or some prior information about the population variance by way of a model.
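As a rough illustration of what can go wrong downstream, the sketch below feeds each convention into a standard error of the mean, $\sqrt{s^2/n}$. The helper `standard_error` is a hypothetical name introduced here, and the choice of the standard error as the downstream quantity is an assumption made for the example.

```python
import math


def standard_error(var, n):
    """Standard error of the mean, sqrt(var / n); NaN propagates through."""
    return math.sqrt(var / n)


# One observation, two conventions for its "sample variance".
se_zero = standard_error(0.0, 1)       # zero-variance convention
se_nan = standard_error(math.nan, 1)   # undefined-variance convention

print(se_zero)  # 0.0: reads as a perfectly precise estimate
print(se_nan)   # nan: signals that no spread estimate exists
```

A zero silently passes through later arithmetic and can suggest certainty that a single point cannot support, whereas NaN keeps flagging the problem. Python's standard-library `statistics.variance` takes a third route and raises `StatisticsError` when given fewer than two data points.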

Btw, R has returned missing (NA), not NaN:

    is.nan(var(rnorm(1,1)))
    [1] FALSE
