# Solved – How should one define the sample variance for scalar input

I was horrified to find recently that Matlab returns $0$ for the sample variance of a scalar input:

```
>> var(randn(1),0)   %the '0' here tells var to give sample variance
ans =
     0
>> var(randn(1),1)   %the '1' here tells var to give population variance
ans =
     0
```

Somehow, the sample variance is not dividing by $n-1 = 0$ in this case. R, by contrast, returns `NA` for a scalar:

```
> var(rnorm(1,1))
[1] NA
```
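To make the problem concrete, here is the usual textbook definition of the sample variance. The symbols $x_i$, $\bar{x}$ and $s^2$ are notation assumed here, not taken from either package's documentation:

$$
s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2, \qquad \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i .
$$

For $n = 1$ we have $\bar{x} = x_1$, so the sum of squares is $0$ and the divisor $n-1$ is also $0$. The expression reduces to $0/0$, which is undefined rather than zero.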

What do you think is a sensible way to define the ~~population~~ sample variance for a scalar? What consequences might there be of returning a zero instead of a NaN?

edit: from the help for Matlab's `var`:

```
VAR normalizes Y by N-1 if N>1, where N is the sample size.  This is
an unbiased estimator of the variance of the population from which X is
drawn, as long as X consists of independent, identically distributed
samples. For N=1, Y is normalized by N.

Y = VAR(X,1) normalizes by N and produces the second moment of the
sample about its mean.  VAR(X,0) is the same as VAR(X).
```

A cryptic comment in the m-code for `var` states:

```
if w == 0 && n > 1
    % The unbiased estimator: divide by (n-1).  Can't do this
    % when n == 0 or 1.
    denom = n - 1;
else
    % The biased estimator: divide by n.
    denom = n; % n==0 => return NaNs, n==1 => return zeros
end
```

i.e. they explicitly choose not to return a `NaN` even when the user requests a sample variance on a scalar. My question is why they should choose to do this, not how.
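To compare the two conventions side by side, here is a minimal Python sketch. It is not Matlab's or R's implementation: the function name `variance` and the `scalar_policy` switch are hypothetical, introduced only for illustration. The `"matlab"` branch mirrors the `denom` logic quoted above, while the `"nan"` branch treats the sample variance of a single point as undefined.

```python
import math


def variance(xs, w=0, scalar_policy="matlab"):
    """Variance of a list of numbers.

    w=0 requests the sample variance (divide by n-1),
    w=1 requests the population variance (divide by n).

    scalar_policy is a hypothetical switch for this sketch:
      "matlab" -> fall back to dividing by n when n == 1 (gives 0)
      "nan"    -> return NaN for the sample variance of one point
    """
    n = len(xs)
    if n == 0:
        # No data at all: nothing sensible to report.
        return math.nan

    mean = sum(xs) / n
    ss = sum((x - mean) ** 2 for x in xs)  # sum of squared deviations

    if w == 0 and n > 1:
        denom = n - 1          # unbiased estimator
    elif w == 0 and scalar_policy == "nan":
        return math.nan        # 0/0 is undefined, so say so
    else:
        denom = n              # biased estimator, or the n == 1 fallback
    return ss / denom


x = [3.7]  # a single observation
print(variance(x, w=0, scalar_policy="matlab"))
print(variance(x, w=0, scalar_policy="nan"))
```

Under the `"matlab"` policy a single observation reports zero spread, which downstream code cannot tell apart from genuinely constant data. Under the `"nan"` policy the undefined result propagates through later arithmetic, so the caller at least sees that something went wrong.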

edit: I see that I had erroneously asked about how one should define the population variance of a scalar (see strike through line above). This probably caused a lot of confusion.


Note that the value R returns is `NA`, not `NaN`:

```
> is.nan(var(rnorm(1,1)))
[1] FALSE
```