I was horrified to find recently that Matlab returns $0$ for the sample variance of a scalar input:
    >> var(randn(1),0) %the '0' here tells var to give sample variance
    ans = 0
    >> var(randn(1),1) %the '1' here tells var to give population variance
    ans = 0
Somehow, the sample variance is not dividing by $0 = n-1$ in this case. R returns a NaN for a scalar:
    > var(rnorm(1,1))
    [1] NA
What do you think is a sensible way to define the ~~population~~ sample variance for a scalar? What consequences might there be of returning a zero instead of a NaN?
edit: from the help for Matlab's var:
VAR normalizes Y by N-1 if N>1, where N is the sample size. This is an unbiased estimator of the variance of the population from which X is drawn, as long as X consists of independent, identically distributed samples. For N=1, Y is normalized by N. Y = VAR(X,1) normalizes by N and produces the second moment of the sample about its mean. VAR(X,0) is the same as VAR(X).
A cryptic comment in the m code for `var` states:

    if w == 0 && n > 1
        % The unbiased estimator: divide by (n-1). Can't do this
        % when n == 0 or 1.
        denom = n - 1;
    else
        % The biased estimator: divide by n.
        denom = n; % n==0 => return NaNs, n==1 => return zeros
    end
i.e. they explicitly choose not to return a NaN even when the user requests a sample variance on a scalar. My question is why they should choose to do this, not how.
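To spell out the arithmetic behind that comment (this is my own working, assuming the usual definitions with $\bar{x}$ the sample mean, not something taken from Matlab's documentation), the two estimators are

$$
s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2,
\qquad
\hat{\sigma}^2_n = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2 .
$$

For $n = 1$ we have $\bar{x} = x_1$, so the sum of squares is $0$ in both cases. The first expression becomes $0/0$, which is undefined (a NaN in floating-point arithmetic), while the second becomes $0/1 = 0$. Switching the denominator from $n-1$ to $n$ when $n = 1$ is therefore exactly what turns the NaN into a zero.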
edit: I see that I had erroneously asked about how one should define the population variance of a scalar (see strike through line above). This probably caused a lot of confusion.
Best Answer
Scalars can't 'have' a population variance, although they can be single samples from a population that has a (population) variance. If you want to estimate that variance, you need at least one of the following: more than one data point in the sample, another sample from the same distribution, or some prior information about the population variance by way of a model.
btw, R has returned missing (NA), not NaN:
    is.nan(var(rnorm(1,1)))
    [1] FALSE