# Solved – Estimating population variance through simulation in R

I want to estimate the variance of the exponential distribution with a rate of \$lambda=0.2\$.

I'm drawing a sample of 5 exponentials 1000 times, and know that the theoretical variance of my 'population' should be:
\$\$
sigma^2=frac{1}{lambda^2}=frac{1}{0.2^2} = 25
\$\$
From my understanding, dividing the deviations from the mean by the sample size will give a biased estimate of the population variance, even if the calculation will technically provide the variance of the individual sample. Instead, I should be able to get an unbiased estimate of the variance by accounting for the n-1 degrees of freedom:
\$\$
s^2 = sumlimits_{i=1}^n frac{(x_i-bar{x})^2}{N-1}
\$\$
I now want to explore this bias in R by running two simulations, one accounting for the reduced degrees of freedom and another in which the sum is only divided by \$N\$.

``variance <- NULL for (i in 1:1000) {     variance <- c(variance, sum((rexp(5, 0.2) - mean(rexp(5, 0.2)))^2)/(5-1)) } mean(variance) ``

However, my supposedly unbiased estimator, accounting for the fewer degrees of freedom, ends up being consistently more biased than the uncorrected version, which is expected to be biased.

For example with `set.seed(200)`, I'm getting the following estimates of the population variance:

Dividing by `n-1: 35.64`
Dividing by `n: 28.51`

By comparison, I get much better results using R's built-in `var()` function. However, I would like to write out as many of the calculations by hand as possible, to help improve my understanding.

Am I misunderstanding the theory or am I misunderstanding the functions in my R code (or both)?

Contents

``set.seed(200) B          <- 10000 varianceN  <- vector(length=B) varianceN1 <- vector(length=B) varianceP  <- vector(length=B) for (i in 1:B) {   data          <- rexp(5, 0.2)   varianceN[i]  <- sum((data-mean(data))^2) / (5)   varianceN1[i] <- sum((data-mean(data))^2) / (5-1)   varianceP[i]  <- sum((data-5)^2)          / (5) } # N.b., the theoretically correct value for the population variance is 25 mean(varianceN)   #  19.85737 mean(varianceN1)  #  24.82172 mean(varianceP)   #  24.85525 ``