Is sample kurtosis hopelessly biased?

I am looking at the sample kurtosis of a fairly skewed random variable, and the results seem inconsistent. To illustrate the problem simply, I looked at the sample kurtosis of a log-normal RV. In R (which I am slowly learning):

```r
library(moments)
samp_size = 2048
n_trial = 4096
kvals <- rep(NA, 1, n_trial)  # preallocate
for (iii in 1:n_trial) {
    kvals[iii] <- kurtosis(exp(rnorm(samp_size)))
}
print(summary(kvals))
```

The summary I get is

```
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  11.87   28.66   39.32   59.17   61.70 1302.00
```

According to Wikipedia, the kurtosis of this log-normal RV (whose logarithm has standard deviation 1) should be about 114. Clearly, the sample kurtosis is biased.

Doing some research, I found that sample kurtosis is biased for small sample sizes. I used the 'G2' estimator provided by the e1071 package on CRAN and got very similar results for this sample size.
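The adjustment behind 'G2' can be written out explicitly. This is a minimal sketch assuming the textbook definitions $g_2 = m_4/m_2^2 - 3$ (with $m_k$ the $k$-th sample central moment, divisor $n$) and $G_2 = \frac{n-1}{(n-2)(n-3)}\left[(n+1)g_2 + 6\right]$; the helper names are hypothetical, and this is not the source code of moments or e1071. Note that $G_2$ estimates *excess* kurtosis, so 3 has to be added back before comparing it with `moments::kurtosis` or with 114.

```r
# Hand-written versions of the standard g2 / G2 formulas (hypothetical helpers,
# not code copied from moments or e1071).

# Plain moment estimator m4 / m2^2: non-excess (Pearson) kurtosis.
pearson_kurtosis <- function(x) {
  d <- x - mean(x)
  mean(d^4) / mean(d^2)^2
}

# Adjusted estimator G2; it estimates EXCESS kurtosis.
G2_excess_kurtosis <- function(x) {
  n <- length(x)
  g2 <- pearson_kurtosis(x) - 3
  ((n + 1) * g2 + 6) * (n - 1) / ((n - 2) * (n - 3))
}

set.seed(1)
x <- exp(rnorm(2048))
k_plain <- pearson_kurtosis(x)
k_G2 <- G2_excess_kurtosis(x) + 3  # shift back to the non-excess scale
cat("m4/m2^2:", k_plain, " G2 + 3:", k_G2, " ratio:", k_G2 / k_plain, "\n")
```

Since $(n^2-1)/\left((n-2)(n-3)\right) \approx 1 + 5/n$, at $n = 2048$ the adjustment raises the estimate by only roughly a quarter of a percent, far too little to move a typical value near 40 up to 114.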

The question: which of the following characterizes what is going on?

  1. The standard error of the sample kurtosis is simply very large for this RV (even though the hand-wavy common estimate of the standard error is of order $1/\sqrt{n}$). Alternatively, I used too few samples (2048) in this study.
  2. These implementations of sample kurtosis suffer from numerical problems which might be corrected by e.g. Terriberry's method (in much the same way that Welford's method gives better results than the naive method for sample variance).
  3. I computed the population kurtosis incorrectly. (ouch)
  4. Sample kurtosis is inherently biased and you can never fix it for such small sample sizes.
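Hypothesis 2 can be checked directly by computing the same statistic two ways. The sketch below implements the single-pass update of the central moment sums $M_2, M_3, M_4$ in the form usually attributed to Terriberry (as listed, for example, on Wikipedia's "Algorithms for calculating variance" page) and compares it with a plain two-pass computation on one simulated sample; the function names are hypothetical. If the two agree to many significant digits, rounding error is not what pulls the estimates down.

```r
# Single-pass (online) update of M2, M3, M4 in the Terriberry form.
online_kurtosis <- function(x) {
  n <- 0; mu <- 0; M2 <- 0; M3 <- 0; M4 <- 0
  for (xi in x) {
    n1 <- n
    n <- n + 1
    delta <- xi - mu
    delta_n <- delta / n
    delta_n2 <- delta_n * delta_n
    term1 <- delta * delta_n * n1
    mu <- mu + delta_n
    # M4 is updated first because it needs the old M3 and M2
    M4 <- M4 + term1 * delta_n2 * (n * n - 3 * n + 3) +
      6 * delta_n2 * M2 - 4 * delta_n * M3
    # M3 needs the old M2
    M3 <- M3 + term1 * delta_n * (n - 2) - 3 * delta_n * M2
    M2 <- M2 + term1
  }
  n * M4 / (M2 * M2)  # non-excess kurtosis m4 / m2^2
}

# Straightforward two-pass version: center first, then average powers.
two_pass_kurtosis <- function(x) {
  d <- x - mean(x)
  mean(d^4) / mean(d^2)^2
}

set.seed(2)
x <- exp(rnorm(2048))
k_online <- online_kurtosis(x)
k_two_pass <- two_pass_kurtosis(x)
cat("online:", k_online, " two-pass:", k_two_pass,
    " relative difference:", abs(k_online - k_two_pass) / k_two_pass, "\n")
```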

There is a bias correction, but it is not large. I believe the sampling variance of the kurtosis is proportional to the eighth (!) central moment, which can be enormous for a lognormal distribution. You would need millions of trials (or far more) in a simulation to detect the bias unless the coefficient of variation (CV) is tiny. (Plot a histogram of kvals to see how extraordinarily skewed the values are.)
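To get a sense of how large that eighth central moment is, the central moments of the lognormal can be computed exactly. For $X = e^Z$ with $Z \sim N(\mu, \sigma^2)$, the raw moments are $E[X^j] = \exp(j\mu + j^2\sigma^2/2)$, and the $k$-th central moment follows from the binomial expansion of $E[(X - E[X])^k]$. A minimal sketch, standardizing by $\operatorname{Var}[X]^{k/2}$; the function name is hypothetical.

```r
# Standardized k-th central moment of a lognormal(meanlog, sdlog), built from
# the raw moments E[X^j] = exp(j * meanlog + j^2 * sdlog^2 / 2).
lognormal_std_central_moment <- function(k, meanlog = 0, sdlog = 1) {
  raw <- function(j) exp(j * meanlog + j^2 * sdlog^2 / 2)  # vectorized; raw(0) = 1
  m <- raw(1)
  j <- 0:k
  central_k <- sum(choose(k, j) * raw(j) * (-m)^(k - j))  # binomial expansion
  central_2 <- raw(2) - m^2                                # Var[X]
  central_k / central_2^(k / 2)
}

# Kurtosis (k = 4) next to the standardized eighth central moment (k = 8)
print(sapply(c(4, 8), lognormal_std_central_moment))
```

For $\mu = 0$, $\sigma = 1$ the first entry reproduces the kurtosis of about 113.94, while the standardized eighth central moment comes out on the order of $10^{11}$.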

The correct kurtosis is indeed about 113.9364.
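For reference, the closed form for the (non-excess) kurtosis of a lognormal whose logarithm has standard deviation $\sigma$ is below; with $\sigma = 1$, as in `exp(rnorm(samp_size))`, it gives that value. The corresponding excess kurtosis, which is what $G_2$ estimates, is about 110.94.

$$
\operatorname{Kurt}[X] = e^{4\sigma^2} + 2e^{3\sigma^2} + 3e^{2\sigma^2} - 3,
\qquad
\sigma = 1:\quad e^{4} + 2e^{3} + 3e^{2} - 3 \approx 54.59815 + 40.17107 + 22.16717 - 3 = 113.93639.
$$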

As far as R style goes, it can be convenient to encapsulate the simulation in a function so you can easily modify the sample size or number of trials.
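A minimal sketch of the simulation wrapped in a function. Assumptions: the names `sim_kurtosis` and `sample_kurtosis` are hypothetical; `sample_kurtosis` computes $m_4/m_2^2$ directly (the same non-excess definition `moments::kurtosis` uses, as far as I know) so the block runs without extra packages; and the random generator is passed in as an argument so other distributions can be tried.

```r
# Non-excess sample kurtosis m4 / m2^2 (divisor n).
sample_kurtosis <- function(x) {
  d <- x - mean(x)
  mean(d^4) / mean(d^2)^2
}

# Repeat the experiment n_trial times; rgen(n) must return n draws.
sim_kurtosis <- function(samp_size = 2048, n_trial = 4096,
                         rgen = function(n) exp(rnorm(n))) {
  # vapply allocates the result and checks each value is a single number
  vapply(seq_len(n_trial),
         function(i) sample_kurtosis(rgen(samp_size)),
         numeric(1))
}

set.seed(3)
kvals <- sim_kurtosis(samp_size = 2048, n_trial = 200)
print(summary(kvals))
```

Calling `hist(log10(kvals))` afterwards shows the skew mentioned above; the log scale keeps the long right tail readable.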
