Solved – Bootstrap vs other simulated data methods

With mixed effects models, many statisticians simulate or bootstrap data to create empirical confidence regions for the fixed effect and random effect parameters.

Resampling (i.e. bootstrapping) seems intuitive to me because it makes few assumptions about the nature of the data.

As an alternative, some identify the multivariate distribution of a set of variables and draw at random from that distribution.

My question is: Is there a principle for deciding between these approaches? Is one of them always better?

To bootstrap in a linear mixed effects model you sample with replacement in a way that maintains the model structure. Your data is divided into groups, and you don't want to mix the data from one group into the data from another. If a particular group has m observations, you sample m times with replacement from those m observations. You repeat this process for all the other groups (the value of m may differ from group to group). Once you have done this you have one bootstrap sample. You fit the model to this bootstrap sample, and then repeat the resampling and model fitting many times. This gives you a collection of estimated model parameters (a histogram for each parameter, if you will).

Any time you have a bootstrap histogram of estimates you can construct bootstrap confidence intervals from it. The simplest is Efron's percentile method, which takes the 2.5th and 97.5th percentiles of the ordered bootstrap estimates as the endpoints of a 95% confidence interval.

For more detail you can read Efron and Tibshirani's An Introduction to the Bootstrap (1993), Chapman and Hall; my book Bootstrap Methods, 2nd ed. (2007), Wiley; or the article by Efron and Tibshirani in Statistical Science (1986).
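The within-group resampling and percentile interval described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the toy data, the function names (resample_within_groups, fit_slope, percentile_ci) and the number of replicates are all made up for the example, and the model-fitting step is replaced by a plain least-squares slope so the sketch needs nothing beyond NumPy. In practice you would call your mixed-model fitting routine at that point.

```python
import numpy as np

def resample_within_groups(y, x, groups, rng):
    """One bootstrap sample: for each group of size m, draw m rows
    with replacement from that group only (groups are never mixed)."""
    idx = np.arange(len(y))
    picked = []
    for g in np.unique(groups):
        members = idx[groups == g]
        picked.append(rng.choice(members, size=members.size, replace=True))
    picked = np.concatenate(picked)
    return y[picked], x[picked], groups[picked]

def fit_slope(y, x):
    """Stand-in for the model fit (an assumption of this sketch):
    ordinary least-squares slope of y on x, not a mixed-model fit."""
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[1]

def percentile_ci(estimates, level=0.95):
    """Efron's percentile interval from a collection of bootstrap estimates."""
    alpha = (1.0 - level) / 2.0
    lo, hi = np.percentile(estimates, [100 * alpha, 100 * (1 - alpha)])
    return lo, hi

rng = np.random.default_rng(0)

# Hypothetical toy data: 5 groups of unequal size with a group-level shift.
sizes = [8, 10, 12, 9, 11]
groups = np.repeat(np.arange(len(sizes)), sizes)
x = rng.normal(size=groups.size)
group_shift = rng.normal(scale=1.0, size=len(sizes))[groups]
y = 2.0 + 0.5 * x + group_shift + rng.normal(scale=0.5, size=groups.size)

# Resample, refit, and collect the estimates many times.
boot = np.array([
    fit_slope(*resample_within_groups(y, x, groups, rng)[:2])
    for _ in range(1000)
])
lo, hi = percentile_ci(boot)
print(f"95% percentile interval for the slope: ({lo:.3f}, {hi:.3f})")
```

Each bootstrap sample keeps the original group sizes, which is what "maintaining the model structure" amounts to in this scheme.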

Now, in the absence of data, you may still want to understand how the model behaves. In that case you can simulate the data and look at the results in a way similar to what I described for the bootstrap. The difference is that instead of sampling from the empirical distribution of the data, you have to specify a distribution (or distributions) to sample from.
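A minimal sketch of the simulation approach, for contrast with resampling. Everything specific here is an assumption chosen for illustration: a random-intercept model with normal group effects and normal residuals, the parameter values, the group sizes, and the helper names simulate_dataset and fit_slope. As a dependency-free stand-in for fitting the mixed model, the fit is again a plain least-squares slope.

```python
import numpy as np

def simulate_dataset(sizes, beta0, beta1, sd_group, sd_resid, rng):
    """Draw one dataset from a fully specified model (assumed here):
    y_ij = beta0 + beta1 * x_ij + b_i + e_ij,
    with b_i ~ N(0, sd_group^2) and e_ij ~ N(0, sd_resid^2)."""
    groups = np.repeat(np.arange(len(sizes)), sizes)
    x = rng.normal(size=groups.size)
    b = rng.normal(scale=sd_group, size=len(sizes))   # one effect per group
    e = rng.normal(scale=sd_resid, size=groups.size)  # one residual per row
    y = beta0 + beta1 * x + b[groups] + e
    return y, x, groups

def fit_slope(y, x):
    """Stand-in for the model fit: ordinary least-squares slope of y on x."""
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[1]

rng = np.random.default_rng(1)
sizes = [8, 10, 12, 9, 11]  # hypothetical group sizes

# Simulate from the specified distributions, refit, and collect estimates.
estimates = np.array([
    fit_slope(*simulate_dataset(sizes, 2.0, 0.5, 1.0, 0.5, rng)[:2])
    for _ in range(1000)
])
lo, hi = np.percentile(estimates, [2.5, 97.5])
print(f"Central 95% of simulated slope estimates: ({lo:.3f}, {hi:.3f})")
```

Unlike the bootstrap, no observed data enters this loop: every draw comes from the distributions you chose, so the results are only as good as those choices.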
