Solved – general method for simulating data from a formula or analysis available

De novo simulation of data from an experimental design data frame.
With a focus on R (though solutions in other languages would be welcome).

In designing an experiment or a survey, simulating data and conducting an analysis on this simulated data can provide terrific insight into advantages and weaknesses of the design.

Such an approach can also be essential to the understanding and proper use of statistical tests.

However, this process tends to be somewhat tedious, and many people skip this important step in an experiment or survey.

Statistical models and tests contain most of the information required to simulate the data (including an assumption or an explicit statement of the distribution).

Given an analysis model (and its associated assumptions, e.g. normality and balance), the levels of a factor and a measure of significance (such as a p-value), I would like to obtain simulated data (ideally with a generalized function akin to print(), predict(), simulate()).

Is such a generalized simulation framework possible?

If so, is such a framework currently available?

For example, I would like a function such as:

 sim(aov(response ~ factor1 + factor2 * factor3),
     p.values = list(factor1 = 0.05,
                     factor2 = 0.05,
                     factor3 = 0.50,
                     "factor2:factor3" = 0.05),
     levels = list(factor1 = 1:10,
                   factor2 = c("A", "B", "C"),
                   factor3 = c("A", "B", "C")))

i.e., a generalized version of:

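 sim.lm <- function() {
   library(DoE.base)
   # 10 x 3 x 3 full factorial; nlevels is assumed from the factor levels in the sim() call
   design <- fac.design(nlevels = c(10, 3, 3),
                        factor.names = c("factor1", "factor2", "factor3"),
                        replications = 3,
                        randomize = FALSE)
   # additive effects plus a factor2:factor3 interaction and standard normal noise
   response <- with(design, as.numeric(factor1) +
                            as.numeric(factor2) +
                            as.numeric(factor3) +
                            as.numeric(factor2) * as.numeric(factor3) +
                            rnorm(length(factor1)))
   simulation <- data.frame(design, response)
   simulation
 }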


 sim(glm(response ~ factor1 + factor2 * factor3, family = poisson),
     p.values = list(factor1 = 0.05,
                     factor2 = 0.05,
                     factor3 = 0.50,
                     "factor2:factor3" = 0.05),
     levels = list(factor1 = 1:10,
                   factor2 = c("A", "B", "C"),
                   factor3 = c("A", "B", "C")))


 library(lme4)
 sim(lmer(response ~ factor1 + factor2 + (factor2 | factor3)),
     F_value = list(factor1 = 50,
                    factor2 = 50),
     levels = list(factor1 = 1:10,
                   factor2 = c("A", "B", "C"),
                   factor3 = c("A", "B", "C")))

that would create a complete corresponding data.frame
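For the Gaussian linear-model case, something close to this can be sketched in a few lines if effect sizes (coefficients) are specified instead of p-values. The helper name sim_from_formula, its arguments, and the coefficient values below are hypothetical and chosen only for illustration; this is not an existing function.

```r
# Hypothetical helper: simulate a Gaussian response from a one-sided formula,
# a design data frame, and user-chosen coefficients (effect sizes, not p-values).
sim_from_formula <- function(rhs, design, beta, sigma = 1) {
  X <- model.matrix(rhs, data = design)   # dummy-coded design matrix
  stopifnot(length(beta) == ncol(X))      # one coefficient per column
  design$response <- drop(X %*% beta) + rnorm(nrow(X), sd = sigma)
  design
}

set.seed(123)
# 3 x 3 factorial with 3 replicates; expand.grid turns the strings into factors
design <- expand.grid(factor2 = c("A", "B", "C"),
                      factor3 = c("A", "B", "C"),
                      rep = 1:3)

# 9 coefficients (assumed values): intercept, 2 + 2 main-effect contrasts,
# and 4 interaction contrasts under the default treatment coding
beta <- c(10, 1, 2, 0, 0, 0.5, 0.5, 0.5, 0.5)

simulated <- sim_from_formula(~ factor2 * factor3, design, beta)
anova(lm(response ~ factor2 * factor3, data = simulated))
```

Going from target p-values back to coefficients would need an extra step (for example a search over effect sizes), which this sketch does not attempt.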

potential examples of specific functions (please edit at will)
– arima.sim

Functions exist to create a data.frame of the factor levels, without the modelled response:
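For instance, base R's expand.grid() builds such a design frame. The sketch below reuses the factor names and levels from the sim() examples; the three replicates per cell are an assumption made for illustration.

```r
# Full-factorial design frame with base R (no response column yet).
design <- expand.grid(
  factor1 = factor(1:10),
  factor2 = c("A", "B", "C"),
  factor3 = c("A", "B", "C"),
  KEEP.OUT.ATTRS = FALSE,
  stringsAsFactors = TRUE
)

# Replicate each design row three times.
design <- design[rep(seq_len(nrow(design)), each = 3), ]
rownames(design) <- NULL

str(design)
```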

There is in fact an S3 generic, simulate(), that returns the data frame (or other list) you want. See ?simulate for its documentation.
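As a minimal sketch of the generic in use, the lm method draws new responses from a fitted model. The design and the placeholder response below are made up purely so that there is something to fit.

```r
set.seed(1)
# Small 3 x 3 design with 3 replicates and a placeholder response
design <- expand.grid(factor2 = c("A", "B", "C"),
                      factor3 = c("A", "B", "C"),
                      rep = 1:3)
design$response <- rnorm(nrow(design))

fit <- lm(response ~ factor2 * factor3, data = design)

# Five simulated response vectors, one per column (sim_1 ... sim_5)
sims <- simulate(fit, nsim = 5, seed = 42)
dim(sims)
```

Note that the simulated values come from the fitted means and the estimated residual variance, so the fit itself determines the effect sizes.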


It already has methods for class lm (which also covers glm and your aov example) and for negative binomial fits from glm.nb (in MASS). You can write S3 simulate methods for other classes of objects, e.g. for objects from lme4. You can check which classes have methods by typing

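 getAnywhere("simulate.class")      # show one method; substitute the class name for "class", e.g. "simulate.lm"
 getAnywhere("simulate")            # show the generic itself
 getS3method("simulate", "class")   # same lookup via the S3 registry, e.g. getS3method("simulate", "lm")
 methods(simulate)                  # list every class that has a simulate method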
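Writing a method for a new class only requires a function named simulate.<class> with the generic's arguments (object, nsim, seed, ...). The class "poismean" below is a hypothetical toy class invented for this sketch, not something from an existing package.

```r
# Hypothetical toy class "poismean": stores a Poisson rate and a sample size.
fit <- structure(list(lambda = 3, n = 10), class = "poismean")

# S3 method: returns a data frame with one simulated response per column,
# mirroring the layout that simulate() gives for lm fits.
simulate.poismean <- function(object, nsim = 1, seed = NULL, ...) {
  if (!is.null(seed)) set.seed(seed)
  draws <- replicate(nsim, rpois(object$n, object$lambda), simplify = FALSE)
  names(draws) <- paste0("sim_", seq_len(nsim))
  as.data.frame(draws)
}

sims <- simulate(fit, nsim = 4, seed = 1)
dim(sims)
```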
