Using R, I am trying to simulate how the power of a Monte Carlo two-sample test for central tendency changes with sample size. However, my simulation results does not show power increasing with sample size which is clearly wrong.
Could someone please advise where I am going wrong? Below is the basic set-up.
The null hypothesis is:
$H_0=mu_X-mu_Y=0$
The alternative hypothesis is:
$H_1=mu_X>mu_Y=0$
The test statistic is:
$E(x)-E(y)$
For my simulation, I use two samples $Xsim N(mu_X, sigma_y)$, $Ysim N(mu_Y, sigma_Y)$ of size $n_X$ and $n_Y$ respectively. I run 100 simulations for every increase in the sample size of X.
Below is my coded simulation with comments:
mc.pvalue<-function(nx, ny, mu.x, mu.y, sig.x, sig.y){ x<-rnorm(nx, mu.y, sig.x) # Generate samples under H1: mu_x > mu_y y<-rnorm(ny, mu.x, sig.y) obs.diff<-mean(x)-mean(y) # Test-stat is observed diff. in means se.x<-sd(x)/sqrt((length(x))) # Calculate standard errors se.y<-sd(y)/sqrt((length(y))) count=0 # Set counter equal to zero for(i in 1:1000){ # Generate 1000 pairs of samples and calculate x1<-rnorm(nx, mu.y, se.x) # the difference in means between each pair. y1<-rnorm(ny, mu.y, se.y) # The samples have been generated under sim.diff<-mean(x1)-mean(y1) # H0: mu_x = mu_y if(sim.diff<= obs.diff){count=count+1} } count/1000 # Estimated p-value for test statistic } # Calculate 100 estimated p-values for the test statistic. Then find the proportion of times # reject hypothesis at level of significance 0.05 calc.pvalues<-function(nx, ny=10, mu.x=21, mu.y=20, sig.x=0.5, sig.y=0.25, n.sim=100){ pvalues<-replicate(n.sim, mc.pvalue(nx, ny, sig.x, sig.y, mu.x, mu.y)) sum(pvalues<0.05)/n.sim } chge.sample.size<-seq(2, 100, by=0.05) # Sample size x increasing from 1 to 100 result<-sapply(chge.sample.size, calc.pvalues) # Apply the calc.pvalues function to # estimate p-value for each sample size # of x plot(chge.sample.size, result) # Plot increasing sample size of X # against test power
When I an edit the code to calculate the power of a t-test, it works perfectly:
calc.pvalue.t<-function(nx, ny, sig.x, sig.y, mu.x, mu.y){ x<-rnorm(nx, mu.x, sig.x) y<-rnorm(ny, mu.y, sig.y) t.test(x, y, alternative="greater", paired=FALSE, conf.level=0.95)$p.value } calc.t<-function(nx, ny=10, mu.x=21, mu.y=20, sig.x=0.5, sig.y=0.25, n.sim=100, alpha=0.05){ p.values<-replicate(n.sim, calc.pvalue.t(nx, ny, sig.x, sig.y, mu.x, mu.y)) sum(p.values<alpha)/n.sim } chge.sample.size<-seq(2, 10, by=1) tpower1<-sapply(chge.sample.size, calc.t) plot(chge.sample.size, tpower1)
Something seems to be going wrong with the first part of my code when I try to do the Monte Carlo test
Best Answer
I think maybe the reason is that the empirical distribution is changed with every observation in the function "mc.pvalue". So the 100 p-values are calculated with different empirical distribution.