So, let's compare two normal distributions:

```r
# Do this x times:
runs <- 100000
a.samples <- rnorm(runs, mean = 0)
b.samples <- rnorm(runs, mean = 0)
mc.p.value <- sum(a.samples > b.samples) / runs
```

Dividing the number of mc.p.values that fall below our alpha (0.05) by x would then give the Type I error rate. Our H0 is that a.samples >= b.samples. (Inspired by https://www.countbayesie.com/blog/2015/3/3/6-amazing-trick-with-monte-carlo-simulations)
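Spelled out, the outer loop might look like the sketch below (the choices x = 1000 outer repetitions and runs = 1000 inner draws are illustrative assumptions, reduced from the original 100000 for speed):

```r
set.seed(1)                       # for reproducibility
x     <- 1000                     # outer repetitions (illustrative choice)
runs  <- 1000                     # inner draws per repetition (reduced for speed)
alpha <- 0.05

# Repeat the inner simulation x times, recording each mc.p.value
mc.p.values <- replicate(x, {
  a.samples <- rnorm(runs, mean = 0)   # H0: both groups are N(0, 1)
  b.samples <- rnorm(runs, mean = 0)
  sum(a.samples > b.samples) / runs    # proportion of pairs with a > b
})

# Proportion of "significant" results: the empirical Type I error rate
type1 <- mean(mc.p.values < alpha)
```

Note that under equal distributions each mc.p.value concentrates tightly around 0.5, so essentially none fall below 0.05, which is why the observed rate does not match the nominal 5%.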

But I thought a Monte Carlo simulation had to follow these steps:

Algorithm:

- Set up some distribution for the data, f() or f(θ), and some H0
- Repeat the following two steps many times: (a) simulate a data set according to H0; (b) calculate T(x) using the simulated data
- Compute T(x) from the observed sample data and add it to the set
- Order all of the T(x)s
- p-value is the proportion of the T(x)s as extreme or more extreme than the one from the sample data
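The steps above can be sketched in R, using the difference in means as the test statistic (the observed samples, group size of 20, and B = 10000 replicates are illustrative assumptions):

```r
set.seed(1)
# Observed data (illustrative): two small samples
x.obs <- rnorm(20, mean = 0.5)
y.obs <- rnorm(20, mean = 0)
T.obs <- mean(x.obs) - mean(y.obs)   # T(x) from the sample data

# Simulate T(x) many times under H0: both groups N(0, 1)
B <- 10000
T.sim <- replicate(B, {
  x <- rnorm(20, mean = 0)
  y <- rnorm(20, mean = 0)
  mean(x) - mean(y)
})

# p-value: proportion of simulated T(x) at least as extreme as the observed one
p.value <- mean(abs(T.sim) >= abs(T.obs))
```

Here "as extreme or more extreme" is taken two-sided via absolute values; ordering the T(x)s explicitly, as in the algorithm, gives the same proportion.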

Therefore, isn't the first code snippet not a bona fide Monte Carlo simulation? And is the p-value valid? If you graph the results, you don't get the 5% Type I error rate one would expect from a statistical test.


#### Best Answer

This is a Monte Carlo simulation, though I doubt it is doing what you want, and it is of little practical use. What you are comparing is 10000 single-observation studies, counting how often one observed value is higher than the other. So it is probably better to conceptualize your code as the following, less efficient, code:

```r
runs <- 10000
result <- logical(runs)
for (i in 1:runs) {
  a <- rnorm(1, mean = 0)
  b <- rnorm(1, mean = 0)
  result[i] <- a > b
}
mc.p.value <- mean(result)
```

The above code shows that, when the distributions are equal, $a$ is higher than $b$ about 50% of the time. But this type of result is essentially useless in practice because each comparison has only $N = 2$ observations, so statistical inference is not applicable (i.e., there is no variance within each group available to quantify sampling uncertainty).

What you are missing is comparing two *summary statistics* of interest to determine their sampling properties, which is generally the topic of statistical inference and requires at least a minimum number of data points to quantify some form of sampling uncertainty.

As it stands, this is the standard set-up for an independent-samples $t$-test. So you could compare a number of things, such as how often the first group's mean is higher than the second's, how often a $t$-test gives a $p < \alpha$ result (or, analogously, how often the $|t|$ ratio exceeds the critical value), and so on.

E.g., if $n = 20$ for each group and the distributions are equal in the population, then

```r
runs <- 10000
n <- 20
alpha <- .05
result.p <- result.mean <- numeric(runs)
for (i in 1:runs) {
  a <- rnorm(n, mean = 0)
  b <- rnorm(n, mean = 0)
  result.mean[i] <- mean(a) > mean(b)
  result.p[i] <- t.test(a, b)$p.value
}
mc.p.value <- mean(result.p < alpha)
mc.a_gt_b.value <- mean(result.mean)
```

Playing around with the other parameters, such as changing the first group's mean to 1, changes the nature of the simulation (from the Type I error simulation it is now into a power study).
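For instance, shifting the first group's population mean to 1 (an illustrative, fairly large standardized effect) turns the same loop into a power simulation:

```r
set.seed(1)
runs  <- 10000
n     <- 20
alpha <- .05
result.p <- numeric(runs)
for (i in 1:runs) {
  a <- rnorm(n, mean = 1)   # true effect: population means differ by 1
  b <- rnorm(n, mean = 0)
  result.p[i] <- t.test(a, b)$p.value
}
power <- mean(result.p < alpha)   # proportion of correct rejections
```

With $n = 20$ per group and a standardized mean difference of 1, the estimated power of the two-sided $t$-test at $\alpha = .05$ comes out around 0.87.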