Let's say I want to make a football simulator based on real-life data.
Say I have a player who averages 5.3 yards per carry with a SD of 1.7 yards.
I'd like to generate a random variable that simulates the next few plays.
e.g., 5.7, 4.9, 5.3, etc.
What stats terms do I need to look up to pursue this idea? Density function? The normal curve estimates what boundaries the data generally fall within, but how do I translate that into simulation of subsequent data points?
Thanks for any guidance!
Best Answer
Of course you can use rnorm() in R, but it may be easier to understand how drawing from a pdf works by using the probability integral transform.
Basically, once we specify the pdf, we can turn it into a cdf (numerically, so we never need the cdf's equation). Because the cdf increases monotonically from 0 to 1, each value between 0 and 1 corresponds to exactly one x, so we can back-calculate a draw from the original pdf by taking a uniform random draw between 0 and 1 and finding the x whose cdf value matches it.
This way, all you need is a uniform RNG on 0 to 1 and the pdf itself, and you're set. Here is the R code:
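    # Grid of x values covering most of the standard normal's mass
    x <- seq(-4, 4, length.out = 1000)

    # Normal pdf
    f <- function(x, mu = 0, sigma = 1) {
      1 / sqrt(2 * pi * sigma^2) * exp(-(x - mu)^2 / (2 * sigma^2))
    }

    # Numerical cdf on the grid, normalised so it ends at 1
    x.ecdf <- cumsum(f(x)) / sum(f(x))

    # For each uniform draw, find the grid point whose cdf value is closest
    y <- runif(100)
    out <- integer(length(y))
    for (i in seq_along(y)) {
      out[i] <- which.min((y[i] - x.ecdf)^2)
    }

    par(mfrow = c(1, 2))
    plot(x, x.ecdf)
    hist(x[out], breaks = 20)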
[Figure: the numerical cdf (left) and a histogram of the 100 draws (right): http://probabilitynotes.files.wordpress.com/2010/08/rnormish.png]
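To tie this back to the player in the question, here is a minimal sketch that assumes yards per carry really are normally distributed with mean 5.3 and SD 1.7 (the question's framing, not something checked against data). It draws a few carries two ways: directly with rnorm(), and via the probability integral transform using qnorm(), R's built-in inverse cdf for the normal, in place of the grid lookup above. The names n_plays, direct, and via_inverse, and the seed, are only illustrative.

```r
# Player from the question: mean 5.3 yards per carry, SD 1.7.
# Assumption: yards per carry follow a normal distribution.
mu <- 5.3
sigma <- 1.7
n_plays <- 5   # illustrative number of carries to simulate

set.seed(42)   # arbitrary seed, only so the run is repeatable

# Direct route: draw straight from the normal distribution.
direct <- rnorm(n_plays, mean = mu, sd = sigma)

# Probability integral transform: uniform draws on (0, 1) pushed
# through the exact inverse cdf (quantile function) of the same normal.
u <- runif(n_plays)
via_inverse <- qnorm(u, mean = mu, sd = sigma)

print(round(direct, 1))
print(round(via_inverse, 1))
```

Both vectors are samples from the same distribution; the second just makes the "uniform draw, then invert the cdf" step explicit. Note that a normal model will occasionally produce negative values, which here would read as a loss of yards.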