Let's say I want to make a football simulator based on real-life data.

Say I have a player who averages 5.3 yards per carry with a SD of 1.7 yards.

I'd like to generate a random variable that simulates the next few plays.

e.g., 5.7, 4.9, 5.3, etc.

What stats terms do I need to look up to pursue this idea? Density function? The normal curve estimates what boundaries the data generally fall within, but how do I translate that into a simulation of subsequent data points?

Thanks for any guidance!


#### Best Answer

Of course you can use rnorm() in R, but it may be easier to understand how drawing from a pdf works by using the probability integral transform.

Basically, once we specify the pdf, we can turn it into a cdf (numerically, so we never need the cdf's closed form). Because the cdf increases monotonically from 0 to 1, each value between 0 and 1 corresponds to exactly one value of the variable, so we can back-calculate a draw from the original pdf by matching a uniform random draw on (0, 1) against the cdf.

This way, you only need a uniform RNG on (0, 1) and the pdf itself, and you're set. Here is the R code:

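```r
x <- seq(-4, 4, length.out = 1000)

# Normal pdf
f <- function(x, mu = 0, sigma = 1) {
  1 / sqrt(2 * pi * sigma^2) * exp(-(x - mu)^2 / (2 * sigma^2))
}

# Empirical cdf: normalized running sum of the pdf over the grid
x.ecdf <- cumsum(f(x)) / sum(f(x))

# For each uniform draw, find the grid index whose cdf value is closest
y <- runif(100)
out <- integer(length(y))
for (i in seq_along(y)) {
  out[i] <- which.min((y[i] - x.ecdf)^2)
}

# Left panel: the empirical cdf; right panel: histogram of the draws
par(mfrow = c(1, 2))
plot(x, x.ecdf)
hist(x[out], breaks = 20)
```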

[Figure: the empirical cdf (left) and a histogram of the 100 simulated draws (right): http://probabilitynotes.files.wordpress.com/2010/08/rnormish.png]
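
To tie this back to the numbers in the question, here is a minimal sketch that assumes yards per carry are normally distributed with mean 5.3 and SD 1.7 (that is the question's premise, not something checked against real rushing data). It draws the carries two ways: by pushing uniform draws through the normal quantile function `qnorm()`, which is the exact version of the cdf-matching step, and directly with `rnorm()`. The variable names and the seed are only illustrative.

```r
set.seed(42)                     # arbitrary seed, only for reproducibility
mu <- 5.3                        # mean yards per carry, from the question
sigma <- 1.7                     # SD of yards per carry, from the question

# Inverse-transform route: uniform draws pushed through the normal quantile function
u <- runif(10000)
carries_inv <- qnorm(u, mean = mu, sd = sigma)

# Direct route: rnorm() draws from the same distribution
carries_direct <- rnorm(10000, mean = mu, sd = sigma)

# Both samples should have a mean and SD close to 5.3 and 1.7
c(mean(carries_inv), sd(carries_inv))
c(mean(carries_direct), sd(carries_direct))

# The next few simulated carries, rounded to one decimal
round(head(carries_inv, 3), 1)
```

Note that a normal model with these parameters can occasionally return negative or very large values, so you may want to decide how the simulator should treat them.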