I have a model of a bernoulli random process I fit using JAGS via the rjags
package in R. Here are some example data, as well as code to fit the given models in JAGS via rjags
:
#generate some binary 0/1 data set.seed(666) x1 = rnorm(1000) # some continuous variables x2 = rnorm(1000) z = 1 + 2*x1 + 3*x2 # linear combination with a bias pr = 1/(1+exp(-z)) # pass through an inv-logit function y = rbinom(1000,1,pr) # bernoulli response variable #fit a JAGS model. require(rjags) jags.model = " model { for (i in 1:N){ y[i] ~ dbern(p[i]) p[i] <- 1 / (1 + exp(-z[i])) z[i] <- a + b*x1[i] + c*x2[i] } a ~ dnorm(0, .0001) b ~ dnorm(0, .0001) c ~ dnorm(0, .0001) } " #setup data as list data = list(y=y, x1=x1, x2=x2, N = length(y)) #run JAGS model j.model <- jags.model(file = textConnection(jags.model), data=data, n.chains=3) #sample from the posterior jags.out <- coda.samples (model = j.model, variable.names = c('a','b','c'), n.iter = 1000)
To prove the model fits, here is some code to generate predicted vs. observed of the z variable (probability of being 0 or 1).
#grab coefficients, compare fitted vs observed of z to prove this fits. cf <- summary(jags.out)$statistics[,1] fitted <- cf[1] + cf[2]*x1 + cf[3]*x2 plot(z ~ fitted, pch=16, cex = 0.2, ylab='observed',xlab='fitted')
Now that the model fits, What I would like to do is plot the relationship between one of the predictor variables, say x1
, and the probabilty of the 0/1 outcome (the prediction for z
), as well as +/- the uncertainty on this prediction. Getting the mean prediction across the range of x1
values is fairly straightforward using the model coefficients:
#okay, not plot the relationship between x1 and z, holding x2 constant x1.range <- seq(min(x1),max(x1), by = (max(x1)-min(x1))/100) z.fit <- cf[1] + x1.range*cf[2] + mean(x2) * cf[3] #plot relationship plot(z.fit~ x1.range, cex=0) lines(smooth.spline(z.fit~x1.range), lwd=2, col='purple')
My question: How do I estimate the uncertainty associated with this relationship? An estimate of the standard deviation or the variance would be fine. Anything that would allow me to shade a confidence interval on this figure, given the model I have fit.
Best Answer
I think the best way to generate uncertainty values around each prediction is to send the desired predictions through the original JAGS model. This way, the model will sample from the distributions of all parameters, regardless of whether they are positive or negative, and generate a distribution around each prediction. This distribution will reflect the uncertainty associated with all parameters, at the value of each predictor of interest. Aspects of this distribution can then be used for plotting. To implement this in R, first generate a mean of x2
, and a range of x1
values for which you want to make predictions for. Here I am going to predict over the entire range of x1
values in the data, divided into 101 intervals across the range of x1
values.
x1.range <- seq(min(x1),max(x1), by = (max(x1)-min(x1))/100) x2.m <- mean(x2)
I then modify the JAGS model to make predictions.
jags.model = " model { #model code for (i in 1:N){ y[i] ~ dbern(p[i]) p[i] <- 1 / (1 + exp(-z[i])) z[i] <- a + b*x1[i] + c*x2[i] } #predictions over x1, holding x2 constant at its mean value for (i in 1:N.sim){ z.x1[i] <- a + b*x1.range[i] + c*x2.m } a ~ dnorm(0, .0001) b ~ dnorm(0, .0001) c ~ dnorm(0, .0001) } "
Make sure to add the new data to your data list:
#setup data as list data = list(y=y, x1=x1, x2=x2, x1.range = x1.range, x2.m = x2.m, N = length(y), N.sim = length(x1.range))
Compile the model, and the use the coda
function to sample from the prediction posterior:
#compile the model require(rjags) j.model <- jags.model(file = textConnection(jags.model), data=data, n.chains=3) #sample from the prediction posterior x1.jags.out <- coda.samples(model=j.model, variable.names = c('z.x1'),n.iter=1000)
This will generate a mean and standard deviation for each predicted value of y
, at each condition of x1
and x2
simulated, taking the full distribution and uncertainty of all parameter values into account. Finally, for plotting, grab the estimated mean and standard deviation (or whatever your favorite metric of variance is), and plot:
#generate plot, plotting mean and +/- 1 standard deviation. plot(x1.pred ~ x1.range, cex=0) lines(smooth.spline(x1.pred~ x1.range), lwd=2, col='purple') lines(smooth.spline((x1.pred+x1.pred.sd) ~ x1.range), lty=2, col='purple') lines(smooth.spline((x1.pred-x1.pred.sd) ~ x1.range), lty=2, col='purple') #shade error region polygon(c(x1.range,rev(x1.range)) ,c(x1.pred+x1.pred.sd,rev(x1.pred - x1.pred.sd)) ,col=adjustcolor('purple',0.3) ,lty=0 )
If specific probabilities are desired, then these
z
values need to be inverse logit transformed.
Similar Posts:
- Solved – Calculating prediction variance from JAGS model of a bernoulli outcome in R
- Solved – Calculating prediction variance from JAGS model of a bernoulli outcome in R
- Solved – rjags does not seem to use initial values specified
- Solved – Priors for Truncated Parameters – RJAGS
- Solved – Multi-level Bayesian hierarchical regression using rjags