Dear all,
I was encouraged to ask this question here as well as on stackoverflow and would be very appreciative of any answers…
Due to hetereoscedasticity I'm doing bootstrapped linear regression (appeals more to me than robust regression). I'd like to create a plot along the lines of what I've done in the script here. However the fill=int
is not right since int
should (I believe) be calculated using a bivariate normal distribution.
- Any idea how I could do that in this setting?
- Also is there a way for
bootcov
to return bias-corrected percentiles?
sample script:
library(ggplot2) library(Hmisc) library(Design) # for ols() o<-data.frame(value=rnorm(10,20,5), bc=rnorm(1000,60,50), age=rnorm(1000,50,20), ai=as.factor(round(runif(1000,0,4),0)), Gs=as.factor(round(runif(1000,0,6),0))) reg.s<-function(x){ ols(value~as.numeric(bc)+as.numeric(age),data=x,x=T,y=T)->temp bootcov(temp,B=1000,coef.reps=T)->t2 return(t2) } dlply(o,.(ai,Gs),function(x) reg.s(x))->b.list llply(b.list,function(x) x[["boot.Coef"]])->b2 ks<-llply(names(b2),function(x){ s<-data.frame(b2[[x]]) s$ai<-x return(s) }) ks3<-do.call(rbind,ks) ks3$ai2<-with(ks3,substring(ai,1,1)) ks3$gc2<-sapply(strsplit(as.character(ks3$ai), "\."), "[[", 2) k<-ks3 j<-dlply(k,.(ai2,gc2),function(x){ i1<-quantile(x$Intercept,probs=c(0.025,0.975))[1] i2<-quantile(x$Intercept,probs=c(0.025,0.975))[2] j1<-quantile(x$bc,probs=c(0.025,0.975))[1] j2<-quantile(x$bc,probs=c(0.025,0.975))[2] o<-x$Intercept>i1 & x$Intercept<i2 p<-x$bc>j1 & x$bc<j2 h<-o & p return(h) }) m<-melt(j) ks3$int<-m[,1] ggplot(ks3,aes(x=bc,y=Intercept,fill=int)) + geom_point(,alpha=0.3,size=1,shape=21) + facet_grid(gc2~ai2,scales = "free_y")+theme_bw()->plott plott<-plott+opts(panel.grid.minor=theme_blank(),panel.grid.major=theme_blank()) plott<-plott+geom_vline(x=0,color="red") plott+xlab("BC coefficient")+ylab("Intercept")
Best Answer
First I would like to point out that to get the object ks3
you do not need to do that much of the book-keeping code. Use the features of ddply
:
ks3 <- ddply(o,.(ai,Gs),function(d){ temp <- ols(value~as.numeric(bc)+as.numeric(age),data=d,x=T,y=T) t2 <- bootcov(temp,B=1000,coef.reps=T) data.frame(t2$boot.Coef,ai=d$ai[1],gc2=d$Gs[1]) })
Also scales=free
works much better if you want to see what kind of graph you intend to provide.
Concerning your first question, if you estimate the standard errors of coefficient using bootstrap, you are absolutely correct in using estimated percentiles. There is no need to use the percentiles of bivariate normal, since then you assume normality and bootstrap is used exactly for the purpose of not attaching oneself to some theoretical distribution.
For your second question, first you need to define what do you mean by bias-corrected percentiles. Furthermore your question implies that bootcov
returns non bias-corrected percentiles, but it does not return any percentiles at all.
Update
If we know that coefficients are normal we can use confidence ellipses. The mean and covariance matrix are estimated using bootstrap. The following code checks which of the observations of the bivariate sample falls into 0.95 confidence ellipse region:
confeps <- function(x, level=0.95, t=qchisq(level,2)) { cv <- cov(x) ch <- t(chol(x)) x <- sweep(x, 2, apply(x, 2, mean), "-") y <- solve(ch,t(x)) tt <- y[1, ]^2 + y[2, ]^2 tt < t }
Here is the example how this function works:
bs <- cbind(a <- rnorm(1000), 2-a/3+rnorm(1000)/5) mn <- apply(bs, 2, mean) require(ellipse) eps <- ellipse(cov(bs), centre=mn, npoints=1000) plot(eps, type="l", xlim=range(bs[,1]), ylim=range(bs[, 2])) col <- rep(1, nrow(x)) col[!confeps(bs)] <- 2 points(bs[, 1], bs[, 2], col=col)
Similar Posts:
- Solved – Differences in AUC calculation between pROC and ROCR
- Solved – Function to find the quantile in a vector corresponding to constant $x$
- Solved – Fitting an exponential model to data
- Solved – If any two variables follow a normal bivariate distribution does it also have a multivariate normal distribution
- Solved – Making boxplots of hourly data in R