# Solved – Why does weighted bootstrap have awful coverage even in toy example

I'm interested in using the weighted bootstrap to correct for selection bias with a known form. I simulated a very simple example where the underlying data, $$X$$, are $$N(0,1)$$ and we are calculating a sample mean. However, there is selection bias such that every $$X>0$$ gets sampled, while only 5% of $$X<0$$ get sampled. Thus, the sample mean is biased.

To correct for this, I resampled with replacement using inverse-probability-of-selection weights. I did this using the package `boot` with `weights` equal to 1 or 20. I computed confidence intervals using either the percentile method or the raw empirical quantiles of the bootstraps. Even though the resampled means were unbiased, both types of confidence interval had awful coverage (~53%).

As a sanity check, I set the parameter that controls the ratio of selection probabilities (`eta` below) to 1, such that there is no selection bias and we are just doing plain-vanilla resampling. Lo and behold, the coverage then is fine (96.5%). Since the bootstrap works for these data without selection, I don't think the excellent answers here hold (e.g., issues with non-pivotality, asymptotics, etc.).

Also, although I realize other methods of CI construction (e.g., BCa or basic) are sometimes better, I can't use them in this case because the original sample estimate is obviously biased so can't be used as a benchmark.

My code is below:

``##### Helper Fn: Extract CI Limits from boot.ci ##### # list with first entry for b and second entry for t2 # n.ests: how many parameters were estimated? get_boot_CIs = function(boot.res, type, n.ests) {   bootCIs = lapply( 1:n.ests, function(x) boot.ci(boot.res, type = type, index = x) )    # list with first entry for b and second entry for t2   # the middle index "4" on the bootCIs accesses the stats vector   # the final index chooses the CI lower (4) or upper (5) bound   bootCIs = lapply( 1:n.ests, function(x) c( bootCIs[[x]][[4]][4],                                              bootCIs[[x]][[4]][5] ) ) }   ##### Helper Fn: Check CI coverage ##### covers = function( truth, lo, hi ) {   return( (lo <= truth) & (hi >= truth) ) }   sim.reps = 200 boot.reps = 500 n = 5000  # initial sample size before selection library(boot)  # sanity check: using eta = 1 gives 96.5% coverage for both methods   for ( i in 1:sim.reps ) {      x = rnorm(n=5000)      # sample values non-randomly     # positive ones have an eta-fold higher chance of being sampled     eta = 20     weight = rep(1, length(x))     weight[x < 0] = eta      # indicator for whether we sample each observation     keep = rbinom(n=n, size=1, prob=1/weight)      d = data.frame(x, weight)     d = d[ keep == 1, ]      # this will be > 0     # mean(d\$x)        boot.res = boot( data = d,                         parallel = "multicore",                        R = boot.reps,                         weights = d\$weight,                        statistic = function(original, indices) {                           b = original[indices,]                          mean(b\$x)                        }                       )  # end call to boot()        # in case there is some weird problem with       #  computing the boot CI       tryCatch( {         percCIs = get_boot_CIs(boot.res, "perc", n.ests = 1)       }, error = function(err) {         percCIs <<- list( c(NA, NA), c(NA, NA) )       } )        # quantiles       qlo = quantile(boot.res$$t[,1], 0.025) qhi = quantile(boot.res$$t[,1], 0.975)      rows =     data.frame(                              Method = c( "PercBT",                                         "JustQuantiles" ),                              # stats for mean estimate                             # note that both boot CI methods use the same point estimate                             XBarWtd = c( mean( boot.res$$t[,1] ), mean( boot.res$$t[,1] ) ),                              Lo = c( percCIs[[1]][1],                                     qlo ),                              Hi = c( percCIs[[1]][2],                                     qhi ),                              Coverage = c( covers( 0, percCIs[[1]][1], percCIs[[1]][2] ),                                           covers( 0,                                                   qlo,                                                   qhi ) )     )      if ( i == 1 ) res = rows     else res = rbind(res, rows) }  # see the underwhelming results library(dplyr) vars = c("XBarWtd", "Lo", "Hi", "Coverage") res %>% group_by(Method) %>% summarise_at( vars(vars), mean ) ``

And the disappointing results:

``  Method        XBarWtd      Lo     Hi Coverage   <fct>           <dbl>   <dbl>  <dbl>    <dbl> 1 JustQuantiles 0.00430 -0.0333 0.0423    0.53  2 PercBT        0.00430 -0.0341 0.0430    0.535 ``
Contents

The weights argument in the boot function is looking for resampling importance weights, not inverse probability weights. When no weights are specified, the bootstrap assumes an importance resampling weight of $$n^{-1}$$ for each unit, where $$n$$ is the sample size (i.e., uniform weighting). Importance weights are resampling probabilities that are intended to improve the efficiency of Monte Carlo estimates of bootstrap quantiles. See, e.g., this reference: "On importance resampling for the bootstrap" for a description of importance weights and their intended purpose. As to your problem, if you look at the boot function's source code, you will see the following:

``if (!is.null(weights))    weights <- t(apply(matrix(weights, n, length(R),                              byrow = TRUE), 2L, normalize, strata)) ``

And digging a little deeper, you find that

``boot:::normalize ``

normalizes the weights to add to 1. So, essentially, when you provide a vector of 1's and 20's, it becomes a vector of 1/sum(vector)'s and 20/sum(vector)'s. Boot then oversamples the 20/sum(vector)'s 20 times more than the 1/sum(vector)'s, so it is a happy coincidence that this is precisely the (weighted) correction needed to make your estimate of the mean unbiased. It is not, however, the intention of the weights argument, so it makes sense that other estimated quantities are not correct.

Rate this post