I'm interested in using the weighted bootstrap to correct for selection bias with a known form. I simulated a very simple example where the underlying data, $X$, are $N(0,1)$ and we are calculating a sample mean. However, there is selection bias such that every $X>0$ gets sampled, while only 5% of $X<0$ get sampled. Thus, the sample mean is biased.

To correct for this, I resampled with replacement using inverse-probability-of-selection weights. I did this using the package `boot`

with `weights`

equal to 1 or 20. I computed confidence intervals using either the percentile method or the raw empirical quantiles of the bootstraps. **Even though the resampled means were unbiased, both types of confidence interval had awful coverage (~53%).**

As a sanity check, I set the parameter that controls the ratio of selection probabilities (`eta`

below) to 1, such that there is no selection bias and we are just doing plain-vanilla resampling. Lo and behold, the coverage then is fine (96.5%). Since the bootstrap works for these data without selection, I don't think the excellent answers here hold (e.g., issues with non-pivotality, asymptotics, etc.).

Also, although I realize other methods of CI construction (e.g., BCa or basic) are sometimes better, I can't use them in this case because the original sample estimate is obviously biased so can't be used as a benchmark.

My code is below:

`##### Helper Fn: Extract CI Limits from boot.ci ##### # list with first entry for b and second entry for t2 # n.ests: how many parameters were estimated? get_boot_CIs = function(boot.res, type, n.ests) { bootCIs = lapply( 1:n.ests, function(x) boot.ci(boot.res, type = type, index = x) ) # list with first entry for b and second entry for t2 # the middle index "4" on the bootCIs accesses the stats vector # the final index chooses the CI lower (4) or upper (5) bound bootCIs = lapply( 1:n.ests, function(x) c( bootCIs[[x]][[4]][4], bootCIs[[x]][[4]][5] ) ) } ##### Helper Fn: Check CI coverage ##### covers = function( truth, lo, hi ) { return( (lo <= truth) & (hi >= truth) ) } sim.reps = 200 boot.reps = 500 n = 5000 # initial sample size before selection library(boot) # sanity check: using eta = 1 gives 96.5% coverage for both methods for ( i in 1:sim.reps ) { x = rnorm(n=5000) # sample values non-randomly # positive ones have an eta-fold higher chance of being sampled eta = 20 weight = rep(1, length(x)) weight[x < 0] = eta # indicator for whether we sample each observation keep = rbinom(n=n, size=1, prob=1/weight) d = data.frame(x, weight) d = d[ keep == 1, ] # this will be > 0 # mean(d$x) boot.res = boot( data = d, parallel = "multicore", R = boot.reps, weights = d$weight, statistic = function(original, indices) { b = original[indices,] mean(b$x) } ) # end call to boot() # in case there is some weird problem with # computing the boot CI tryCatch( { percCIs = get_boot_CIs(boot.res, "perc", n.ests = 1) }, error = function(err) { percCIs <<- list( c(NA, NA), c(NA, NA) ) } ) # quantiles qlo = quantile(boot.res$t[,1], 0.025) qhi = quantile(boot.res$t[,1], 0.975) rows = data.frame( Method = c( "PercBT", "JustQuantiles" ), # stats for mean estimate # note that both boot CI methods use the same point estimate XBarWtd = c( mean( boot.res$t[,1] ), mean( boot.res$t[,1] ) ), Lo = c( percCIs[[1]][1], qlo ), Hi = c( percCIs[[1]][2], qhi ), Coverage = c( covers( 0, percCIs[[1]][1], percCIs[[1]][2] ), covers( 0, qlo, qhi ) ) ) if ( i == 1 ) res = rows else res = rbind(res, rows) } # see the underwhelming results library(dplyr) vars = c("XBarWtd", "Lo", "Hi", "Coverage") res %>% group_by(Method) %>% summarise_at( vars(vars), mean ) `

And the disappointing results:

` Method XBarWtd Lo Hi Coverage <fct> <dbl> <dbl> <dbl> <dbl> 1 JustQuantiles 0.00430 -0.0333 0.0423 0.53 2 PercBT 0.00430 -0.0341 0.0430 0.535 `

**Contents**hide

#### Best Answer

The weights argument in the boot function is looking for resampling importance weights, not inverse probability weights. When no weights are specified, the bootstrap assumes an importance resampling weight of $n^{-1}$ for each unit, where $n$ is the sample size (i.e., uniform weighting). Importance weights are resampling probabilities that are intended to improve the efficiency of Monte Carlo estimates of bootstrap quantiles. See, e.g., this reference: "On importance resampling for the bootstrap" for a description of importance weights and their intended purpose. As to your problem, if you look at the boot function's source code, you will see the following:

`if (!is.null(weights)) weights <- t(apply(matrix(weights, n, length(R), byrow = TRUE), 2L, normalize, strata)) `

And digging a little deeper, you find that

`boot:::normalize `

normalizes the weights to add to 1. So, essentially, when you provide a vector of 1's and 20's, it becomes a vector of 1/sum(vector)'s and 20/sum(vector)'s. Boot then oversamples the 20/sum(vector)'s 20 times more than the 1/sum(vector)'s, so it is a happy coincidence that this is precisely the (weighted) correction needed to make your estimate of the mean unbiased. It is not, however, the intention of the weights argument, so it makes sense that other estimated quantities are not correct.

### Similar Posts:

- Solved – How many observations are taken for a pseudo-sample when bootstrapping
- Solved – Monte Carlo / Nonparametric confidence intervals for mean estimate
- Solved – Monte Carlo study to estimate bootstrap CI
- Solved – Bootstrapping in R using the boot {boot} and Boot {car}
- Solved – Two methods of using bootstraping to test the difference between two sample means