# Solved – Get p-value of coefficients in regression models using bootstrap

I've been reading a lot these last few days about how to get a p-value from bootstrap for regression models (not by permutation). For each coefficient of the model, the null hypothesis is that the coefficient equals 0 and H1 is that it is different to 0 (bilateral test).

The most noticeable similar subjects are the following two questions on stackexchange, but the answers confuse me a lot:

From what I've read, I noticed several approaches, but I can't figure out which is the valid one:

1. Should my bootstrap function return the test statistic calculated for each sample, or the estimate?

2. Should I calculate the proportion of the test statistic/estimate above 0 or above the point estimate of the base model?

3. Should I multiply the result by 2 because the test is bilateral or use absolute values?

Contents

Suppose we want to test the null hypothesis that a regression coefficient = 0 using bootstrap, and say we decide 0.05 to be the level of significance. Now, we can generate the sampling distribution for each coefficient using bootstrap. It is easy to check if 0 falls within 95% confidence interval, thus we can easily decide whether we can reject the null or not.

To get a p-value, we need to check what is the quantile value of 0 in the sampling distribution. (I am using a quantile based approach there are other methods to do it which can be found here Fox on Regression) After you get the quantile,Q, the p-value is 2*Q or 2*(1-Q) depending on whether Q > 0.5 or less than 0.5.

As an illustration of the approach, consider this

``library(faraway) ``

# Build linear model

``mdl <- lm(divorce ~ ., data = divusa) ``

# Bootstrap

``bootTest <- sapply(1:1e4,function(x){   rows <- sample(nrow(divusa),nrow(divusa),replace = T)   mdl <- lm(divorce ~ ., data = divusa[rows,])   return(mdl\$coefficients) }) ``

Here is the model

``summary(mdl)  Call: lm(formula = divorce ~ ., data = divusa)  Residuals:     Min      1Q  Median      3Q     Max  -2.9087 -0.9212 -0.0935  0.7447  3.4689   Coefficients:              Estimate Std. Error t value Pr(>|t|)     (Intercept) 380.14761   99.20371   3.832 0.000274 *** year         -0.20312    0.05333  -3.809 0.000297 *** unemployed   -0.04933    0.05378  -0.917 0.362171     femlab        0.80793    0.11487   7.033 1.09e-09 *** marriage      0.14977    0.02382   6.287 2.42e-08 *** birth        -0.11695    0.01470  -7.957 2.19e-11 *** military     -0.04276    0.01372  -3.117 0.002652 **  --- Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1  Residual standard error: 1.513 on 70 degrees of freedom Multiple R-squared:  0.9344,    Adjusted R-squared:  0.9288  F-statistic: 166.2 on 6 and 70 DF,  p-value: < 2.2e-16 ``

Notice the p-values

Subset of regression coefficient generated using bootstrap

``bootTest[,1:5]                     [,1]          [,2]         [,3]         [,4]         [,5] (Intercept) 335.70970574 372.525260160 569.85830341 338.70069977 344.69261238 year         -0.18107568  -0.201798080  -0.30579380  -0.18125215  -0.18328105 unemployed   -0.02916575   0.006828023   0.01197723  -0.05610887  -0.11463230 femlab        0.79078784   0.842924808   1.02607863   0.77527548   0.76472406 marriage      0.17372382   0.199571033   0.18782967   0.15289119   0.15693996 birth        -0.11613752  -0.118507758  -0.11998122  -0.11666450  -0.13344442 military     -0.04051730  -0.056277118  -0.04062756  -0.05167556  -0.07251748 ``

Generate p-values with bootstrap

``pvals <- sapply(1:nrow(bootTest),function(x) {   distribution <- ecdf(bootTest[x,])   qt0 <- distribution(0)   if(qt0 < 0.5){         return(2*qt0)       } else {         return(2*(1-qt0))       } }) ``

Comparing p-values from t-test and bootstrap

### T test

``summary(mdl)\$coefficients[,4]  (Intercept)         year   unemployed       femlab     marriage        birth     military  2.744830e-04 2.966776e-04 3.621708e-01 1.085196e-09 2.419284e-08 2.191964e-11 2.652003e-03  ``

### Bootstrap

``> pvals  0.0008 0.0008 0.2196 0.0000 0.0000 0.0000 0.0188 ``

The highly significant p-values with coefficients < 1e-8 are all 0 with bootstrap with 1e4 iterations. Furthermore, the ranking of p-values is comparable as well.

Rate this post