Solved – OLS regression results: p-values > 0.10, how to proceed

In the Python statsmodels documentation there is an example with the goal:

We want to know whether literacy rates (Literacy column) in the 85 French departments (Departments) are associated with per capita wagers on the Royal Lottery (Lottery) in the 1820s. We need to control for the level of wealth (Wealth) in each department, and we also want to include a series of dummy variables on the right-hand side of our regression equation to control for unobserved heterogeneity due to regional effects (Region; indicators for N, E, S, W, each coded 0 or 1). The model is estimated using ordinary least squares regression (OLS).
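Written out, the model described in that goal looks roughly like this. The symbols (beta, gamma, epsilon) are my own notation, not taken from the documentation, and whichever region has no dummy acts as the reference category:

$$
\text{Lottery}_i = \beta_0 + \beta_1\,\text{Literacy}_i + \beta_2\,\text{Wealth}_i + \sum_{r \in \{E, N, S, W\}} \gamma_r\,\mathbf{1}[\text{Region}_i = r] + \varepsilon_i
$$

In this notation, the question in the goal is whether $\beta_1$ differs from zero once Wealth and Region are held fixed.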

                            OLS Regression Results                            
==============================================================================
Dep. Variable:                Lottery   R-squared:                       0.338
Model:                            OLS   Adj. R-squared:                  0.287
Method:                 Least Squares   F-statistic:                     6.636
Date:                Tue, 02 Feb 2021   Prob (F-statistic):           1.07e-05
Time:                        07:07:06   Log-Likelihood:                -375.30
No. Observations:                  85   AIC:                             764.6
Df Residuals:                      78   BIC:                             781.7
Df Model:                           6                                         
Covariance Type:            nonrobust                                         
===============================================================================
                  coef    std err          t      P>|t|      [0.025      0.975]
-------------------------------------------------------------------------------
Intercept      38.6517      9.456      4.087      0.000      19.826      57.478
Region[T.E]   -15.4278      9.727     -1.586      0.117     -34.793       3.938
Region[T.N]   -10.0170      9.260     -1.082      0.283     -28.453       8.419
Region[T.S]    -4.5483      7.279     -0.625      0.534     -19.039       9.943
Region[T.W]   -10.0913      7.196     -1.402      0.165     -24.418       4.235
Literacy       -0.1858      0.210     -0.886      0.378      -0.603       0.232
Wealth          0.4515      0.103      4.390      0.000       0.247       0.656
==============================================================================
Omnibus:                        3.049   Durbin-Watson:                   1.785
Prob(Omnibus):                  0.218   Jarque-Bera (JB):                2.694
Skew:                          -0.340   Prob(JB):                        0.260
Kurtosis:                       2.454   Cond. No.                         371.
==============================================================================

Prob (F-statistic) is 1.07e-05, so the null hypothesis (H0: all slope coefficients, i.e. every coefficient except the intercept, are equal to zero) is rejected: there is statistically significant evidence that the independent variables, taken together, are related to the dependent variable. However, among the individual regressors only Wealth has a p-value < 0.05.
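As a sanity check on how these numbers relate, the overall F-statistic can be recomputed from R-squared and the degrees of freedom, and a coefficient's t-statistic from its estimate and standard error. This is only a sketch using the rounded values printed in the summary, so the results agree with the table only to about two decimals; the variable names are mine.

```python
# Values copied from the OLS summary (all rounded as printed there).
r_squared = 0.338                  # R-squared
n_obs = 85                         # No. Observations
df_model = 6                       # Df Model: slopes only, intercept excluded
df_resid = n_obs - df_model - 1    # should match Df Residuals (78)

# Overall F-statistic: explained variation per model degree of freedom
# divided by unexplained variation per residual degree of freedom.
f_stat = (r_squared / df_model) / ((1.0 - r_squared) / df_resid)

# t-statistic of one coefficient: estimate divided by its standard error.
literacy_coef, literacy_se = -0.1858, 0.210
t_literacy = literacy_coef / literacy_se

print(f"df_resid    = {df_resid}")
print(f"F           = {f_stat:.3f}")       # compare with the reported 6.636
print(f"t(Literacy) = {t_literacy:.3f}")   # compare with the reported -0.886
```

The F-test asks whether the six slopes are jointly zero, while each t-test asks about one slope given all the others, which is why a clearly significant F can coexist with mostly insignificant individual rows.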

Should the model be used as is? Or should all independent variables except Wealth be removed? What should be done based on the goal "We want to know whether literacy … We need to control for the level of wealth (Wealth) in each department …"?

Assuming that there are no problems with the model assumptions, the model should be used as it is. Insignificant variables should not be removed: Wealth and Region are in the model as controls, and the question in the goal is answered by the Literacy coefficient in the full model. Removing variables because they were insignificant would invalidate any tests that are then run within the reduced model, because those tests do not account for the data-driven selection step. (Removing insignificant variables seems to be common practice, but that doesn't make it good practice. Occasionally there are legitimate reasons to drop variables, for example when they would be expensive to observe in the future and the model is used for prediction, or when the number of observations is too small to fit the full model with reasonable reliability, but I don't see such reasons here; even in such cases there are often better criteria than significance.)
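In code, "use the model as it is" amounts to fitting the full formula once and reading off the Literacy row. The sketch below assumes statsmodels, pandas and NumPy are installed; the helper name `literacy_effect` and the synthetic stand-in data are my own inventions so that it runs offline, and the numbers it prints say nothing about the real dataset.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf


def literacy_effect(df: pd.DataFrame) -> dict:
    """Fit the full model and return the estimate for Literacy.

    Wealth and Region stay in the formula as controls, however large
    their own p-values turn out to be.
    """
    fit = smf.ols("Lottery ~ Literacy + Wealth + Region", data=df).fit()
    low, high = fit.conf_int().loc["Literacy"]  # 95% interval by default
    return {
        "coef": fit.params["Literacy"],
        "p_value": fit.pvalues["Literacy"],
        "ci_95": (low, high),
    }


# Synthetic stand-in data (NOT the Guerry data), only to make the sketch runnable.
rng = np.random.default_rng(0)
n = 85
demo = pd.DataFrame({
    "Literacy": rng.uniform(10, 75, n),
    "Wealth": rng.uniform(1, 86, n),
    "Region": rng.choice(list("CENSW"), n),
})
demo["Lottery"] = 40 + 0.45 * demo["Wealth"] + rng.normal(0, 20, n)

print(literacy_effect(demo))
```

With the real data you would pass the DataFrame from the statsmodels example instead of `demo`. For the summary in the question, the Literacy row (coef -0.1858, p = 0.378, interval -0.603 to 0.232) is then the answer to the stated goal: no clear evidence of an association after controlling for Wealth and Region.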
