# Solved – Linear regression (lm) or stepwise regression in R

This is a basic question, but I could not find a clear answer in my reading. I am trying to find independent predictors of Infant.Mortality in the data frame `swiss` in R.

```
> head(swiss)
             Fertility Agriculture Examination Education Catholic Infant.Mortality
Courtelary        80.2        17.0          15        12     9.96             22.2
Delemont          83.1        45.1           6         9    84.84             22.2
Franches-Mnt      92.5        39.7           5         5    93.40             20.2
Moutier           85.8        36.5          12         7    33.77             20.3
Neuveville        76.9        43.5          17        15     5.16             20.6
Porrentruy        76.1        35.3           9         7    90.57             26.6
```

Here are the results using `lm`; only Fertility comes out as a significant predictor:

```
> fit = lm(Infant.Mortality~., data=swiss)
> summary(fit)

Call:
lm(formula = Infant.Mortality ~ ., data = swiss)

Residuals:
    Min      1Q  Median      3Q     Max
-8.2512 -1.2860  0.1821  1.6914  6.0937

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)  8.667e+00  5.435e+00   1.595  0.11850
Fertility    1.510e-01  5.351e-02   2.822  0.00734    #  <<<< NOTE P VALUE HERE
Agriculture -1.175e-02  2.812e-02  -0.418  0.67827
Examination  3.695e-02  9.607e-02   0.385  0.70250
Education    6.099e-02  8.484e-02   0.719  0.47631
Catholic     6.711e-05  1.454e-02   0.005  0.99634

Residual standard error: 2.683 on 41 degrees of freedom
Multiple R-squared:  0.2439,    Adjusted R-squared:  0.1517
F-statistic: 2.645 on 5 and 41 DF,  p-value: 0.03665
```

Following are the graphs:

``plot(fit) ``
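For reference, `plot(fit)` on an `lm` object draws four diagnostic plots one after another. A minimal sketch for viewing them on a single page, assuming the built-in `swiss` data; the 2x2 layout set through `par(mfrow = ...)` is only one convenient choice and is not part of the original post:

```r
# Refit the full model on the built-in swiss data
fit <- lm(Infant.Mortality ~ ., data = swiss)

# Show the four default diagnostic plots on one page
old_par <- par(mfrow = c(2, 2))
plot(fit)      # residuals vs fitted, normal Q-Q, scale-location, residuals vs leverage
par(old_par)   # restore the previous plotting layout
```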

Stepwise regression with `stepAIC` (from the MASS package) gives the following results:

```
> step <- stepAIC(fit, direction="both");
Start:  AIC=98.34
Infant.Mortality ~ Fertility + Agriculture + Examination + Education +
    Catholic

              Df Sum of Sq    RSS     AIC
- Catholic     1     0.000 295.07  96.341
- Examination  1     1.065 296.13  96.511
- Agriculture  1     1.256 296.32  96.541
- Education    1     3.719 298.79  96.930
<none>                     295.07  98.341
- Fertility    1    57.295 352.36 104.682

Step:  AIC=96.34
Infant.Mortality ~ Fertility + Agriculture + Examination + Education

              Df Sum of Sq    RSS     AIC
- Examination  1     1.320 296.39  94.551
- Agriculture  1     1.395 296.46  94.563
- Education    1     5.774 300.84  95.252
<none>                     295.07  96.341
+ Catholic     1     0.000 295.07  98.341
- Fertility    1    72.609 367.68 104.681

Step:  AIC=94.55
Infant.Mortality ~ Fertility + Agriculture + Education

              Df Sum of Sq    RSS     AIC
- Agriculture  1     4.250 300.64  93.220
- Education    1     6.875 303.26  93.629
<none>                     296.39  94.551
+ Examination  1     1.320 295.07  96.341
+ Catholic     1     0.255 296.13  96.511
- Fertility    1    79.804 376.19 103.758

Step:  AIC=93.22
Infant.Mortality ~ Fertility + Education

              Df Sum of Sq    RSS     AIC
<none>                     300.64  93.220
- Education    1    21.902 322.54  94.525
+ Agriculture  1     4.250 296.39  94.551
+ Examination  1     4.175 296.46  94.563
+ Catholic     1     2.318 298.32  94.857
- Fertility    1    85.769 386.41 103.017

> step$anova
Stepwise Model Path
Analysis of Deviance Table

Initial Model:
Infant.Mortality ~ Fertility + Agriculture + Examination + Education +
    Catholic

Final Model:
Infant.Mortality ~ Fertility + Education

           Step Df     Deviance Resid. Df Resid. Dev      AIC
1                                      41   295.0662 98.34145
2    - Catholic  1 0.0001533995        42   295.0663 96.34147
3 - Examination  1 1.3199421028        43   296.3863 94.55125
4 - Agriculture  1 4.2499886025        44   300.6363 93.22041
```
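The AIC column in the stepwise output can be checked by hand. For a linear model, `stepAIC` relies on `extractAIC`, which computes n * log(RSS / n) + 2 * edf. This differs from `AIC()` by an additive constant, so only differences between models matter. A minimal sketch, assuming the built-in `swiss` data; the names `aic_by_hand` and `step_fit` are illustrative and not from the original post:

```r
library(MASS)  # provides stepAIC(); ships with standard R installations

fit <- lm(Infant.Mortality ~ ., data = swiss)

# AIC as used in the stepwise table: n * log(RSS / n) + 2 * edf
n   <- nrow(swiss)              # 47 observations
rss <- sum(residuals(fit)^2)    # residual sum of squares of the full model
edf <- length(coef(fit))        # 6 estimated coefficients
aic_by_hand <- n * log(rss / n) + 2 * edf

aic_by_hand           # compare with the "Start:  AIC=98.34" line
extractAIC(fit)[2]    # the same quantity as computed by R

# Rerun the search without the step-by-step trace and inspect the result
step_fit <- stepAIC(fit, direction = "both", trace = FALSE)
formula(step_fit)
```

At each step the search drops (or re-adds) the term that gives the lowest AIC and stops when `<none>` is at the top of the table, which is why the procedure ends at `Fertility + Education`.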

The summary of the final model shows that Education also has a trend towards a significant association (p = 0.08):

```
summary(step)

Call:
lm(formula = Infant.Mortality ~ Fertility + Education, data = swiss)

Residuals:
    Min      1Q  Median      3Q     Max
-7.6927 -1.4049  0.2218  1.7751  6.1685

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  8.63758    3.33524   2.590 0.012973
Fertility    0.14615    0.04125   3.543 0.000951
Education    0.09595    0.05359   1.790 0.080273

Residual standard error: 2.614 on 44 degrees of freedom
Multiple R-squared:  0.2296,    Adjusted R-squared:  0.1946
F-statistic: 6.558 on 2 and 44 DF,  p-value: 0.003215
```
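One way to put the full and the reduced model side by side is a nested-model F-test. A minimal sketch, assuming the built-in `swiss` data; the object name `reduced` is illustrative, and this is only one possible check, not a definitive answer to whether Education matters:

```r
fit     <- lm(Infant.Mortality ~ ., data = swiss)
reduced <- lm(Infant.Mortality ~ Fertility + Education, data = swiss)

# F-test: do Agriculture, Examination and Catholic jointly improve the fit?
anova(reduced, fit)

# Single-term deletions from the reduced model, each with its own F-test
drop1(reduced, test = "F")
```

For a single coefficient, the F-test from `drop1` is equivalent to the t-test in `summary()`, so the p-value for Education should agree with the one in the summary table.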

What do I conclude? Is Education an important predictor or not?

Also, do the graphs using plot(fit) add any significant information?

Edit:
I ran the Shapiro-Wilk test (`shapiro.test`) on all columns and found that two are not normally distributed:

```
Fertility : P= 0.3449466 (Normally distributed)
Agriculture : P= 0.1930223 (Normally distributed)
Examination : P= 0.2562701 (Normally distributed)
Education : P= 1.31202e-07 (--- NOT Normally distributed! ---)
Catholic : P= 1.20461e-07 (--- NOT Normally distributed! ---)
Infant.Mortality : P= 0.4978056 (Normally distributed)
```
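The per-column p-values above can be reproduced in one call. A minimal sketch, assuming the built-in `swiss` data; `p_values` is an illustrative name. It also applies the same test to the model residuals, since the usual normality assumption of a linear model concerns the residuals rather than the individual columns:

```r
# Shapiro-Wilk p-value for every column of the built-in swiss data
p_values <- sapply(swiss, function(x) shapiro.test(x)$p.value)
round(p_values, 7)

# The same test applied to the residuals of the full model
fit <- lm(Infant.Mortality ~ ., data = swiss)
shapiro.test(residuals(fit))
```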

Does that make a difference?
