# Solved – Selecting variables in multiple linear regression in R

Suppose we have a problem with four variables (y, x1, x2 and x3) and we want to fit a multiple linear regression model. To find out which variables matter most, we run a stepwise selection such as the following (this is just an example; we could also have used backward or both directions):

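```r
g0 = lm(Y ~ 1, data = dat)             # intercept-only starting model
gx = lm(Y ~ X1 + X2 + X3, data = dat)  # full model; not shown in the original, assumed here
gxf = formula(gx)                      # upper scope for the search
forward = step(g0, scope = gxf, direction = "forward", test = "F")
```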

Suppose this function tells us that our model should be y ~ x1 + x3. If we then call summary() on the object "forward", we get this:

```
Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.071923   0.150266   0.479    0.636
X1           0.009716   0.001890   5.140 2.09e-05 ***
X3          -0.013497   0.009230  -1.462    0.155
```

Should we change our model to y ~ x1? Why isn't x3 significant? And if we switch to y ~ x1 alone, but then fit lm(y ~ x3) and find in its summary that x3 is significant as well, which model is better? The one with the higher R^2?
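To make the comparison concrete, here is a minimal sketch on simulated data. The data, the seed and the names `m1`, `m3` and `m13` are assumptions made up for illustration; they are not the data behind the output above. It shows one way to put the three candidate models side by side and to check whether x1 and x3 are correlated, which is a common reason for a predictor looking significant alone but not next to another one.

```r
# Simulated data (an assumption for illustration, not the asker's data)
set.seed(1)
n   <- 30
x1  <- rnorm(n)
x3  <- 0.8 * x1 + rnorm(n, sd = 0.6)  # x3 is built to be correlated with x1
y   <- 1 + 2 * x1 + rnorm(n)          # y is generated from x1 only
dat <- data.frame(y, x1, x3)

m13 <- lm(y ~ x1 + x3, data = dat)    # the form of model that step() selected in the question
m1  <- lm(y ~ x1, data = dat)
m3  <- lm(y ~ x3, data = dat)

cor(dat$x1, dat$x3)   # correlation between the two predictors
anova(m1, m13)        # partial F test: does x3 add anything once x1 is in the model?
AIC(m1, m3, m13)      # also usable for the non-nested pair m1 vs m3

# Adjusted R^2 for each model; plain R^2 can never decrease when a term is added
sapply(list(m1 = m1, m3 = m3, m13 = m13), function(m) summary(m)$adj.r.squared)
```

For a single added term, the partial F test from `anova(m1, m13)` gives the same p-value as the t test for x3 in `summary(m13)`. For `lm` fits on the same data, `AIC()` differs from the criterion `step()` minimises by default only by an additive constant, so both rank the models identically. Plain R^2 is not a fair yardstick between models with different numbers of terms, which is why the sketch reports the adjusted version.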
