Solved – p-value vs. prediction error

In many fields (such as medicine), whether a variable is related to an outcome is checked by testing whether the p-value of that variable in a regression model is significant.

For example:

    > summary(glm.D93)

    Call:
    glm(formula = counts ~ outcome + treatment, family = poisson())

    Deviance Residuals: 
           1         2         3         4         5         6         7         8         9  
    -0.67125   0.96272  -0.16965  -0.21999  -0.95552   1.04939   0.84715  -0.09167  -0.96656  

    Coefficients:
                  Estimate Std. Error z value Pr(>|z|)    
    (Intercept)  3.045e+00  1.709e-01  17.815   <2e-16 ***
    outcome2    -4.543e-01  2.022e-01  -2.247   0.0246 *  
    outcome3    -2.930e-01  1.927e-01  -1.520   0.1285    
    treatment2   1.189e-15  2.000e-01   0.000   1.0000    
    treatment3   8.438e-16  2.000e-01   0.000   1.0000    
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

    (Dispersion parameter for poisson family taken to be 1)

        Null deviance: 10.5814  on 8  degrees of freedom
    Residual deviance:  5.1291  on 4  degrees of freedom
    AIC: 56.761

    Number of Fisher Scoring iterations: 4

Wouldn't it be better, instead of using this approach, to try all possible combinations of variables and choose the model with the lowest prediction error?
We could then assume that all the variables in such a model are truly relevant.
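
To make the idea concrete, here is a rough sketch of what I mean in R, using the same Dobson (1990) data as `example(glm)` above. The leave-one-out squared-error measure and the `loocv_error` helper are just my own illustration, not an established procedure:

    # Sketch: fit every subset of the predictors and score each candidate
    # model by leave-one-out cross-validated squared prediction error.
    # Data are from Dobson (1990), the same example(glm) data shown above.
    counts    <- c(18, 17, 15, 20, 10, 20, 25, 13, 12)
    outcome   <- gl(3, 1, 9)
    treatment <- gl(3, 3)
    d <- data.frame(counts, outcome, treatment)

    predictors <- c("outcome", "treatment")

    # All subsets of the predictors, from the intercept-only model upward
    subsets <- c(list(character(0)),
                 unlist(lapply(seq_along(predictors),
                               function(k) combn(predictors, k, simplify = FALSE)),
                        recursive = FALSE))

    loocv_error <- function(vars) {
      f <- if (length(vars) == 0) counts ~ 1
           else reformulate(vars, response = "counts")
      errs <- sapply(seq_len(nrow(d)), function(i) {
        fit <- glm(f, family = poisson(), data = d[-i, ])   # fit without row i
        (d$counts[i] - predict(fit, newdata = d[i, ], type = "response"))^2
      })
      mean(errs)
    }

    cv <- sapply(subsets, loocv_error)
    labels <- sapply(subsets, function(v)
      if (length(v) == 0) "(intercept only)" else paste(v, collapse = " + "))
    data.frame(model = labels, loocv = cv)   # pick the row with the lowest error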

If this is not the case, can you explain why?

Your question is essentially about model selection.

When you are building a statistical model, you might not want to consider only its predictive ability. Conventionally, the quality of a statistical model is evaluated along the following three attributes.

  1. Parsimony or interpretability, i.e., the simplicity of your model. A parsimonious model is usually easier to interpret and has many other advantages.

Everything should be made as simple as possible, but no simpler. – Albert Einstein

  2. Goodness-of-fit, i.e., how well your model fits the data currently at hand.
  3. Generalizability, i.e., the ability of the fitted model to describe or predict new, unseen data (illustrated in the sketch after this list).
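
To see the tension between goodness-of-fit and generalizability concretely, here is a small simulated sketch (the data and variable names are invented for illustration): adding irrelevant variables always improves the in-sample fit, yet tends to hurt performance on new data.

    # Goodness-of-fit vs. generalizability: in-sample fit can only improve
    # as variables are added, even when the extra variables are pure noise.
    set.seed(1)
    n <- 50
    x_signal <- rnorm(n)
    y <- 2 * x_signal + rnorm(n)
    noise <- matrix(rnorm(n * 10), n, 10)   # 10 irrelevant predictors
    dat <- data.frame(y, x_signal, noise)

    fit_small <- lm(y ~ x_signal, data = dat)
    fit_big   <- lm(y ~ ., data = dat)      # signal plus 10 noise variables

    summary(fit_small)$r.squared   # in-sample R^2 of the smaller model
    summary(fit_big)$r.squared     # necessarily at least as high
    AIC(fit_small); AIC(fit_big)   # a penalized criterion will typically
                                   # favor the smaller model here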

Because of the above, many model selection criteria have been proposed, each addressing the model selection problem from a different angle.
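
For instance, one conventional compromise among these attributes is the AIC, which rewards fit but penalizes model complexity. A minimal sketch applies R's built-in step() to the model in the question (rebuilding the example(glm) data first):

    # AIC = -2*logLik + 2*(number of parameters); step() searches the
    # model space greedily using this criterion.
    counts    <- c(18, 17, 15, 20, 10, 20, 25, 13, 12)
    outcome   <- gl(3, 1, 9)
    treatment <- gl(3, 3)
    glm.D93 <- glm(counts ~ outcome + treatment, family = poisson())

    step(glm.D93, direction = "backward")
    # Backward elimination should drop `treatment` (its coefficients are
    # essentially zero in the summary above) while keeping `outcome`.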

Above all, it should be pointed out that conducting variable selection solely based on the significance level (p-value) of a variable can cause serious problems. The following is quoted from the Nature news feature “Scientific method: Statistical errors”, which discusses some of the damage the p-value criterion has done in scientific research.

P values, the ‘gold standard’ of statistical validity, are not as reliable as many scientists assume. … Perhaps the worst fallacy is the kind of self-deception for which psychologist Uri Simonsohn of the University of Pennsylvania and his colleagues have popularized the term P-hacking; it is also known as data-dredging, snooping, fishing, significance-chasing and double-dipping. “P-hacking,” says Simonsohn, “is trying multiple things until you get the desired result” — even unconsciously. … “That finding seems to have been obtained through p-hacking, the authors dropped one of the conditions so that the overall p-value would be less than .05”, and “She is a p-hacker, she always monitors data while it is being collected.”
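
To illustrate the mechanism behind p-hacking with a simulation of my own (not from the quoted article): if you screen enough completely unrelated variables, some will clear the 0.05 bar by chance alone.

    # With pure noise, roughly 5% of predictors pass the 0.05 threshold,
    # so screening many variables and reporting only the "significant"
    # ones manufactures findings out of nothing.
    set.seed(42)
    n <- 100; p <- 100
    y <- rnorm(n)                      # outcome unrelated to everything
    X <- matrix(rnorm(n * p), n, p)    # 100 pure-noise predictors

    pvals <- apply(X, 2, function(x) summary(lm(y ~ x))$coefficients[2, 4])
    sum(pvals < 0.05)   # expect around 5 spuriously "significant" predictors
    min(pvals)          # the cherry-picked winner can look very convincing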
