Multiple regression p-value differs in summary and in pf() output used for extraction, in R

When fitting several predictors to one outcome with lm() in R, summary(lm) reports the p-values for the individual regressors, but it does not expose the p-value for the full model in a form that can be extracted simply by accessing a field.

According to this question, the overall p-value can be computed from summary(lm)$fstatistic with the following command, where x is summary(lm):

pf(x$fstatistic[1],x$fstatistic[2],x$fstatistic[3],lower.tail=FALSE) 
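As a sketch of the same extraction, the block below wraps it in a small helper and cross-checks it against an explicit comparison with the intercept-only model. It assumes the built-in mtcars data and the hypothetical helper name overall_p(), since the question's own model and data are not available. The null model is fitted on model.frame(fit) so that it uses the same complete-case rows, which matters when observations are dropped for missingness as in the question.

```r
# Example model on built-in data (an assumption; not the question's model).
fit <- lm(mpg ~ wt + hp + qsec, data = mtcars)

# Hypothetical helper: overall F-test p-value from summary(fit)$fstatistic,
# whose elements are named "value", "numdf" and "dendf".
overall_p <- function(fit) {
  f <- summary(fit)$fstatistic
  unname(pf(f["value"], f["numdf"], f["dendf"], lower.tail = FALSE))
}

p_model <- overall_p(fit)

# Cross-check: the same F-test as a nested comparison against the
# intercept-only model fitted to the same rows.
fit0 <- lm(mpg ~ 1, data = model.frame(fit))
p_anova <- anova(fit0, fit)[2, "Pr(>F)"]

print(p_model)
print(all.equal(p_model, p_anova))
```

Both routes compute the same overall F-test, so the two numbers should agree up to floating-point tolerance.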

However, while in the linked example this reproduces the printed p-value, in my case I get a different one:

> summary(model)
# ...
Residual standard error: 1.533 on 371 degrees of freedom
  (555 observations deleted due to missingness)
Multiple R-squared:  0.3364,    Adjusted R-squared:  0.2864
F-statistic: 6.718 on 28 and 371 DF,  p-value: < 2.2e-16

and:

> f = summary(model)$fstatistic
> pf(f[1], f[2], f[3], lower.tail = F)
       value
5.948007e-20

What are possible reasons for these values to be different, and which one is the "right" one for the significance of the whole model?

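The two outputs are the same p-value; only the display differs. The p-value printed by print.summary.lm (use getAnywhere(print.summary.lm) to study its code) is passed through format.pval, which prints any p-value smaller than .Machine$double.eps as "< 2.2e-16" instead of the number itself. Since 5.948007e-20 is far below that threshold, the two results agree: the value returned by pf() is simply the unformatted number, and the printed one is a bound on it.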
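The formatting step can be reproduced directly. This sketch assumes digits = 4, which is what print.summary.lm uses when getOption("digits") is at its default of 7; the value 0.0123 is only an illustrative p-value, not from the question.

```r
# The p-value returned by pf() in the question.
p <- 5.948007e-20

# format.pval() replaces any p-value below .Machine$double.eps by a bound.
shown <- format.pval(p, digits = 4)
print(shown)   # "< 2.2e-16"

# A p-value above the threshold is printed as an ordinary number.
print(format.pval(0.0123, digits = 4))
```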

2.2e-16 is the value of .Machine$double.eps (2.220446e-16) shown to two significant digits, which R documents as

the smallest positive floating-point number x such that 1 + x != 1

So the cutoff at 2.2e-16 is not arbitrary; it is chosen for numerical reasons.
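A few checks make the machine-epsilon definition concrete. This is only one illustration of why eps is a natural cutoff (comparing a tiny p-value with 1 loses it entirely), not a statement of R's exact design rationale. It assumes IEEE 754 double precision with round-to-nearest, which is what R uses on common platforms.

```r
eps <- .Machine$double.eps
print(eps)             # 2.220446e-16

# Adding eps to 1 changes the result; adding half of it rounds back to 1.
print(1 + eps != 1)    # TRUE
print(1 + eps / 2 == 1)  # TRUE

# A p-value far below eps is still a representable, non-zero double...
p <- 5.948007e-20
print(p > 0)           # TRUE
# ...but its complement 1 - p cannot be distinguished from 1.
print(1 - p == 1)      # TRUE
```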
