I'm having trouble to replicate how extractAIC() is calculating the AIC for a linear model (lm). There are a lot of different formulas for the AIC out there, I've tried them all but none of them yielded the exact same value as extractAIC(). Here is what I've tried (extractAIC(finalModel) is 24823.52):
finalModel = step(lm(formula, data = myData), scope = scopeFormula ,direction="both", k=2) n = df.residual(finalModel) edf = length(finalModel$coefficients) DVVariance = var(companyData$MarketValueTotalFiscal) RSS = sum(resid(finalModel) ^ 2) #AIC formula using maximum likelyhood estimation (which is not used by extractAIC() with an lm object) -2*logLik(finalModel)+2*edf AIC(finalModel) #AIC formula using OLS, which extractAIC() should theoretically be using with lm according to help n*log(RSS/n)-n+n*log(2*pi) +2*edf #AIC formula Venables with known variance (of the DV of the poulation I guess) RSS/DVVariance + 2*edf #AIC formula according to Venables "Modern Applied Statistics" 2002 p. 174 with unknown variance n * log(RSS/n) + 2*edf # +constant, but he doesn't explain what constant is #AIC forumla accordint to Venables with the constant term according to Wikipedia about AIC n * log(RSS/n) + 2*edf + (n/2)*log(2*pi) #Comparison to the extractAIC() function that is used by the step() function (both are in the MASS package) extractAIC(finalModel)
Here is the output:
> #AIC formula using maximum likelyhood estimation (which is not used by extractAIC() with an lm object) > -2*logLik(finalModel)+2*edf 'log Lik.' 29738.72 (df=50) > AIC(finalModel) [1] 29740.72 > > #AIC formula using OLS, which extractAIC() should theoretically be using with lm according to help > n*log(RSS/n)-n+n*log(2*pi) +2*edf [1] 25582.46 > > #AIC formula Venables with known variance (of the DV of the poulation I think) > RSS/DVVariance + 2*edf [1] 274.651 > > #AIC formula according to Venables "Modern Applied Statistics" 2002 p. 174 with unknown variance > n * log(RSS/n) + 2*edf # +constant, but he doesn't explain what constant is [1] 24172.31 > > #AIC forumla accordint to Venables with the constant term according to Wikipedia about AIC > n * log(RSS/n) + 2*edf + (n/2)*log(2*pi) [1] 25718.88 > > #Comparison to the extractAIC() function that is used by the step() function (both are in the MASS package) > extractAIC(finalModel) [1] 49.00 24823.52
Best Answer
You can always check the source code since R is open-source:
> stats:::extractAIC.lm function (fit, scale = 0, k = 2, ...) { n <- length(fit$residuals) edf <- n - fit$df.residual RSS <- deviance.lm(fit) dev <- if (scale > 0) RSS/scale - n else n * log(RSS/n) c(edf, dev + k * edf) } <bytecode: 0x000000001f88adb8> <environment: namespace:stats>
This is also described in the documentation ?extractAIC
:
The criterion used is
AIC = - 2*log L + k * edf,
where
L
is the likelihood andedf
the equivalent degrees of freedom (i.e., the number of free parameters for usual parametric models) offit
.For linear models with unknown scale (i.e., for lm and aov),
-2 log L
is computed from the deviance and uses a different additive constant tologLik
and henceAIC
. If RSS denotes the (weighted) residual sum of squares thenextractAIC
uses for-2 log L
the formulaeRSS/s - n
(corresponding to Mallows' Cp) in the case of known scales
andn log (RSS/n)
for unknown scale.AIC
only handles unknown scale and uses the formulan*log(RSS/n) + n + n*log 2pi - sum(log w)
wherew
are the weights. FurtherAIC
counts the scale estimation as a parameter in theedf
andextractAIC
does not.
Similar Posts:
- Solved – Question on AIC and stepAIC
- Solved – the formula to calculate the scaled schoenfeld residuals for a coxPH model
- Solved – Appropriate residual degrees of freedom after dropping terms from a model
- Solved – regression and elastic net provide different results
- Solved – regression and elastic net provide different results