# How exactly does AIC penalize overfitting?

I've been reading up a lot on the AIC value for GLM models, and it has come to my attention that pretty much all of my literature claims that AIC penalizes models with too many variables without mentioning what the penalty actually is.

Is there anyone here who cares to explain to me how AIC penalizes models with too many variables? How does AIC show that an arbitrary model with, for example, 3 explanatory variables is better than a model with, say, 7 variables?


The definition of AIC is

$$ \mathrm{AIC} = 2k - 2 \ln(\hat{L}) $$

where $\hat{L}$ is the maximized likelihood of the model and $k$ is the number of estimated parameters. Lower AIC values indicate better fits – higher likelihoods achieved with fewer parameters.

> explain to me how AIC penalizes models with too many variables?

The $2k$ term in AIC means that the AIC will go up by 2 for every additional parameter estimated. This is how AIC penalizes models for adding extra terms.
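As a minimal sketch of this penalty (the log-likelihood values here are hypothetical, just for illustration): if an extra parameter buys no improvement in fit, the AIC rises by exactly 2.

```python
def aic(log_likelihood, k):
    # AIC = 2k - 2*ln(L-hat), computed directly from the log-likelihood
    return 2 * k - 2 * log_likelihood

# A model with log-likelihood -100 and 3 parameters:
print(aic(-100.0, 3))                    # 206.0

# Same log-likelihood, one extra estimated parameter -> penalty of 2:
print(aic(-100.0, 4) - aic(-100.0, 3))   # 2.0
```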

> How does AIC show that an arbitrary model with for example 3 explanatory variables is better than a model with say 7 variables?

When comparing a simple model to a complex model, the log likelihood of the complex model must exceed the log likelihood of the simple model by more than the number of additional parameters for the AIC to go down, indicating that the more complex model is a better fit.

In practice, a rule of thumb is often used: if the difference in AIC is less than 2, the difference in fit is negligible; if the difference is more than 10, there is strong evidence in support of the model with the lower AIC. Using the "strong evidence" threshold of 10, a more complex model would need to improve the log likelihood by 5 plus 1 per additional parameter for the complexity to be justified.
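The arithmetic for the 3-variable vs. 7-variable comparison can be checked directly (the log-likelihoods below are made-up numbers for illustration; counting an intercept, the models have 4 and 8 parameters). With 4 extra parameters, the log likelihood must improve by 4 + 5 = 9 to hit the "strong evidence" threshold of 10.

```python
def aic(log_likelihood, k):
    # AIC = 2k - 2*ln(L-hat)
    return 2 * k - 2 * log_likelihood

simple  = aic(-120.0, 4)  # 3 slopes + intercept
complex_ = aic(-111.0, 8) # 7 slopes + intercept; log-likelihood improved by 9

print(simple - complex_)  # 10.0 -> exactly at the "strong evidence" threshold
```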

Other metrics, such as the corrected AIC (AICc), also take the number of observations into account. You can browse some highly-voted questions on AIC for lots of interesting discussion.
