# Solved – In the Bayesian Information Criterion (BIC), why does a bigger n get penalized?

The Bayesian Information Criterion (BIC) is calculated with:

$$\text{BIC} = \frac{1}{n \hat{\delta}^2} \Big(\text{RSS} + \ln(n)\, d\, \hat{\delta}^2 \Big)$$

where RSS is the residual sum of squares and delta-hat squared is an estimate of the variance of the error associated with each response measurement.
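To make the formula concrete, here is a minimal Python sketch that evaluates it directly. The function name `bic`, the parameter name `var_hat` (standing in for delta-hat squared), and all the numbers are hypothetical and chosen only for illustration; they do not come from the textbook.

```python
import math


def bic(rss: float, n: int, d: int, var_hat: float) -> float:
    """Evaluate BIC = (RSS + ln(n) * d * var_hat) / (n * var_hat).

    rss     -- residual sum of squares of the fitted model
    n       -- number of observations
    d       -- number of predictors in the model
    var_hat -- estimate of the error variance (delta-hat squared)
    """
    return (rss + math.log(n) * d * var_hat) / (n * var_hat)


# Hypothetical example: two candidate models fitted to the same data set.
n = 100
var_hat = 4.0
bic_small = bic(rss=420.0, n=n, d=3, var_hat=var_hat)  # fewer predictors
bic_large = bic(rss=400.0, n=n, d=6, var_hat=var_hat)  # more predictors

print(f"BIC, 3 predictors: {bic_small:.4f}")
print(f"BIC, 6 predictors: {bic_large:.4f}")
```

In this sketch both candidates use the same n, so the ln(n) factor is identical for them and only RSS and d differ between the two values.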

Q1. Why does a larger sample size get penalized, when a bigger sample is usually better than a smaller one?

I have learned that having more data is always better. For example, with more samples you get a smaller standard error, a narrower confidence interval and a smaller standard deviation.

But according to the BIC formula, a statistical model fitted to more data gets penalized more, which means it has less chance of being selected. This becomes more obvious when BIC is compared with AIC. Since AIC uses 2 where BIC uses ln(n), once the sample size n is bigger than 7 the model has less chance of being selected when BIC is used to choose the optimal model. Why would the creator of BIC want to penalize a model with a larger sample size n?
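The threshold mentioned above can be checked numerically. The following is a small sketch, not part of the textbook, that compares the per-predictor weight 2 with ln(n) and finds the first integer n for which ln(n) exceeds 2; the variable names are hypothetical.

```python
import math

AIC_WEIGHT = 2.0  # per-predictor weight used in place of ln(n)

# Smallest integer sample size at which ln(n) exceeds 2 (e^2 is about 7.389).
first_n = next(n for n in range(1, 1000) if math.log(n) > AIC_WEIGHT)
print("ln(n) first exceeds 2 at n =", first_n)

# How the BIC weight grows with n compared with the constant 2.
for n in (5, 7, 8, 100, 10_000):
    print(f"n = {n:>6}: ln(n) = {math.log(n):.3f}, weight 2 = {AIC_WEIGHT}")
```

The weight ln(n) grows slowly: it passes 2 between n = 7 and n = 8 and is still below 10 at n = 10,000.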

Q2. Why does my textbook, 'An Introduction to Statistical Learning', change the meaning of n to 'variables' when we already have d, which stands for the number of predictors in the statistical model?

My book says the following about BIC:

Notice that BIC replaces the $$2 d \hat{\delta}^2$$ used by Cp with a $$\ln(n)\, d\, \hat{\delta}^2$$ term, where n is the number of observations. Since ln(n) > 2 for any n > 7, the BIC statistic generally places a heavier penalty on models with many variables, and hence results in the selection of smaller models than Cp. (p. 212)

I cannot work out why the author of this book changed the meaning of n from 'the number of observations' (sample data points) to 'the number of variables'. Don't we already have the variable d, which gives the number of predictor variables plus the intercept?

I would deeply appreciate it if anyone here could answer my two questions. Thank you very much for reading!
