# Solved – How to calculate the Bayesian (Schwarz) Information Criterion (BIC) for a multilevel Bayesian model

The BIC is defined (according to Wikipedia) as

$\mathrm{BIC} = k\ln(n) - 2\ln(\hat{L})$

where the likelihood $\hat{L} = p(x\mid\hat{\theta},M)$, with $M$ the model, $x$ the data, and $\hat{\theta}$ the to-be-inferred parameters of the model, set at their highest-likelihood point. (The question applies similarly to the AIC.)
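To make the definition concrete, here is a minimal sketch of the BIC formula in Python. The Gaussian model, the data, and all variable names are my own toy illustration, not part of any particular library's API:

```python
import numpy as np

def bic(log_lik_at_mle, k, n):
    """BIC = k*ln(n) - 2*ln(L_hat), given the log-likelihood at the
    maximum-likelihood parameters, the number of free parameters k,
    and the number of data points n."""
    return k * np.log(n) - 2.0 * log_lik_at_mle

# Toy example: n = 100 i.i.d. Gaussian observations, k = 2 free
# parameters (mean and standard deviation), evaluated at the MLE.
rng = np.random.default_rng(0)
x = rng.normal(loc=1.0, scale=2.0, size=100)
mu_hat, sigma_hat = x.mean(), x.std()  # Gaussian MLEs
log_lik = np.sum(-0.5 * np.log(2 * np.pi * sigma_hat**2)
                 - (x - mu_hat)**2 / (2 * sigma_hat**2))
print(bic(log_lik, k=2, n=len(x)))
```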

In my case, I am using MCMC to estimate the posterior distribution:

$p(\theta\mid x,M)\propto p(x\mid\theta,M)\,p(\theta)$

However, the issue in a multilevel model is that my model parameters $\theta$ are split into two blocks, $\theta=(\theta_1,\theta_2)$, such that only $\theta_1$ has a direct conditional relation with $x$:

$\hat{L} = p(x\mid\theta,M)=p(x\mid\theta_1,\theta_2,M)=p(x\mid\theta_1,M)$

Instead, $\theta_2$ (I believe these would be called the hyperparameters) parameterise the distribution of $\theta_1$:

$p(\theta\mid M)=p(\theta_1,\theta_2\mid M)=p(\theta_1\mid\theta_2,M)\,p(\theta_2\mid M)$

which happens not to appear in the likelihood function I defined, but instead in what you might call the prior $p(\theta)$, the way I wrote it above. Since the likelihood function $p(x\mid\theta,M)$ then doesn't contain what is probably the crucial part of my Bayesian model, I feel like something must be wrong.

So I have two questions:

1) I feel like I need to include the "forward model" for $\theta_1\mid\theta_2,M$ in the calculation of the BIC. So does that mean I should define the likelihood in the BIC as $\hat{L}=p(x\mid\hat{\theta}_1,M)\,p(\hat{\theta}_1\mid\hat{\theta}_2,M)$ instead of just $p(x\mid\hat{\theta}_1,M)$? And should it be evaluated at the highest-likelihood point of $\theta_1$, or marginalised over $\theta_1$:

$\hat{L}=p(x\mid\hat{\theta}_2,M)=\int p(x\mid\theta_1,M)\,p(\theta_1\mid\hat{\theta}_2,M)\,d\theta_1$

To me, the latter seems like the most logical solution. Technically, I could have removed the $\theta_1$ "level" of the hierarchical model by directly incorporating the above integral into some likelihood $p(x\mid\theta_2,M)$ without changing the results for $\theta_2$, turning it into a two-level model.
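For what it's worth, the marginalised likelihood in that integral can be approximated by simple Monte Carlo: draw $\theta_1$ from $p(\theta_1\mid\hat{\theta}_2,M)$ and average the likelihood over the draws. A sketch under an assumed toy hierarchy (a Gaussian group mean $\theta_1$ governed by $\theta_2=(\mu,\tau)$; all names and values are illustrative):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)

# Toy hierarchy (assumed for illustration):
#   theta_1 ~ Normal(mu, tau)     -- level set by theta_2 = (mu, tau)
#   x_i     ~ Normal(theta_1, 1)  -- likelihood level
x = rng.normal(loc=0.5, scale=1.0, size=50)
mu_hat, tau_hat = 0.4, 0.3  # pretend these are the fitted theta_2

# Monte Carlo estimate of
#   p(x | theta_2) = ∫ p(x | theta_1) p(theta_1 | theta_2) d theta_1
theta1_draws = rng.normal(mu_hat, tau_hat, size=10_000)
# log p(x | theta_1) for each draw, summed over observations
log_lik = norm.logpdf(x[None, :], loc=theta1_draws[:, None],
                      scale=1.0).sum(axis=1)
# log-mean-exp for numerical stability
log_marg = np.logaddexp.reduce(log_lik) - np.log(log_lik.size)
print(log_marg)  # ≈ log p(x | hat{theta_2}, M)
```

In this conjugate Gaussian toy case the integral is also available in closed form (a multivariate normal with covariance $I + \tau^2\mathbf{1}\mathbf{1}^\top$), which is a useful check on the Monte Carlo estimate.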

2) What exactly is "the highest likelihood point" in a Bayesian model? Is it the point $(\theta_1,\theta_2)$ with the largest value of $p(x\mid\theta_1,\theta_2,M)$ (or however we choose to define the likelihood, see above), or is it the point with the highest posterior probability $p(\theta_1,\theta_2\mid x,M)$, which will be similar but also includes the priors?
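The distinction between the two candidates can be sketched directly on MCMC output: if the sampler stores the log-likelihood and log-prior at each sample, the two "best" points are just two different argmaxes. Everything below (the fake chain, the stand-in densities) is hypothetical, for illustration only:

```python
import numpy as np

# Hypothetical MCMC output: one row per posterior sample, plus the
# log-likelihood and log-prior evaluated at each sample.
rng = np.random.default_rng(2)
samples   = rng.normal(size=(5000, 2))        # columns: theta_1, theta_2
log_lik   = -0.5 * (samples[:, 0] - 1.0)**2   # stand-in log p(x|theta)
log_prior = -0.5 * (samples**2).sum(axis=1)   # stand-in log p(theta)

# "Highest likelihood point": maximises p(x|theta) alone.
theta_ml  = samples[np.argmax(log_lik)]
# MAP point: maximises the (unnormalised) posterior p(x|theta)p(theta).
theta_map = samples[np.argmax(log_lik + log_prior)]

print(theta_ml, theta_map)  # generally two different points
```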

BONUS QUESTION: Are $\theta_2$ really called the hyperparameters, or are they just model parameters, and would the hyperparameters be the ones that parametrise the prior $p(\theta_2\mid M)$?
