# Solved – find examples of Takeuchi Information Criterion (TIC) at work

I have been looking for examples of the TIC and couldn't find any. In particular, I would like to know how exactly the penalty term in TIC is estimated. From what I have read, that term involves the score function and the Fisher information.
Are there any online resources where I can find that?


Here is an example from the book *Information Criteria and Statistical Modeling* (pages 61–64), with my own minor changes and corrections to the presentation:

\$\text{TIC}\$ for a normal model

Let data samples \$x_1, x_2, …, x_n\$ be generated from a true distribution \$g(x)\$. We aim to estimate \$g(x)\$ by using a normal model:

\$\$f(x|\mu,\sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}}\exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right).\hspace{3cm}(1)\$\$

We estimate the parameters \$\mu, \sigma^2\$ with maximum likelihood (which is required by \$\text{TIC}\$ to my understanding), so our estimator model is:

\$\$f(x|\hat{\mu},\hat{\sigma}^2) = \frac{1}{\sqrt{2\pi\hat{\sigma}^2}}\exp\left(-\frac{(x-\hat{\mu})^2}{2\hat{\sigma}^2}\right),\$\$

where \$\hat{\mu}=n^{-1}\sum_{i=1}^n x_i\$ and \$\hat{\sigma}^2=n^{-1}\sum_{i=1}^n (x_i-\hat{\mu})^2\$.
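For concreteness, the two estimators above can be computed in a few lines (a minimal Python sketch; the function name and toy data are my own, not from the book):

```python
def normal_mle(xs):
    """Return (mu_hat, sigma2_hat), the ML estimates under a normal model."""
    n = len(xs)
    mu_hat = sum(xs) / n
    # Note the 1/n factor: this is the ML (biased) variance estimate,
    # not the 1/(n-1) sample variance.
    sigma2_hat = sum((x - mu_hat) ** 2 for x in xs) / n
    return mu_hat, sigma2_hat

mu_hat, sigma2_hat = normal_mle([2.1, 1.9, 2.4, 2.0, 1.6])
```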

The \$\text{TIC}\$ for the given model \$f(x|\boldsymbol\theta)\$ (where \$\boldsymbol\theta=(\mu, \sigma^2)\$ now) is defined as:

\$\$\text{TIC}=-2\sum_{i=1}^n \log f(x_i|\boldsymbol\theta) + 2\,\text{tr}\left(I(\boldsymbol\theta)J(\boldsymbol\theta)^{-1}\right),\$\$

where \$b=\text{tr}\left(I(\boldsymbol\theta)J(\boldsymbol\theta)^{-1}\right)\$ is the bias term and the \$d\times d\$ (\$d\$ is the dimensionality of the parameter vector; here \$d=2\$) matrices \$I(\boldsymbol\theta)\$ and \$J(\boldsymbol\theta)\$ are defined as:

\$\$I(\boldsymbol\theta)=E_g\left[\frac{\partial\log f(X|\boldsymbol\theta)}{\partial\boldsymbol\theta}\frac{\partial\log f(X|\boldsymbol\theta)}{\partial\boldsymbol\theta^T}\right],\hspace{2cm}(2)\$\$

\$\$J(\boldsymbol\theta)=-E_g\left[\frac{\partial^2 \log f(X|\boldsymbol\theta)}{\partial\boldsymbol\theta\,\partial\boldsymbol\theta^T}\right].\hspace{4cm}(3)\$\$

In the above, \$\text{tr}(\cdot)\$ stands for the matrix trace and \$E_g\$ is the expectation with respect to the distribution \$g(x)\$. Let's now proceed to calculate the matrices \$I(\boldsymbol\theta)\$ and \$J(\boldsymbol\theta)\$ for our model \$f(x|\mu,\sigma^2)\$ in \$(1)\$. First, we need the log-likelihood function:

\$\$\log f(x|\boldsymbol\theta)=-\frac{1}{2}\log\left(2\pi\sigma^2\right)-\frac{(x-\mu)^2}{2\sigma^2}.\$\$

The partial derivatives of the log-likelihood function are:

\$\$\frac{\partial \log f(x|\boldsymbol\theta)}{\partial\mu}=\frac{x-\mu}{\sigma^2},\hspace{1cm}\frac{\partial \log f(x|\boldsymbol\theta)}{\partial\sigma^2}=-\frac{1}{2\sigma^2}+\frac{(x-\mu)^2}{2\sigma^4},\$\$

\$\$\frac{\partial^2 \log f(x|\boldsymbol\theta)}{\partial\mu^2}=-\frac{1}{\sigma^2},\hspace{1cm}\frac{\partial^2 \log f(x|\boldsymbol\theta)}{\partial(\sigma^2)^2}=\frac{1}{2\sigma^4}-\frac{(x-\mu)^2}{\sigma^6},\$\$

\$\$\frac{\partial^2 \log f(x|\boldsymbol\theta)}{\partial\mu\,\partial\sigma^2}=\frac{\partial^2 \log f(x|\boldsymbol\theta)}{\partial\sigma^2\,\partial\mu}=-\frac{x-\mu}{\sigma^4}.\$\$
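These derivatives can be sanity-checked numerically with central finite differences (my own check, not from the book; the evaluation point is arbitrary):

```python
import math

def logf(x, mu, s2):
    # Log-likelihood of the normal model, as derived above.
    return -0.5 * math.log(2 * math.pi * s2) - (x - mu) ** 2 / (2 * s2)

x, mu, s2, h = 1.3, 0.5, 2.0, 1e-6

# Analytic score components vs. central finite differences.
d_mu = (x - mu) / s2
d_mu_fd = (logf(x, mu + h, s2) - logf(x, mu - h, s2)) / (2 * h)

d_s2 = -1 / (2 * s2) + (x - mu) ** 2 / (2 * s2 ** 2)
d_s2_fd = (logf(x, mu, s2 + h) - logf(x, mu, s2 - h)) / (2 * h)
```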

We therefore have the corresponding \$2\times 2\$ matrices:

\$\$\begin{aligned}I(\boldsymbol\theta)&=E_g\left[\begin{pmatrix} \frac{\partial \log f(X|\boldsymbol\theta)}{\partial\mu} \\ \frac{\partial \log f(X|\boldsymbol\theta)}{\partial\sigma^2} \end{pmatrix}\begin{pmatrix} \frac{\partial \log f(X|\boldsymbol\theta)}{\partial\mu} & \frac{\partial \log f(X|\boldsymbol\theta)}{\partial\sigma^2} \end{pmatrix}\right]\\ &=E_g\left[\begin{pmatrix} \frac{X-\mu}{\sigma^2} \\ -\frac{1}{2\sigma^2}+\frac{(X-\mu)^2}{2\sigma^4} \end{pmatrix}\begin{pmatrix} \frac{X-\mu}{\sigma^2} & -\frac{1}{2\sigma^2}+\frac{(X-\mu)^2}{2\sigma^4} \end{pmatrix}\right] \\ &=E_g\begin{bmatrix} \frac{(X-\mu)^2}{\sigma^4} & -\frac{X-\mu}{2\sigma^4}+\frac{(X-\mu)^3}{2\sigma^6} \\ -\frac{X-\mu}{2\sigma^4}+\frac{(X-\mu)^3}{2\sigma^6} & \frac{1}{4\sigma^4}-\frac{(X-\mu)^2}{2\sigma^6}+\frac{(X-\mu)^4}{4\sigma^8} \end{bmatrix} \\ &= \begin{bmatrix} \frac{1}{\sigma^2} & \frac{\mu_3}{2\sigma^6} \\ \frac{\mu_3}{2\sigma^6} & \frac{\mu_4}{4\sigma^8}-\frac{1}{4\sigma^4} \end{bmatrix}, \end{aligned}\$\$

\$\$\begin{aligned}J(\boldsymbol\theta) &= -E_g\begin{bmatrix}\frac{\partial^2 \log f(X|\boldsymbol\theta)}{\partial\mu^2} & \frac{\partial^2 \log f(X|\boldsymbol\theta)}{\partial\mu\,\partial\sigma^2} \\ \frac{\partial^2 \log f(X|\boldsymbol\theta)}{\partial\sigma^2\,\partial\mu} & \frac{\partial^2 \log f(X|\boldsymbol\theta)}{\partial(\sigma^2)^2}\end{bmatrix} \\ &= E_g\begin{bmatrix}\frac{1}{\sigma^2} & \frac{X-\mu}{\sigma^4} \\ \frac{X-\mu}{\sigma^4} & \frac{(X-\mu)^2}{\sigma^6}-\frac{1}{2\sigma^4}\end{bmatrix}\\ &= \begin{bmatrix}\frac{1}{\sigma^2} & 0 \\ 0 & \frac{1}{2\sigma^4}\end{bmatrix},\end{aligned}\$\$

where \$\mu_j=E_g\left[(X-\mu)^j\right]\$ is the \$j\$th-order central moment. We then have:

\$\$I(\boldsymbol\theta)J(\boldsymbol\theta)^{-1}=\begin{bmatrix} \frac{1}{\sigma^2} & \frac{\mu_3}{2\sigma^6} \\ \frac{\mu_3}{2\sigma^6} & \frac{\mu_4}{4\sigma^8}-\frac{1}{4\sigma^4} \end{bmatrix}\begin{bmatrix}\sigma^2 & 0 \\ 0 & 2\sigma^4\end{bmatrix}=\begin{bmatrix}1 & \frac{\mu_3}{\sigma^2}\\ \frac{\mu_3}{2\sigma^4} & \frac{\mu_4}{2\sigma^4}-\frac{1}{2}\end{bmatrix},\$\$

and therefore the bias term is:

\$\$b=\text{tr}\left(I(\boldsymbol\theta)J(\boldsymbol\theta)^{-1}\right) = 1 + \frac{\mu_4}{2\sigma^4}-\frac{1}{2}=\frac{1}{2}\left(1+\frac{\mu_4}{\sigma^4}\right).\hspace{2cm}(4)\$\$
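As a side check of \$(4)\$ for a non-normal truth (my own example, not from the book), take \$g=\$ Uniform\$(0,1)\$, for which \$\sigma^2=1/12\$ and \$\mu_4=1/80\$ are standard results; exact rational arithmetic gives a bias term different from the AIC value of 2:

```python
from fractions import Fraction

# Population bias term b from formula (4) for g = Uniform(0,1).
sigma2 = Fraction(1, 12)  # Var(X) for Uniform(0,1)
mu4 = Fraction(1, 80)     # E[(X - 1/2)^4] for Uniform(0,1)
b = Fraction(1, 2) * (1 + mu4 / sigma2 ** 2)
# b = 7/5, not 2, because the normal model is misspecified here.
```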

Thus, by plugging in the maximum likelihood estimator \$\hat{\boldsymbol\theta}=(\hat{\mu}, \hat{\sigma}^2)\$, we get the estimator for the bias term:

\$\$\hat{b}=\text{tr}\left(I(\hat{\boldsymbol\theta})J(\hat{\boldsymbol\theta})^{-1}\right)=\frac{1}{2}\left(1+\frac{\hat{\mu}_4}{\hat{\sigma}^4}\right),\$\$

where \$\hat{\sigma}^4 = (\hat{\sigma}^2)^2\$ and \$\hat{\mu}_4=n^{-1}\sum_{i=1}^n (x_i-\hat{\mu})^4\$. It then follows that the \$\text{TIC}\$ in this example is:

\$\$\begin{aligned}\text{TIC} &= -2\sum_{i=1}^n \log f(x_i|\hat{\mu}, \hat{\sigma}^2) + 2\left(\frac12+\frac{\hat{\mu}_4}{2\hat{\sigma}^4}\right) \\ &= n\log(2\pi\hat{\sigma}^2)+n + 1+\frac{\hat{\mu}_4}{\hat{\sigma}^4},\end{aligned}\$\$

where we used \$\sum_{i=1}^n \log f(x_i|\hat{\mu},\hat{\sigma}^2)=-\frac{n}{2}\log(2\pi\hat{\sigma}^2)-\frac{n}{2}\$.
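Putting the pieces together, the closed form \$n\log(2\pi\hat{\sigma}^2)+n+1+\hat{\mu}_4/\hat{\sigma}^4\$ can be computed in a few lines (a Python sketch; the function name and toy data are mine):

```python
import math

def tic_normal(xs):
    """TIC for a fitted normal model: n*log(2*pi*s2) + n + 1 + m4/s2^2."""
    n = len(xs)
    mu = sum(xs) / n
    s2 = sum((x - mu) ** 2 for x in xs) / n  # ML variance estimate
    m4 = sum((x - mu) ** 4 for x in xs) / n  # fourth central sample moment
    return n * math.log(2 * math.pi * s2) + n + 1 + m4 / s2 ** 2

value = tic_normal([2.1, 1.9, 2.4, 2.0, 1.6])
```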

Note also that if there exists \$\boldsymbol\theta_0\$ such that \$f(x|\boldsymbol\theta_0)=g(x)\$, then \$g(x)\$ is a normal distribution and we have \$\mu_3 = 0\$ and \$\mu_4=3\sigma^4\$. From \$(4)\$ we then get:

\$\$b=\frac{1}{2}\left(1+\frac{\mu_4}{\sigma^4}\right)=\frac{1}{2}+\frac{3\sigma^4}{2\sigma^4}=2,\$\$

in which case the \$\text{TIC}\$ reduces to the \$\text{AIC}\$ (Akaike information criterion):

\$\$\begin{aligned}\text{AIC} &= -2\sum_{i=1}^n \log f(x_i|\hat{\mu}, \hat{\sigma}^2) + 2\times 2 \\ &= n\log(2\pi\hat{\sigma}^2)+n + 4.\end{aligned}\$\$
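To illustrate how the TIC penalty adapts while AIC's stays fixed at \$2d=4\$ (my own simulation, not from the book; the TIC penalty is \$1+\hat{\mu}_4/\hat{\sigma}^4\$, i.e. one plus the sample kurtosis):

```python
import math
import random

def tic_and_aic(xs):
    """Return (TIC, AIC) for the normal model fitted to xs."""
    n = len(xs)
    mu = sum(xs) / n
    s2 = sum((x - mu) ** 2 for x in xs) / n
    m4 = sum((x - mu) ** 4 for x in xs) / n
    base = n * math.log(2 * math.pi * s2) + n  # -2 * maximized log-likelihood
    return base + 1 + m4 / s2 ** 2, base + 4

random.seed(1)
# Truly normal data: kurtosis near 3, so the TIC penalty is close to AIC's 4.
normal_xs = [random.gauss(0.0, 1.0) for _ in range(10_000)]
# Laplace data (difference of two exponentials, kurtosis 6): TIC penalizes more.
laplace_xs = [random.expovariate(1.0) - random.expovariate(1.0) for _ in range(10_000)]
```

For the normal sample the two criteria nearly coincide; for the heavier-tailed Laplace sample TIC's penalty is noticeably larger, reflecting the misspecification of the normal model.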
