Solved – find examples of Takeuchi Information Criterion (TIC) at work

I have been looking for examples of the TIC and couldn't find any. In particular, I would like to know how exactly to estimate the penalty term in the TIC. From what I have read, that term involves the score function and the Fisher information.
Are there any online resources where I can find that?

Here is an example from the book Information Criteria and Statistical Modeling (pages 61-64), with my own minor changes and corrections to the presentation:

$\text{TIC}$ for the normal model

Let data samples $x_1, x_2, \dots, x_n$ be generated from a true distribution $g(x)$. We aim to estimate $g(x)$ using a normal model:

$$f(x|\mu,\sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}}\exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right).\hspace{3cm}(1)$$

We estimate the parameters $\mu, \sigma^2$ by maximum likelihood (which, to my understanding, $\text{TIC}$ requires), so our estimated model is:

$$f(x|\hat{\mu},\hat{\sigma}^2) = \frac{1}{\sqrt{2\pi\hat{\sigma}^2}}\exp\left(-\frac{(x-\hat{\mu})^2}{2\hat{\sigma}^2}\right),$$

where $\hat{\mu}=n^{-1}\sum_{i=1}^n x_i$ and $\hat{\sigma}^2=n^{-1}\sum_{i=1}^n (x_i-\hat{\mu})^2$.
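As a quick numerical check, the two maximum likelihood estimators are one-liners. A minimal NumPy sketch (the simulated sample is purely illustrative):

```python
import numpy as np

# Simulated data standing in for x_1, ..., x_n (illustrative only)
rng = np.random.default_rng(0)
x = rng.normal(loc=1.0, scale=2.0, size=1000)

mu_hat = x.mean()                        # MLE of mu: the sample mean
sigma2_hat = ((x - mu_hat) ** 2).mean()  # MLE of sigma^2: divides by n, not n - 1
```

Note that `sigma2_hat` matches `np.var(x)` (whose default is `ddof=0`), not the unbiased `np.var(x, ddof=1)`.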

The $\text{TIC}$ for the given model $f(x|\boldsymbol\theta)$ (where $\boldsymbol\theta =(\mu, \sigma^2)$ now) is defined as:

$$\text{TIC}=-2\sum_{i=1}^n \log f(x_i|\boldsymbol\theta) + 2\,\text{tr}\left(I(\boldsymbol\theta)J(\boldsymbol\theta)^{-1}\right),$$

where $b=\text{tr}\left(I(\boldsymbol\theta)J(\boldsymbol\theta)^{-1}\right)$ is the bias term and the $d\times d$ matrices $I(\boldsymbol\theta)$ and $J(\boldsymbol\theta)$ ($d$ is the dimensionality of the parameter vector; here $d=2$) are defined as:

$$I(\boldsymbol\theta)=E_g \left[\frac{\partial\log f(X|\boldsymbol\theta)}{\partial \boldsymbol\theta}\frac{\partial\log f(X|\boldsymbol\theta)}{\partial \boldsymbol\theta^T}\right],\hspace{2cm}(2)$$

$$J(\boldsymbol\theta)=-E_g \left[\frac{\partial^2 \log f(X|\boldsymbol\theta)}{\partial\boldsymbol\theta\, \partial\boldsymbol\theta^T}\right].\hspace{4cm}(3)$$

In the above, $\text{tr}(\cdot)$ stands for the matrix trace and $E_g$ denotes expectation with respect to the distribution $g(x)$. Let's now proceed to calculate the matrices $I(\boldsymbol\theta)$ and $J(\boldsymbol\theta)$ for our model $f(x|\mu,\sigma^2)$ in $(1)$. First, we need the log-likelihood function:

$$\log f(x|\boldsymbol\theta)=-\frac{1}{2}\log\left(2\pi\sigma^2\right)-\frac{(x-\mu)^2}{2\sigma^2}.$$

The partial derivatives of the log-likelihood function are:

$$\frac{\partial \log f(x|\boldsymbol\theta)}{\partial\mu}=\frac{x-\mu}{\sigma^2},\hspace{1cm}\frac{\partial \log f(x|\boldsymbol\theta)}{\partial\sigma^2}=-\frac{1}{2\sigma^2}+\frac{(x-\mu)^2}{2\sigma^4}$$

$$\frac{\partial^2 \log f(x|\boldsymbol\theta)}{\partial\mu^2}=-\frac{1}{\sigma^2},\hspace{1cm}\frac{\partial^2 \log f(x|\boldsymbol\theta)}{\partial(\sigma^2)^2}=\frac{1}{2\sigma^4}-\frac{(x-\mu)^2}{\sigma^6}$$

$$\frac{\partial^2 \log f(x|\boldsymbol\theta)}{\partial\mu\,\partial \sigma^2}=\frac{\partial^2 \log f(x|\boldsymbol\theta)}{\partial\sigma^2\,\partial\mu}=-\frac{x-\mu}{\sigma^4}.$$

We therefore have the corresponding $2\times 2$ matrices:

$$\begin{aligned}I(\boldsymbol\theta)&=E_g\left[\begin{pmatrix} \frac{\partial \log f(X|\boldsymbol\theta)}{\partial\mu} \\ \frac{\partial \log f(X|\boldsymbol\theta)}{\partial\sigma^2} \end{pmatrix}\begin{pmatrix} \frac{\partial \log f(X|\boldsymbol\theta)}{\partial\mu} & \frac{\partial \log f(X|\boldsymbol\theta)}{\partial\sigma^2} \end{pmatrix}\right]\\ &=E_g\left[\begin{pmatrix} \frac{X-\mu}{\sigma^2} \\ -\frac{1}{2\sigma^2}+\frac{(X-\mu)^2}{2\sigma^4} \end{pmatrix}\begin{pmatrix} \frac{X-\mu}{\sigma^2} & -\frac{1}{2\sigma^2}+\frac{(X-\mu)^2}{2\sigma^4} \end{pmatrix}\right] \\ &=E_g\begin{bmatrix} \frac{(X-\mu)^2}{\sigma^4} & -\frac{X-\mu}{2\sigma^4}+\frac{(X-\mu)^3}{2\sigma^6} \\ -\frac{X-\mu}{2\sigma^4}+\frac{(X-\mu)^3}{2\sigma^6} & \frac{1}{4\sigma^4}-\frac{(X-\mu)^2}{2\sigma^6}+\frac{(X-\mu)^4}{4\sigma^8} \end{bmatrix} \\ &= \begin{bmatrix} \frac{1}{\sigma^2} & \frac{\mu_3}{2\sigma^6} \\ \frac{\mu_3}{2\sigma^6} & \frac{\mu_4}{4\sigma^8}-\frac{1}{4\sigma^4} \end{bmatrix}, \end{aligned}$$

$$\begin{aligned}J(\boldsymbol\theta) &= -E_g\begin{bmatrix}\frac{\partial^2 \log f(X|\boldsymbol\theta)}{\partial\mu^2} & \frac{\partial^2 \log f(X|\boldsymbol\theta)}{\partial\mu\,\partial\sigma^2} \\ \frac{\partial^2 \log f(X|\boldsymbol\theta)}{\partial\sigma^2\,\partial\mu} & \frac{\partial^2 \log f(X|\boldsymbol\theta)}{\partial(\sigma^2)^2}\end{bmatrix} \\ &= E_g\begin{bmatrix}\frac{1}{\sigma^2} & \frac{X-\mu}{\sigma^4} \\ \frac{X-\mu}{\sigma^4} & \frac{(X-\mu)^2}{\sigma^6}-\frac{1}{2\sigma^4}\end{bmatrix}\\ &= \begin{bmatrix}\frac{1}{\sigma^2} & 0 \\ 0 & \frac{1}{2\sigma^4}\end{bmatrix},\end{aligned}$$

where $\mu_j=E_g\left[(X-\mu)^j\right]$ is the $j$th-order central moment; in taking the expectations we used $E_g[X-\mu]=0$ and $E_g[(X-\mu)^2]=\sigma^2$, i.e. $\mu$ and $\sigma^2$ are taken to match the mean and variance of $g$. We then have:

$$I(\boldsymbol\theta)J(\boldsymbol\theta)^{-1}=\begin{bmatrix} \frac{1}{\sigma^2} & \frac{\mu_3}{2\sigma^6} \\ \frac{\mu_3}{2\sigma^6} & \frac{\mu_4}{4\sigma^8}-\frac{1}{4\sigma^4} \end{bmatrix}\begin{bmatrix}\sigma^2 & 0 \\ 0 & 2\sigma^4\end{bmatrix}=\begin{bmatrix}1 & \frac{\mu_3}{\sigma^2}\\ \frac{\mu_3}{2\sigma^4} & \frac{\mu_4}{2\sigma^4}-\frac{1}{2}\end{bmatrix},$$

and therefore the bias term is:

$$b=\text{tr}\left(I(\boldsymbol\theta)J(\boldsymbol\theta)^{-1}\right) = 1 + \frac{\mu_4}{2\sigma^4}-\frac{1}{2}=\frac{1}{2}\left(1+\frac{\mu_4}{\sigma^4}\right).\hspace{2cm}(4)$$

Thus, plugging in the maximum likelihood estimator $\hat{\boldsymbol\theta}=(\hat{\mu}, \hat{\sigma}^2)$, we get the estimator of the bias term:

$$\hat{b}=\text{tr}\left(I(\hat{\boldsymbol\theta})J(\hat{\boldsymbol\theta})^{-1}\right)=\frac{1}{2}\left(1+\frac{\hat{\mu}_4}{\hat{\sigma}^4}\right),$$

where $\hat{\sigma}^4 = (\hat{\sigma}^2)^2$ and $\hat{\mu}_4=n^{-1}\sum_{i=1}^n (x_i-\hat{\mu})^4$. It then follows that the $\text{TIC}$ in this example is:

$$\begin{aligned}\text{TIC} &= -2\sum_{i=1}^n \log f(x_i|\hat{\mu}, \hat{\sigma}^2) + 2\left(\frac12+\frac{\hat{\mu}_4}{2\hat{\sigma}^4}\right) \\ &= n\log(2\pi\hat{\sigma}^2)+n + 1+\frac{\hat{\mu}_4}{\hat{\sigma}^4},\end{aligned}$$

where we used $\sum_{i=1}^n \log f(x_i|\hat{\mu}, \hat{\sigma}^2) = -\frac{n}{2}\log(2\pi\hat{\sigma}^2)-\frac{n}{2}$, which follows from $\sum_{i=1}^n (x_i-\hat{\mu})^2 = n\hat{\sigma}^2$.
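As a sanity check, $\hat{b}$ and the resulting $\text{TIC}$ are straightforward to compute numerically. A minimal NumPy sketch (the function name is my own):

```python
import numpy as np

def tic_normal(x):
    """TIC for the normal model: n*log(2*pi*sigma2_hat) + n + 2*b_hat."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    mu_hat = x.mean()
    sigma2_hat = ((x - mu_hat) ** 2).mean()          # MLE of sigma^2
    mu4_hat = ((x - mu_hat) ** 4).mean()             # fourth central sample moment
    b_hat = 0.5 * (1.0 + mu4_hat / sigma2_hat ** 2)  # plug-in bias estimate
    neg2_loglik = n * np.log(2 * np.pi * sigma2_hat) + n
    return neg2_loglik + 2 * b_hat
```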

Note also that if there exists $\boldsymbol\theta_0$ such that $f(x|\boldsymbol\theta_0)=g(x)$, then $g(x)$ is a normal distribution and we have $\mu_3 = 0$, $\mu_4=3\sigma^4$. In $(4)$ we then have:

$$b=\frac{1}{2}\left(1+\frac{\mu_4}{\sigma^4}\right)=\frac{1}{2}+\frac{3\sigma^4}{2\sigma^4}=2,$$

in which case the $\text{TIC}$ reduces to the $\text{AIC}$ (Akaike information criterion):

$$\begin{aligned}\text{AIC} &= -2\sum_{i=1}^n \log f(x_i|\hat{\mu}, \hat{\sigma}^2) + 2\times 2 \\ &= n\log(2\pi\hat{\sigma}^2)+n + 4.\end{aligned}$$
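Finally, a small numerical comparison (simulated data, purely illustrative; function names are my own): for truly normal data $\hat{b}\approx 2$ and TIC is close to AIC, while heavy-tailed data such as Student-$t$ samples inflate $\hat{\mu}_4/\hat{\sigma}^4$ and hence the TIC penalty:

```python
import numpy as np

def neg2_loglik_normal(x):
    """-2 times the maximized normal log-likelihood: n*log(2*pi*sigma2_hat) + n."""
    n = len(x)
    sigma2_hat = ((x - x.mean()) ** 2).mean()
    return n * np.log(2 * np.pi * sigma2_hat) + n

def aic_normal(x):
    return neg2_loglik_normal(x) + 2 * 2  # fixed penalty: d = 2 parameters

def tic_normal(x):
    mu_hat = x.mean()
    sigma2_hat = ((x - mu_hat) ** 2).mean()
    mu4_hat = ((x - mu_hat) ** 4).mean()
    b_hat = 0.5 * (1.0 + mu4_hat / sigma2_hat ** 2)  # data-driven penalty
    return neg2_loglik_normal(x) + 2 * b_hat

rng = np.random.default_rng(1)
x_norm = rng.normal(size=5000)             # normal data: TIC close to AIC
x_heavy = rng.standard_t(df=5, size=5000)  # heavy tails: TIC penalty exceeds AIC's
```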
