# Solved – Parameter $\lambda$ of Box-Cox transformation and likelihood

In the Box-Cox transformation, the parameter $\lambda$ is chosen by maximising a likelihood function. But I cannot understand what exactly is maximised in this case. What is the purpose of maximum likelihood here?


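The Box-Cox family of transformations, $y^{(\lambda)} = (y^\lambda - 1)/\lambda$ for $\lambda \neq 0$ and $y^{(0)} = \log y$, combines power and log transformations, and is parametrised by $\lambda$. Note that this is continuous in $\lambda$, since the power form tends to $\log y$ as $\lambda \to 0$. The aim is to use likelihood methods to find the “best” $\lambda$.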
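As a quick numerical check of the continuity claim, here is a minimal sketch assuming the usual Box-Cox definition, $(y^\lambda - 1)/\lambda$ for $\lambda \neq 0$ and $\log y$ for $\lambda = 0$. The function name `boxcox` and the sample values are illustrative choices, not part of the original answer.

```python
import numpy as np

def boxcox(y, lam):
    """Box-Cox transform of strictly positive data y for a given lambda."""
    y = np.asarray(y, dtype=float)
    if np.any(y <= 0):
        raise ValueError("Box-Cox requires strictly positive data")
    if lam == 0:
        return np.log(y)
    return (y**lam - 1.0) / lam

# Illustrative values (an assumption, not data from the post).
y = np.array([0.5, 1.0, 2.0, 10.0])

# As lambda shrinks towards 0, the power form approaches log(y).
for lam in (1.0, 0.1, 1e-3, 1e-6):
    gap = np.max(np.abs(boxcox(y, lam) - np.log(y)))
    print(f"lambda={lam:g}  max |boxcox - log| = {gap:.2e}")
```

With $\lambda = 1$ the transform is just $y - 1$, so the data are only shifted, and the gap to $\log y$ shrinks as $\lambda$ approaches 0.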

Maybe it is best to provide an example, so let's assume that, for some $\lambda$, we have $E(Y^{(\lambda)}) = X\beta$ together with the normality assumption. Then, given data $Y_1, \ldots, Y_n$ (i.e. the untransformed data), the likelihood is

$$ (2\pi\sigma^2)^{-n/2}\exp\left(-\frac{1}{2\sigma^2}(Y^{(\lambda)}-X\beta)^T(Y^{(\lambda)}-X\beta)\right)\prod_{i=1}^n Y_i^{\lambda-1} $$

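where the product at the end is the Jacobian of the transformation from $Y$ to $Y^{(\lambda)}$ (since $\partial Y_i^{(\lambda)}/\partial Y_i = Y_i^{\lambda-1}$). It differs for different values of $\lambda$, and it is what makes the likelihoods comparable across $\lambda$: each one is a density for the same untransformed data. We want the $\lambda$ that is most consistent with our data. For each $\lambda$, fitting the linear model gives $\hat{\beta}(\lambda) = (X^TX)^{-1}X^TY^{(\lambda)}$, $RSS(\lambda) = (Y^{(\lambda)})^T(I - H)Y^{(\lambda)}$ with $H = X(X^TX)^{-1}X^T$ the hat matrix, and $\hat{\sigma}^2(\lambda) = RSS(\lambda)/n$ (the maximum likelihood estimate).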

The profile log-likelihood for $\lambda$, obtained by maximising the log-likelihood over $\beta$ and $\sigma^2$, is therefore

$$ L_{max}(\lambda) = c - \frac{n}{2}\log(RSS(\lambda)/n) + (\lambda-1)\sum_{i=1}^n \log(Y_i) $$
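To see where the constant comes from, one can substitute $\hat{\beta}(\lambda)$ and $\hat{\sigma}^2(\lambda)$ into the log of the likelihood. The following is a sketch of that step, using only the quantities defined so far:

$$ \log L\big(\hat{\beta}(\lambda), \hat{\sigma}^2(\lambda), \lambda\big) = -\frac{n}{2}\log(2\pi) - \frac{n}{2}\log\hat{\sigma}^2(\lambda) - \frac{RSS(\lambda)}{2\hat{\sigma}^2(\lambda)} + (\lambda-1)\sum_{i=1}^n \log(Y_i) $$

Since $\hat{\sigma}^2(\lambda) = RSS(\lambda)/n$, the third term equals $-n/2$ for every $\lambda$, so $c = -\frac{n}{2}\left(\log(2\pi) + 1\right)$ does not depend on $\lambda$ and can be ignored when maximising.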

We then treat this as we usually treat log-likelihood functions: the maximising value $\hat{\lambda}$ is the maximum likelihood estimate, and values of $\lambda$ close to $\hat{\lambda}$ are consistent with the data.
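The procedure can be sketched numerically as follows. This is a minimal illustration, not a definitive implementation: the simulated data, the grid of $\lambda$ values, and the helper names `boxcox` and `profile_loglik` are all assumptions made for the example. The cut-off of $\frac{1}{2} \times 3.841$ is the usual likelihood-ratio rule based on the 0.95 quantile of a chi-squared distribution with 1 degree of freedom.

```python
import numpy as np

def boxcox(y, lam):
    """Box-Cox transform: (y**lam - 1) / lam, or log(y) when lam == 0."""
    if lam == 0:
        return np.log(y)
    return (y**lam - 1.0) / lam

def profile_loglik(lam, y, X):
    """Profile log-likelihood for lambda, omitting the additive constant c."""
    n = len(y)
    z = boxcox(y, lam)
    # Least-squares fit of the transformed response on X.
    beta_hat, *_ = np.linalg.lstsq(X, z, rcond=None)
    resid = z - X @ beta_hat
    rss = resid @ resid
    # -(n/2) log(RSS/n) plus the log-Jacobian term (lam - 1) * sum(log y).
    return -0.5 * n * np.log(rss / n) + (lam - 1.0) * np.sum(np.log(y))

# Simulated data (an assumption for illustration): log(Y) is linear in x
# with normal errors, so the true lambda is 0.
rng = np.random.default_rng(0)
n = 200
x = rng.uniform(0.0, 2.0, size=n)
X = np.column_stack([np.ones(n), x])
y = np.exp(1.0 + 0.5 * x + rng.normal(scale=0.3, size=n))

# Evaluate the profile log-likelihood on a grid and take the maximiser.
grid = np.linspace(-2.0, 2.0, 401)
ll = np.array([profile_loglik(lam, y, X) for lam in grid])
lam_hat = grid[np.argmax(ll)]

# Approximate 95% interval: lambdas whose profile log-likelihood lies within
# 0.5 * 3.841 of the maximum.
inside = grid[ll >= ll.max() - 0.5 * 3.841]
print(f"lambda_hat = {lam_hat:.2f}, "
      f"approx 95% interval = [{inside.min():.2f}, {inside.max():.2f}]")
```

A grid search is used here because the profile log-likelihood is a function of a single parameter and is cheap to evaluate. In practice one often rounds $\hat{\lambda}$ to a convenient value inside the interval, such as 0 (log) or 0.5 (square root).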
