In the Box-Cox transformation, the parameter $\lambda$ is chosen by maximising a likelihood function. But I cannot understand what exactly is maximized in this case. What is the purpose of maximum likelihood here?
Best Answer
This family of transformations combines power and log transformations, and is parametrised by $\lambda$. Note that it is continuous in $\lambda$. The aim is to use likelihood methods to find the “best” $\lambda$.
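For reference, the usual definition of the transformed response for positive $Y_i$ is written out here, since the formulas in this answer use the notation $Y^{(\lambda)}$ without stating it:
$$ Y_i^{(\lambda)} = \begin{cases} \dfrac{Y_i^{\lambda}-1}{\lambda}, & \lambda \neq 0 \\ \log Y_i, & \lambda = 0 \end{cases} $$
The $\lambda = 0$ case is the limit of the first expression as $\lambda \to 0$, which is why the family is continuous in $\lambda$. Its derivative with respect to the data, $dY_i^{(\lambda)}/dY_i = Y_i^{\lambda-1}$, is the factor that appears in the Jacobian.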
Maybe it is best to provide an example, so let's assume that, for some $\lambda$, we have $E(Y^{(\lambda)}) = X\beta$ together with the normality assumption (independent normal errors with constant variance $\sigma^2$). Then, given data $Y_1, \ldots, Y_n$ (i.e. the untransformed data), the likelihood is
$$ (2\pi \sigma^2)^{-n/2}\exp\left(-\frac{1}{2\sigma^2}(Y^{(\lambda)}-X\beta)^T(Y^{(\lambda)}-X\beta)\right)\prod_{i=1}^n Y_i^{\lambda -1}$$
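where the product at the end is the Jacobian of the transformation from $Y$ to $Y^{(\lambda)}$. It is needed because the likelihood is a density for the untransformed data, and its size clearly differs for different values of $\lambda$. Including it puts the likelihoods for different $\lambda$ on a common scale, so that we can look for the $\lambda$ that is most consistent with our data. For each $\lambda$, fitting the linear model gives $\hat{\beta}(\lambda) = (X^TX)^{-1}X^TY^{(\lambda)}$, $RSS(\lambda) = (Y^{(\lambda)})^T(I-H)Y^{(\lambda)}$, where $H = X(X^TX)^{-1}X^T$ is the hat matrix, and $\hat{\sigma}^2(\lambda) = RSS(\lambda)/n$ (the maximum likelihood estimate).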
The profile log-likelihood for $\lambda$, obtained by maximising the log-likelihood over $\beta$ and $\sigma^2$, is therefore
$$ L_{max}(\lambda)= c - \frac{n}{2}\log(RSS(\lambda)/n)+ (\lambda-1)\sum_{i=1}^n \log(Y_i)$$
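As a check on this expression, here is the substitution written out in the same notation. The explicit value of $c$ is not given in the answer and is derived here. Taking logs of the likelihood gives
$$ \ell(\beta,\sigma^2,\lambda) = -\frac{n}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}(Y^{(\lambda)}-X\beta)^T(Y^{(\lambda)}-X\beta) + (\lambda-1)\sum_{i=1}^n \log(Y_i) $$
Setting $\beta = \hat{\beta}(\lambda)$ turns the quadratic form into $RSS(\lambda)$, and setting $\sigma^2 = \hat{\sigma}^2(\lambda) = RSS(\lambda)/n$ makes the middle term equal to $-n/2$:
$$ \ell(\hat{\beta},\hat{\sigma}^2,\lambda) = -\frac{n}{2}\log(2\pi) - \frac{n}{2}\log(RSS(\lambda)/n) - \frac{n}{2} + (\lambda-1)\sum_{i=1}^n \log(Y_i) $$
so $c = -\frac{n}{2}\left(\log(2\pi) + 1\right)$, which does not depend on $\lambda$ and can be ignored when maximising.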
And so we treat this as we usually treat log-likelihood functions: the maximising value $\hat{\lambda}$ is our estimate, and values of $\lambda$ close to $\hat{\lambda}$ are consistent with the data.
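To make the procedure concrete, here is a minimal NumPy sketch that evaluates the profile log-likelihood on a grid of $\lambda$ values and picks the maximiser. The simulated data, the grid, and the helper names `boxcox_transform` and `profile_loglik` are all assumptions made for illustration, and the constant $c$ is dropped because it does not affect the maximiser. The 1.92 cut-off used for the interval is half the 95% point of a $\chi^2_1$ distribution, which is the usual large-sample approximation rather than anything specific to this answer.

```python
import numpy as np

def boxcox_transform(y, lam):
    """Box-Cox transform of positive data; lam = 0 is the log limit."""
    y = np.asarray(y, dtype=float)
    if abs(lam) < 1e-12:
        return np.log(y)
    return (y**lam - 1.0) / lam

def profile_loglik(lam, y, X):
    """Profile log-likelihood for lam, with the constant c omitted."""
    n = len(y)
    z = boxcox_transform(y, lam)
    # Least-squares fit of the transformed response on X gives beta_hat(lam)
    beta_hat, *_ = np.linalg.lstsq(X, z, rcond=None)
    resid = z - X @ beta_hat
    rss = resid @ resid
    # -n/2 * log(RSS/n) plus the log-Jacobian term (lam - 1) * sum(log y)
    return -0.5 * n * np.log(rss / n) + (lam - 1.0) * np.sum(np.log(y))

# Hypothetical data: log(Y) is linear in x, so the "true" lambda is 0
rng = np.random.default_rng(0)
n = 300
x = rng.uniform(0.0, 4.0, size=n)
X = np.column_stack([np.ones(n), x])
y = np.exp(1.0 + 0.5 * x + rng.normal(scale=0.2, size=n))

# Evaluate the profile log-likelihood on a grid and take the maximiser
lambdas = np.linspace(-2.0, 2.0, 401)
loglik = np.array([profile_loglik(lam, y, X) for lam in lambdas])
lam_hat = lambdas[np.argmax(loglik)]

# Approximate 95% interval: grid values within 1.92 of the maximum
inside = lambdas[loglik >= loglik.max() - 1.92]
print(f"lambda_hat = {lam_hat:.2f}, "
      f"approx. 95% interval = [{inside.min():.2f}, {inside.max():.2f}]")
```

Without the `(lam - 1.0) * np.sum(np.log(y))` term, the comparison across $\lambda$ would mostly reflect how the transformation changes the scale of the response rather than how well the model fits, which is exactly the role of the Jacobian in the likelihood.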