Solved – What exactly does the ‘boxcox’ function in R do

I am familiar with the power transform family and I know how to estimate the MLE for $lambda$ for given samples of a random variable. I have been using the 'boxcox' function in R for a sample of a random variable that depends linearly on a latent variable (i.e. a linear model) and it works very well. The problem is that I don't quite know how it works on such linear models.

The official reference won't give clear clues on how to answer my question. Can you provide some information on the answer to my question?

This is not a complete answer, but the real official reference is arguably "the code". In this case it's a little harder to find (it's a method, and it's hidden behind a namespace), but this command shows the code for the default method.

library('MASS') getAnywhere('boxcox.default') 

The result is the following (see also getAnywhere('boxcox.lm')):

A single object matching ‘boxcox.default’ was found It was found in the following places   registered S3 method for boxcox from namespace MASS   namespace:MASS with value  function (object, lambda = seq(-2, 2, 1/10), plotit = TRUE, interp = (plotit &&      (m < 100)), eps = 1/50, xlab = expression(lambda), ylab = "log-Likelihood",      ...)  {     if (is.null(y <- object$y) || is.null(xqr <- object$qr))          stop(gettextf("%s does not have both 'qr' and 'y' components",              sQuote(deparse(substitute(object)))), domain = NA)     if (any(y <= 0))          stop("response variable must be positive")     n <- length(y)     y <- y/exp(mean(log(y)))     logy <- log(y)     xl <- loglik <- as.vector(lambda)     m <- length(xl)     for (i in 1L:m) {         if (abs(la <- xl[i]) > eps)              yt <- (y^la - 1)/la         else yt <- logy * (1 + (la * logy)/2 * (1 + (la * logy)/3 *              (1 + (la * logy)/4)))         loglik[i] <- -n/2 * log(sum(qr.resid(xqr, yt)^2))     }     if (interp) {         sp <- spline(xl, loglik, n = 100)         xl <- sp$x             loglik <- sp$y         m <- length(xl)     }     if (plotit) {         mx <- (1L:m)[loglik == max(loglik)][1L]         Lmax <- loglik[mx]         lim <- Lmax - qchisq(19/20, 1)/2         dev.hold()         on.exit(dev.flush())         plot(xl, loglik, xlab = xlab, ylab = ylab, type = "l",              ylim = range(loglik, lim))         plims <- par("usr")         abline(h = lim, lty = 3)         y0 <- plims[3L]         scal <- (1/10 * (plims[4L] - y0))/par("pin")[2L]         scx <- (1/10 * (plims[2L] - plims[1L]))/par("pin")[1L]         text(xl[1L] + scx, lim + scal, " 95%", xpd = TRUE)         la <- xl[mx]         if (mx > 1 && mx < m)              segments(la, y0, la, Lmax, lty = 3)         ind <- range((1L:m)[loglik > lim])         if (loglik[1L] < lim) {             i <- ind[1L]             x <- xl[i - 1] + ((lim - loglik[i - 1]) * (xl[i] -                  xl[i - 1]))/(loglik[i] - loglik[i - 1])             segments(x, y0, x, lim, lty = 3)         }         if (loglik[m] < lim) {             i <- ind[2L] + 1             x <- xl[i - 1] + ((lim - loglik[i - 1]) * (xl[i] -                  xl[i - 1]))/(loglik[i] - loglik[i - 1])             segments(x, y0, x, lim, lty = 3)         }     }     list(x = xl, y = loglik) } <bytecode: 0x7ff6e0973120> <environment: namespace:MASS> 

Similar Posts:

Rate this post

Leave a Comment