# What to do when your likelihood function has a double product with small values near zero and the log transform doesn't work

I currently have a likelihood function defined as the following:

$$
L=\prod_{i=1}^{N}\left[\prod_{s=1}^{S_i}L_{is}(y\mid\rho_A)\times\phi + \prod_{s=1}^{S_i}L_{is}(y\mid\rho_B)\times(1-\phi)\right]
$$

Here, $y$ is an observation, $\rho_A$ and $\rho_B$ are vectors of parameters, and $\phi$ is a scalar mixing weight.

What makes this likelihood different from the usual likelihoods I have seen is that not only are there $N$ observations, but each observation $i$ has up to $S_i$ sub-values, for $i=1,\ldots,N$.

My problem right now is that the $L_{is}(y\mid\rho_A)$ are small to begin with, with some values around $10^{-9}$ or even exactly $0$. Therefore, if I multiply many of these small values, I get an even smaller result that `R` (and most other languages) will simply treat as zero. However, taking the logarithm of $L$ only converts the outer product into a sum; the inner products sit inside a sum, so the log doesn't reach them, which doesn't help at all.
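To see the underflow concretely, here is a small `R` illustration (the likelihood values are made up for demonstration):

```r
# Multiplying many tiny likelihood values underflows to exactly zero,
# while summing their logarithms remains perfectly representable.
L.vals <- rep(1e-9, 40)   # hypothetical per-observation likelihood values
prod(L.vals)              # 1e-360 is below the double-precision range: prints 0
sum(log(L.vals))          # the log-likelihood is fine: about -828.93
```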

I've also tried taking the exponential and then the log of the product, but I still have values that go straight to zero. Would anyone know of a prudent transformation to use? I have tried the log-transform method where I find the maximum and then re-scale, as in here: Converting (normalizing) very small likelihood values to probability. However, it appears that this would work for likelihoods with denominators of zero. Thanks!


I recently had to deal with the same issue when computing conditional probabilities involving numbers on the order of $10^{-10000}$ (because normalizing the probability distribution would have required a great deal of unnecessary calculations). The heart of this difficulty, which comes up repeatedly in statistical computation, concerns taking logarithms of sums which themselves can overflow or underflow the computing platform's form of numeric representation (typically IEEE doubles). In the present case (to abstract away all irrelevant detail) we may write

$$L = \prod_i \left(\sum_j x_{ij}\right)$$

where the products $x_{i1} = \prod_{s=1}^{S_i}L_{is}(y\mid\rho_A)\,\phi$ and $x_{i2} = \prod_{s=1}^{S_i}L_{is}(y\mid\rho_B)\,(1-\phi)$ are themselves best computed using logarithms. Naturally we would like to compute $L$ as

$$L = \exp(\log(L)); \quad \log(L) = \sum_i \log\left(\sum_j x_{ij}\right)$$

and in most applications, having $\log(L)$ is all that is really needed. There is the rub: how to compute logarithms of sums? Dropping the (now superfluous) subscript $i$, and assuming the $x_j$ will be computed using logarithms, we are asking how to calculate expressions like

$$\log\sum_j \exp(\log(x_j))$$

without over- or underflow.
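Applied to the question's mixture structure, the per-term logarithms $\log x_{i1}$ and $\log x_{i2}$ are ordinary sums of logs. A sketch in `R` (the matrices `logL.A`, `logL.B` and the weight `phi` are hypothetical stand-ins for the question's quantities):

```r
set.seed(1)
phi <- 0.3
# Hypothetical log L_{is}(y | rho) values: N = 4 observations, S_i = 5 each,
# stored one observation per row.
logL.A <- matrix(log(runif(20, 1e-12, 1e-6)), nrow = 4)
logL.B <- matrix(log(runif(20, 1e-12, 1e-6)), nrow = 4)
# Inner products become row sums of logarithms; the mixing weights enter as logs.
log.x1 <- rowSums(logL.A) + log(phi)       # log of prod_s L_{is}(.|rho_A) * phi
log.x2 <- rowSums(logL.B) + log(1 - phi)   # log of prod_s L_{is}(.|rho_B) * (1-phi)
```

Each summand of $\log(L)$ is then $\log(x_{i1}+x_{i2})$, which is exactly the log-of-a-sum problem addressed next.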

(Note, too, that in many statistical applications it suffices to obtain a natural logarithm to an accuracy of only one or two decimal places, so high precision is not usually needed.)

The solution is to keep track of the magnitudes of the arguments. When $x_j \gg x_k$, then (to within a relative error of $x_k/x_j$) it is the case that $x_j + x_k = x_j$.
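This approximation can be sketched as a two-term helper in `R`, using the identity $\log(a+b) = \log a + \log(1 + \exp(\log b - \log a))$ for $\log a \ge \log b$ (the function name `log.plus2` is my own; `log1p` computes $\log(1+x)$ accurately for small $x$):

```r
# Add two numbers given only their logarithms, without leaving log space.
# When one term is vastly larger, the smaller contributes nothing
# representable and we simply return the larger logarithm.
log.plus2 <- function(log.a, log.b, precision = log(10^17)) {
  hi <- max(log.a, log.b)
  lo <- min(log.a, log.b)
  if (hi - lo > precision) return(hi)   # smaller term is negligible
  hi + log1p(exp(lo - hi))              # exp() argument is <= 0: no overflow
}
log.plus2(0, 0)       # log(1 + 1) = log(2), about 0.6931
log.plus2(-1000, 0)   # second term dominates entirely: returns 0
```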

Often this magnitude-tracking can be done by estimating the order of magnitude of the sum (in some preliminary approximate calculation, for instance) and simply ignoring any term which is much smaller in magnitude. For more generality and flexibility, we can examine the relative orders of magnitude as we go along. This works best when summands have the most similar orders of magnitude, so that as little precision as possible is lost in each partial sum. Provided, then, that we arrange to sum the terms from smallest to largest, we may repeatedly apply this useful approximation and, in the end, lose no meaningful precision.

In many statistical applications (such as computing log likelihoods, which frequently involve Gamma and Beta functions) it is handy to have built-in functions to return the logarithms of factorials $n!$ and binomial coefficients $\binom{n}{k}$. Most statistical and numerical computing platforms have such functions; even Excel does (`GAMMALN`). Use these wherever possible.
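In `R`, for instance, the base functions `lgamma`, `lfactorial`, and `lchoose` provide these logarithms directly:

```r
# Log-scale combinatorial functions avoid overflowing the raw values.
lfactorial(1000)       # log(1000!), about 5912.13; factorial(1000) is Inf
lchoose(10000, 5000)   # log of a binomial coefficient with thousands of digits
exp(lchoose(52, 5))    # small cases exponentiate back: choose(52, 5) = 2598960
```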

Here is `R` code to illustrate how this could be implemented. So that it will serve as (executable) pseudocode, only simple Fortran-like expressions are used: they will port easily to almost any computing environment.

```r
log.sum <- function(y, precision = log(10^17)) {
  #
  # Returns the logarithm of sum(exp(y)) without overflow or underflow.
  # Achieves a relative error of approximately e^(-precision).
  #
  # Some precision may be lost in computing `log(1 + exp(*))`, depending on
  # the computing platform.
  #
  if (length(y) == 0) return(-Inf)  # an empty sum is 0, whose log is -Inf
  log.plus <- function(a, b) ifelse(abs(b - a) > precision,
                                    max(b, a),
                                    max(b, a) + log(1 + exp(-abs(b - a))))
  y <- sort(y)                      # sum from smallest to largest
  x <- y[1]
  for (z in y[-1]) x <- log.plus(x, z)
  return(x)
}
```

For example, let's compute $\log\sum_{i=0}^{1000}\exp(i) = 1000.45867514538708\ldots$ in `R` with and without this method:

```r
x <- seq(0, 1000, by=1); print(log(sum(exp(x))), digits=18)
```

```
[1] Inf
```

Summation simply overflows. However,

```r
print(log.sum(x), digits=18)
```

```
[1] 1000.45867514538713
```

is accurate to 16 significant digits, which is all one can expect of IEEE doubles.
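A common alternative (not part of the answer above, but equivalent in spirit) is the "max shift": factor out the largest term so that every exponential is at most $1$. A minimal sketch, with `log.sum.exp` as my own name for it:

```r
# log(sum(exp(y))) computed by subtracting the maximum first, so exp()
# never overflows; terms that underflow to 0 drop out of the sum harmlessly.
log.sum.exp <- function(y) {
  if (length(y) == 0) return(-Inf)
  m <- max(y)
  m + log(sum(exp(y - m)))
}
log.sum.exp(seq(0, 1000))   # about 1000.4587, matching log.sum above
```

The pairwise, sorted accumulation above loses slightly less precision when the terms span many orders of magnitude, but the single max shift is simpler and usually adequate.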
