# Speeding up R loops in an EM algorithm implementation

I am writing an implementation of the EM algorithm to estimate the parameters of a GLMM in R. My model is $$y_{ij}\mid\alpha_i \sim \mathrm{Bin}(1,p_{ij})\text{, where } \operatorname{logit}(p_{ij}) = \mu + \alpha_i \text{ and } \alpha_i \overset{iid}{\sim} N(0,\sigma^2).$$ In other words, my data have a binary response that can be modeled with a fixed intercept, $\mu$, and a random effect, $\alpha_i$.

I am interested in performing EM for the sake of better robustness than I'd get through an ML or REML estimator like the ones used in `lmer`. My procedure for estimating $\mu$ and $\sigma^2$ goes as follows:

1. Choose starting values for $\mu$ and $\sigma^2$ based on REML estimates.
2. Using Metropolis-Hastings, randomly generate $L$ vectors of $\alpha_i$.
3. Compute the expectation of the log-likelihood for $\mu$ and $\sigma^2$ using the $L$ simulated values of $\alpha_i$ in a pretty standard Monte Carlo scheme.
4. Maximize the expectation in (3) with respect to $\mu$ and $\sigma^2$.
5. Use these maximized estimates as new starting values and return to (2).
6. Repeat $k$ times. This produces two $k$-length vectors, $\boldsymbol{\mu}$ and $\boldsymbol{\sigma^2}$.
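Step 2 can be sketched as a random-walk Metropolis-Hastings update for a single $\alpha_i$. Everything below is illustrative rather than the original code: the proposal standard deviation, chain length, and toy data are arbitrary choices, and `mh_alpha_i` is a hypothetical name.

```r
## Hypothetical sketch of step 2: random-walk MH draws of one alpha_i.
## Target: prod_j Bern(y_ij; logit^{-1}(mu + alpha)) * N(alpha; 0, sigma^2).
mh_alpha_i <- function(y_i, mu, sigma2, L = 5000, prop_sd = 0.5) {
  log_post <- function(a) {
    p <- plogis(mu + a)  # inverse logit
    sum(dbinom(y_i, 1, p, log = TRUE)) + dnorm(a, 0, sqrt(sigma2), log = TRUE)
  }
  draws <- numeric(L)
  a_cur <- 0
  lp_cur <- log_post(a_cur)
  for (l in seq_len(L)) {
    a_prop <- a_cur + rnorm(1, 0, prop_sd)       # symmetric proposal
    lp_prop <- log_post(a_prop)
    if (log(runif(1)) < lp_prop - lp_cur) {      # MH accept/reject
      a_cur <- a_prop
      lp_cur <- lp_prop
    }
    draws[l] <- a_cur
  }
  draws
}

set.seed(42)
y_i <- c(1, 1, 0, 1, 1)   # toy binary responses for group i
draws <- mh_alpha_i(y_i, mu = 0, sigma2 = 1)
```

In practice you would run this (or a vectorized batch version) over all groups $i$ to build the $L$ vectors of $\alpha_i$.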

My concern is the following: speed. I have largely vectorized the procedure, but my program chokes for values of $L$ larger than 5,000 and $k \geq 100$. Profiling tells me the slowest step is, unhelpfully, Step 6; since Step 6 just repeats Steps 2-5, that only means one or more of those steps is slow. Because the slowdown grows with $L$, I suspect it occurs during the M-step. My likelihood is separable with respect to $\mu$ and $\sigma^2$ (no cross-products), so instead of one multivariate optimization I can get by with two single-variable optimizations, which has already sped up the computation.
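The pair of single-variable maximizations can be sketched with `optimize()` on stand-in Q-functions built from simulated draws (the real ones would use the Metropolis-Hastings output). One useful structural fact: the $\sigma^2$ part of the Q-function has a closed-form maximizer, the mean of the squared draws, which gives a check on the optimizer.

```r
## Illustrative separable M-step: two 1-D maximizations via optimize().
## The draws here are simulated from a known N(0, 2) rather than taken
## from an MH chain, purely so the example is self-contained.
set.seed(1)
alpha <- matrix(rnorm(5000 * 10, 0, sqrt(2)), nrow = 5000)  # L x n draws
y <- rbinom(10, 1, 0.6)                                     # toy responses
y_mat <- matrix(y, nrow = nrow(alpha), ncol = length(y), byrow = TRUE)

## sigma^2 part of Q: average Gaussian log-density of the draws.
## Its exact maximizer is mean(alpha^2).
Q_sigma2 <- function(s2) mean(dnorm(alpha, 0, sqrt(s2), log = TRUE))
s2_hat <- optimize(Q_sigma2, interval = c(0.01, 10), maximum = TRUE)$maximum

## mu part of Q: average Bernoulli log-likelihood over the draws.
Q_mu <- function(m) mean(dbinom(y_mat, 1, plogis(m + alpha), log = TRUE))
mu_hat <- optimize(Q_mu, interval = c(-5, 5), maximum = TRUE)$maximum
```

Since the $\sigma^2$ update is available in closed form, one of the two `optimize()` calls per iteration can be dropped entirely, which helps exactly where the per-iteration cost scales with $L$.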

Things I would like:

• Is there a way to compile R code? I call an intensive series of functions thousands of times. Variables are pre-allocated before assignment and the functions are all vectorized, so I can't imagine many more gains in those areas; compiling the code could drastically improve my run time. My code is already written, so I'm very averse to translating it to C or C++.
• I have been using the `optimize()` function on the relevant parts of the log-likelihood to get the appropriate maximizations of $\mu$ and $\sigma^2$ at a given iteration. In MCEM problems in the mixed-models framework, is there an algorithm or function that leverages the structure better than a general optimization routine and could give me better run times?
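On the first bullet: base R has shipped a byte-code compiler, the `compiler` package, since R 2.13 (and since R 3.4 functions are byte-compiled just-in-time by default, so on a modern R this may already be happening). A minimal sketch with a stand-in inner-loop function:

```r
## Byte-compiling an R function with the base "compiler" package.
library(compiler)

slow_step <- function(x) {   # stand-in for an inner-loop function
  s <- 0
  for (i in seq_along(x)) s <- s + x[i]^2
  s
}
fast_step <- cmpfun(slow_step)   # byte-compiled version, same semantics

x <- rnorm(1e5)
stopifnot(all.equal(slow_step(x), fast_step(x)))
## enableJIT(3) would instead byte-compile every function on first use
```

The speedup is workload-dependent (loop-heavy interpreted code benefits most; already-vectorized code benefits little), so it is worth benchmarking with something like `system.time()` before and after.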