# Speeding up R loops in an EM algorithm implementation

I am writing an implementation of the EM algorithm to estimate the parameters of a GLMM in R. My model is $$y_{ij}\mid\alpha_i \sim \mathrm{Bin}(1,p_{ij})\text{, where } \operatorname{logit}(p_{ij}) = \mu + \alpha_i \text{ and } \alpha_i \overset{iid}{\sim} N(0,\sigma^2).$$ In other words, my data have a binary response that can be modeled with a fixed intercept, $\mu$, and a random effect, $\alpha_i$.

I am interested in performing EM for the sake of better robustness than I'd get through an ML or REML estimator like the ones used in `lmer`. My procedure for estimating $\mu$ and $\sigma^2$ goes as follows:

1. Choose starting values for $\mu$ and $\sigma^2$ based on REML estimates.
2. Using Metropolis-Hastings, randomly generate $L$ vectors of $\alpha_i$.
3. Compute the expectation of the log-likelihood for $\mu$ and $\sigma^2$ using the $L$ simulated values of $\alpha_i$ in a pretty standard Monte Carlo scheme.
4. Maximize the expectation in (3) with respect to $\mu$ and $\sigma^2$.
5. Use these maximized estimates as new starting values and return to (2).
6. Repeat $k$ times. This produces two $k$-length vectors, $\boldsymbol{\mu}$ and $\boldsymbol{\sigma^2}$.
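Step 2 can be sketched as a random-walk Metropolis-Hastings update for a single $\alpha_i$. Everything below is illustrative rather than the original code: the proposal standard deviation, chain length, and toy data are arbitrary choices, and `mh_alpha_i` is a hypothetical name.

```r
## Hypothetical sketch of step 2: random-walk MH draws of one alpha_i.
## Target: prod_j Bern(y_ij; logit^{-1}(mu + alpha)) * N(alpha; 0, sigma^2).
mh_alpha_i <- function(y_i, mu, sigma2, L = 5000, prop_sd = 0.5) {
  log_post <- function(a) {
    p <- plogis(mu + a)  # inverse logit
    sum(dbinom(y_i, 1, p, log = TRUE)) + dnorm(a, 0, sqrt(sigma2), log = TRUE)
  }
  draws <- numeric(L)
  a_cur <- 0
  lp_cur <- log_post(a_cur)
  for (l in seq_len(L)) {
    a_prop <- a_cur + rnorm(1, 0, prop_sd)       # symmetric proposal
    lp_prop <- log_post(a_prop)
    if (log(runif(1)) < lp_prop - lp_cur) {      # MH accept/reject
      a_cur <- a_prop
      lp_cur <- lp_prop
    }
    draws[l] <- a_cur
  }
  draws
}

set.seed(42)
y_i <- c(1, 1, 0, 1, 1)   # toy binary responses for group i
draws <- mh_alpha_i(y_i, mu = 0, sigma2 = 1)
```

In practice you would run this (or a vectorized batch version) over all groups $i$ to build the $L$ vectors of $\alpha_i$.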

My concern is the following: speed. I have largely vectorized the procedure, but my program chokes for values of $L$ larger than 5,000 and $k \geq 100$. Profiling tells me the slowest step is, unhelpfully, Step 6; since Step 6 just repeats Steps 2-5, that only means one or more of those steps is slow. Because the slowdown grows with $L$, I suspect it occurs during the M-step. My likelihood is separable with respect to $\mu$ and $\sigma^2$ (no cross-products), so instead of one multivariate optimization I can get by with two single-variable optimizations, which has already sped up the computation.
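The pair of single-variable maximizations can be sketched with `optimize()` on stand-in Q-functions built from simulated draws (the real ones would use the Metropolis-Hastings output). One useful structural fact: the $\sigma^2$ part of the Q-function has a closed-form maximizer, the mean of the squared draws, which gives a check on the optimizer.

```r
## Illustrative separable M-step: two 1-D maximizations via optimize().
## The draws here are simulated from a known N(0, 2) rather than taken
## from an MH chain, purely so the example is self-contained.
set.seed(1)
alpha <- matrix(rnorm(5000 * 10, 0, sqrt(2)), nrow = 5000)  # L x n draws
y <- rbinom(10, 1, 0.6)                                     # toy responses
y_mat <- matrix(y, nrow = nrow(alpha), ncol = length(y), byrow = TRUE)

## sigma^2 part of Q: average Gaussian log-density of the draws.
## Its exact maximizer is mean(alpha^2).
Q_sigma2 <- function(s2) mean(dnorm(alpha, 0, sqrt(s2), log = TRUE))
s2_hat <- optimize(Q_sigma2, interval = c(0.01, 10), maximum = TRUE)$maximum

## mu part of Q: average Bernoulli log-likelihood over the draws.
Q_mu <- function(m) mean(dbinom(y_mat, 1, plogis(m + alpha), log = TRUE))
mu_hat <- optimize(Q_mu, interval = c(-5, 5), maximum = TRUE)$maximum
```

Since the $\sigma^2$ update is available in closed form, one of the two `optimize()` calls per iteration can be dropped entirely, which helps exactly where the per-iteration cost scales with $L$.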

Things I would like:

• Is there a way to compile R code? I call an intensive series of functions thousands of times. Variables are pre-allocated before assignment and the functions are all vectorized, so I can't imagine many more gains in those areas; compiling the code could drastically improve my run time. My code is already written, so I'm very averse to translating it to C or C++.
• I have been using the `optimize()` function on the relevant parts of the log-likelihood to get the appropriate maximizations of $\mu$ and $\sigma^2$ at a given iteration. In MCEM problems in the mixed-models framework, is there an algorithm or function that leverages the structure better than a general optimization routine and could give me better run times?
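On the first bullet: base R has shipped a byte-code compiler, the `compiler` package, since R 2.13 (and since R 3.4 functions are byte-compiled just-in-time by default, so on a modern R this may already be happening). A minimal sketch with a stand-in inner-loop function:

```r
## Byte-compiling an R function with the base "compiler" package.
library(compiler)

slow_step <- function(x) {   # stand-in for an inner-loop function
  s <- 0
  for (i in seq_along(x)) s <- s + x[i]^2
  s
}
fast_step <- cmpfun(slow_step)   # byte-compiled version, same semantics

x <- rnorm(1e5)
stopifnot(all.equal(slow_step(x), fast_step(x)))
## enableJIT(3) would instead byte-compile every function on first use
```

The speedup is workload-dependent (loop-heavy interpreted code benefits most; already-vectorized code benefits little), so it is worth benchmarking with something like `system.time()` before and after.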