Solved – How to make R’s gamm work faster

Last night I started a complex calculation with gamm() and it took me…

       user  system elapsed 
    9259.76  326.05 9622.64

…meaning 9622.64 seconds elapsed: 160 minutes, or about 2.67 hours, for that one calculation. The problem is that I have to do around 50 or even 100 more of these! So I was wondering whether there is any way to speed up these calculations. I compared the 32-bit and 64-bit versions of R 2.12.2 (on a machine with 4 GB of RAM) by fitting a less complex gamm().

32-bit solution

       user  system elapsed 
      41.87    0.01   42.01

64-bit solution

       user  system elapsed 
      40.06    2.82   43.05

but the 64-bit version took even longer!

My question now:

Would it help to simply buy more RAM, for example 8 GB of DDR3, or would that be a waste of money? Would the compiler package in R 2.13.0 be able to handle this properly? I do not think that Rcpp can handle gamm() calls, or am I wrong?

Any comments welcome!

The gamm() model call for the 160-minute run was:

  g1 <- gamm(CountPP10M ~ s(tempsurf, bs = "cr")
             + s(salsurf, bs = "cr")
             + s(speedsurf, bs = "cr")
             + s(Usurf, bs = "cr")
             + s(Vsurf, bs = "cr")
             + s(Wsurf, bs = "cr")
             + s(water_depth, bs = "cr")
             + s(distance.to.bridge, bs = "cr")
             + s(dist_land2, bs = "cr")
             + s(Dist_sventa, bs = "cr"),
             data = data,
             random = list(ID_Station = ~1),
             family = poisson,
             method = "REML",
             control = lmc)

You are not going to be able to achieve a substantial speed-up here, as most of the computation is already done inside compiled C code.

If you are fitting correlation structures in gamm(), you can either simplify the correlation structure you want to fit (e.g. don't use corARMA(p = 1, ...) when corAR1(...) would suffice), or, if you have many observations per year, nest the correlations within years rather than fitting one correlation structure over the whole time interval.
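For instance, a minimal sketch of both options, assuming hypothetical time and year columns in data (neither appears in the model above):

    library(mgcv)  # gamm(); also loads nlme, which provides corAR1()

    ## corAR1() has a dedicated, cheaper implementation than the
    ## equivalent corARMA(p = 1)
    m1 <- gamm(CountPP10M ~ s(tempsurf, bs = "cr"),
               data = data, family = poisson,
               correlation = corAR1(form = ~ time))

    ## nesting the AR(1) process within years gives many small
    ## correlation matrices instead of one huge one
    m2 <- gamm(CountPP10M ~ s(tempsurf, bs = "cr"),
               data = data, family = poisson,
               correlation = corAR1(form = ~ time | year))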

If you aren't fitting correlation structures, gam() can fit simple random effects, and if you need more complex random effects, consider the gamm4 package, which is by the same author as mgcv but uses the lme4 package (lmer()) instead of the slower, older nlme package (lme()).
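A minimal sketch of both routes, using one smooth from the model above and the ID_Station random intercept:

    library(mgcv)

    ## random intercept per station as a "re" smooth inside gam();
    ## ID_Station must be a factor
    m3 <- gam(CountPP10M ~ s(tempsurf, bs = "cr") + s(ID_Station, bs = "re"),
              data = data, family = poisson, method = "REML")

    ## the same random intercept in gamm4, which fits via lme4
    library(gamm4)
    m4 <- gamm4(CountPP10M ~ s(tempsurf, bs = "cr"),
                random = ~ (1 | ID_Station),
                data = data, family = poisson)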

You could try simpler bases for the smooth terms: bs = "cr" rather than the default thin-plate regression splines (the model above already does this).
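The difference in set-up cost is easy to see on simulated data; a small timing sketch (x and y are illustrative names, not from the model above):

    library(mgcv)
    set.seed(1)
    n  <- 1e5
    df <- data.frame(x = runif(n))
    df$y <- rpois(n, exp(sin(2 * pi * df$x)))

    ## default thin-plate basis ("tp"): extra set-up work
    system.time(gam(y ~ s(x), data = df, family = poisson))

    ## cubic regression spline basis ("cr"): cheaper knot placement
    system.time(gam(y ~ s(x, bs = "cr"), data = df, family = poisson))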

If all else fails and you are just facing big-data issues, the best you can do is exploit multiple cores (either split the job manually into ncores chunks and run them in BATCH mode overnight, or use one of the parallel-processing packages for R) and run the models over the weekend. If you do this, make sure you wrap your gamm() calls in try() so that the whole job doesn't stop because of a convergence problem part way through the run.
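A sketch of that pattern using the parallel package (bundled with R since 2.14; the older multicore package offers the same mclapply()), where model_formulas is a hypothetical list of candidate formulas:

    library(parallel)
    library(mgcv)

    fit_one <- function(f) {
      ## try() returns a "try-error" object instead of killing the
      ## whole batch when one fit fails to converge
      try(gamm(f, data = data,
               random = list(ID_Station = ~1),
               family = poisson, control = lmc))
    }

    ## mclapply() forks across cores on Unix-alikes; on Windows use
    ## parLapply() with a cluster instead
    results <- mclapply(model_formulas, fit_one,
                        mc.cores = detectCores() - 1)

    ## see which fits failed
    failed <- vapply(results, inherits, logical(1), "try-error")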
