Last night I started a complex calculation with gamm() and it took:

   user  system elapsed
9259.76  326.05 9622.64

…meaning the calculation took 160 minutes, or about 2.67 hours. The problem is that I have to run around 50 or even 100 more of these! So I was wondering whether there is any way to speed up these calculations. I compared the 32-bit and 64-bit versions of R 2.12.2 (on a 4 GB machine) by fitting a less complex gamm().
32-bit:

   user  system elapsed
  41.87    0.01   42.01

64-bit:

   user  system elapsed
  40.06    2.82   43.05

So it actually took slightly longer using 64-bit!
My questions now: Would it help to simply buy more RAM, for example 8 GB DDR3, or would that be a waste of money? Would the compiler package in R 2.13.0 be able to handle this properly? I do not think that Rcpp can handle gamm() calls, or am I wrong?
Any comments welcome!
The gamm() model call for the 160-minute fit was:

g1 <- gamm(CountPP10M ~ s(tempsurf, bs = "cr") + s(salsurf, bs = "cr") +
           s(speedsurf, bs = "cr") + s(Usurf, bs = "cr") + s(Vsurf, bs = "cr") +
           s(Wsurf, bs = "cr") + s(water_depth, bs = "cr") +
           s(distance.to.bridge, bs = "cr") + s(dist_land2, bs = "cr") +
           s(Dist_sventa, bs = "cr"),
           data = data, random = list(ID_Station = ~1),
           family = poisson, method = "REML", control = lmc)
You are not going to be able to achieve a substantial speed-up here, as most of the computation is done inside compiled C code. That said, a few things may help:
If you are fitting correlation structures in gamm(), you can either simplify the correlation structure you want to fit (e.g. don't use corARMA(p = 1, ...) when corAR1(...) would suffice), or nest the correlations within years if you have many observations per year, rather than estimating one structure over the whole time interval.
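As a rough sketch of the nesting idea on simulated data (the variable names here are illustrative, not taken from the original model):

```r
library(mgcv)   # attaching mgcv also attaches nlme, which provides corAR1()

set.seed(1)
## Simulated data: 20 stations x 30 time points each (names are made up)
dat <- data.frame(ID_Station = factor(rep(1:20, each = 30)),
                  time       = rep(1:30, times = 20))
dat$x <- runif(nrow(dat))
dat$y <- sin(2 * pi * dat$x) + rnorm(nrow(dat), sd = 0.3)

## AR(1) errors nested within station: many small correlation blocks are
## much cheaper than one structure estimated over the whole series, and
## corAR1() is cheaper than the more general corARMA(p = 1, ...)
m <- gamm(y ~ s(x, bs = "cr"), data = dat,
          correlation = corAR1(form = ~ time | ID_Station))
summary(m$gam)
```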
If you aren't fitting correlation structures, gam() can fit simple random effects, and if you need more complex random effects, consider the gamm4 package, which is by the same author as mgcv but uses the lme4 package (lmer()) instead of the slower/older nlme package (lme()).
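For example, a simple station-level random intercept can be fitted directly in gam() as a penalised smooth, avoiding the nlme machinery entirely (simulated data; names are illustrative):

```r
library(mgcv)

set.seed(1)
## Simulated Poisson counts with a station-level random intercept
dat <- data.frame(ID_Station = factor(rep(1:20, each = 30)))
dat$x <- runif(nrow(dat))
re    <- rnorm(20, sd = 0.4)
dat$y <- rpois(nrow(dat), exp(sin(2 * pi * dat$x) + re[dat$ID_Station]))

## Random intercept as a "re" smooth term: fitted by gam(), no lme() call
g <- gam(y ~ s(x, bs = "cr") + s(ID_Station, bs = "re"),
         data = dat, family = poisson, method = "REML")

## The gamm4 equivalent (lme4-based) would look something like:
# library(gamm4)
# g4 <- gamm4(y ~ s(x, bs = "cr"), random = ~ (1 | ID_Station),
#             data = dat, family = poisson)
```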
You could try simpler bases for the smooth terms: bs = "cr" rather than the default thin-plate spline bases, whose setup cost grows with the number of unique covariate values. (Your call above already uses bs = "cr".)
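A quick way to see the difference in setup cost is to time both bases on the same simulated data (sizes and variable names here are arbitrary):

```r
library(mgcv)

set.seed(1)
n <- 5000
dat <- data.frame(x1 = runif(n), x2 = runif(n))
dat$y <- sin(2 * pi * dat$x1) + dat$x2^2 + rnorm(n, sd = 0.2)

## Default thin-plate regression splines: basis setup depends on all
## unique covariate values, so it gets slower as n grows
t_tp <- system.time(m_tp <- gam(y ~ s(x1) + s(x2), data = dat))

## Cubic regression splines: knot-based, much cheaper to set up
t_cr <- system.time(m_cr <- gam(y ~ s(x1, bs = "cr") + s(x2, bs = "cr"),
                                data = dat))

t_tp["elapsed"]
t_cr["elapsed"]
```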
If all else fails and you are just facing big-data issues, the best you can do is exploit multiple cores (manually split the job into ncores chunks and run them in BATCH mode overnight, or use one of R's parallel-processing packages) and run the models over the weekend. If you do this, make sure you wrap your gamm() calls in try() so that the whole job doesn't stop because of a convergence problem part-way through the run.
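One possible shape for such a batch run, shown here with the parallel package (in the R 2.12/2.13 era the multicore or snow packages played the same role) and hypothetical simulated data:

```r
library(mgcv)
library(parallel)

## Each fit is wrapped in try() so one convergence failure
## doesn't kill the whole batch
fit_one <- function(d) {
  try(gamm(y ~ s(x, bs = "cr"), data = d,
           random = list(ID_Station = ~1)), silent = TRUE)
}

set.seed(1)
make_data <- function() {
  d  <- data.frame(ID_Station = factor(rep(1:10, each = 20)))
  d$x <- runif(nrow(d))
  re  <- rnorm(10, sd = 0.5)
  d$y <- sin(2 * pi * d$x) + re[d$ID_Station] + rnorm(nrow(d), sd = 0.3)
  d
}
chunks <- replicate(4, make_data(), simplify = FALSE)

## mclapply() forks workers on Unix; on Windows keep mc.cores = 1
## (or use parLapply() with a cluster instead)
cores <- if (.Platform$OS.type == "unix") 2L else 1L
fits  <- mclapply(chunks, fit_one, mc.cores = cores)

## Keep only the fits that succeeded
ok <- !vapply(fits, inherits, logical(1), "try-error")
sum(ok)
```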