I am developing GLMMs to assess habitat selection (using the GLMM coefficients to construct resource selection functions).
I have (telemetry) data from 5 study areas, and each area has a different number of individuals monitored.
To develop the GLMMs, the dependent variable is binary (1 = used locations; 0 = available locations), and I have an initial set of 14 continuous variables (8 land cover variables; 2 distance variables, to artificial areas and to water sources; 4 topographic variables): a buffer was placed around each location and the area of each land cover type within that buffer was calculated; distances were measured from each point to the nearest feature; and topographic variables were obtained from DEM rasters. I tested for correlation using Spearman's rank correlation, so not all 14 were used in the GLMMs. All variables were standardized (z-scores).
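For reference, a minimal sketch of the screening and standardization step (the data frame "dat" and the column names are placeholders, not the exact code used):

# Hypothetical data frame dat holding the candidate predictors (placeholder names)
pred_cols <- c("lc_1", "lc_2", "dist_artificial", "dist_water", "elevation")   # ... etc.

# Spearman rank correlation matrix among the candidate predictors
rho <- cor(dat[, pred_cols], method = "spearman", use = "pairwise.complete.obs")

# Flag pairs whose |rho| exceeds a chosen threshold (e.g. 0.7) and drop one of each pair
which(abs(rho) > 0.7 & upper.tri(rho), arr.ind = TRUE)

# z-score (centre and scale) the predictors that are retained
dat[pred_cols] <- scale(dat[pred_cols])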
As a random effect, I used individual ID (in another question, "GLMM: relationship between AIC, R squared and overdispersion?", it became clear that using study area as a random effect was neither useful nor correct).
I constructed a GLMM with 9 (uncorrelated) variables and a random effect, then used the "dredge()" function and "model.avg()" on the dredge object to rank models by AICc.
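For reference, a minimal sketch of this fitting and dredging step (V1 to V9 are placeholder predictor names; the data frame and ID names follow the output below):

library(lme4)
library(MuMIn)

# Global GLMM: binary response, 9 standardized predictors, random intercept per individual
m1.1 <- glmer(Used ~ V1 + V2 + V3 + V4 + V5 + V6 + V7 + V8 + V9 + (1 | ID.CODE_1),
              data = All_SA_Used_RP_Area_z,
              family = binomial(link = "logit"),
              na.action = na.fail)   # dredge() requires na.fail so all subsets use the same rows

# Fit all subsets of the fixed effects and rank them by AICc
dredge.m1.1 <- dredge(m1.1)

# Model-average the ranked set (or only the top set: subset(dredge.m1.1, delta < 2))
avg.m1.1 <- model.avg(dredge.m1.1)
summary(avg.m1.1)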
This was the result (only models with ΔAICc lower than 2 are shown):
[1] Call: model.avg(object = dredge.m1.1)

Component model call:
glmer(formula = Used ~ <512 unique rhs>, data = All_SA_Used_RP_Area_z, family = binomial(link = "logit"))

Component models:
         df   logLik    AICc delta weight
123578    8 -4309.94 8635.89  0.00   0.14
1235789   9 -4309.22 8636.44  0.55   0.10
123789    8 -4310.52 8637.04  1.14   0.08
1235678   9 -4309.75 8637.50  1.61   0.06
12378     7 -4311.78 8637.57  1.67   0.06
1234578   9 -4309.79 8637.58  1.69   0.06
Variables 1 and 2 are the distance variables, variables 3 to 8 are land cover variables, and variable 9 is a topographic variable.
The weights seem very low, even though averaging all of these models appears to be common practice when delta values are this low. Even with these weights, I constructed a GLMM for each of the combinations, and the results were similar for all 6 combinations. Here are the results for the first one (GLMM + overdispersion + R-squared):
Random effects:
 Groups    Name        Variance Std.Dev.
 ID.CODE_1 (Intercept) 13.02    3.608
Number of obs: 32670, groups: ID.CODE_1, 55

Fixed effects:
            Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.54891    0.51174  -1.073 0.283433
3           -0.22232    0.04059  -5.478 4.31e-08 ***
5           -0.05433    0.02837  -1.915 0.055460 .
7           -0.13108    0.02825  -4.640 3.49e-06 ***
8           -0.15864    0.08670  -1.830 0.067287 .
1            0.28438    0.02853   9.968  < 2e-16 ***
2            0.11531    0.03021   3.817 0.000135 ***

Residual deviance: 0.256

r.squaredGLMM():
       R2m        R2c
0.01063077 0.80039950
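For reference, a minimal sketch of how the overdispersion ratio and the marginal/conditional R² above can be obtained ("fit1" is a placeholder for the glmer fit of the first combination; the overdispersion helper is a common recipe, not necessarily the exact one used here):

library(MuMIn)

# Overdispersion ratio: sum of squared Pearson residuals over residual df
# (values near 1 indicate no overdispersion)
overdisp_ratio <- function(model) {
  sum(residuals(model, type = "pearson")^2) / df.residual(model)
}
overdisp_ratio(fit1)

# Marginal (fixed effects only) and conditional (fixed + random) R-squared
r.squaredGLMM(fit1)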
This is what I get from this analysis:
1) The variance and SD of the random effect seem fine (definitely better than the "0" I got when using study area as the random effect);
2) Estimate values make sense from what I know of the species and the knowledge I have of the study areas;
3) The overdispersion value seems good, and the R-squared values don't seem very good (at least the marginal R-squared, which considers only the fixed effects), but, as I have read in several places, AIC and R-squared are not always in agreement.
4) The weight values seem very low. Does this mean the models are not good?
I then constructed a GLM ("glm()"), so no random effect was used. I used the same set of variables as in [1], and here are the results (only models with ΔAICc lower than 2 are shown):
[2] Call: model.avg(object = dredge.glm_m1.1)

Component model call:
glm(formula = Used ~ <512 unique rhs>, family = binomial(link = "logit"), data = All_SA_Used_RP_Area_z)

Component models:
           df   logLik     AICc delta weight
12345678    9 -9251.85 18521.70  0.00   0.52
123456789  10 -9251.77 18523.54  1.84   0.21
1345678     8 -9253.84 18523.69  1.99   0.19
In this case, weight values are higher.
Does this mean that it is better not to use a random effect? (I am not sure I can compare GLMM with GLM results; please correct me if I am making wrong assumptions.)
Best Answer
I would strongly advise you to avoid automated model selection procedures such as dredge()
(even the function name makes me shiver). There may be some merit in these when you are primarily concerned with prediction for future data, but even then it is strongly recommended to use some form of cross-validation, where you build your model on a training dataset and then assess its predictive capability on another dataset. If you build a model based on AIC with your whole dataset, it may predict your current dataset well, but there is a good chance it will perform poorly on new data. When your goal is mainly inference, the best way forward is to use theory and common sense to build your model. Unless you have a huge number of variables, theory and common sense are also a better method for prediction.
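To make the cross-validation point concrete, a hold-out assessment for a binomial GLMM might look something like the following (a sketch with illustrative names; in telemetry data it is usually more realistic to hold out whole individuals rather than individual locations):

library(lme4)

set.seed(1)

# Hold out ~20% of individuals as a test set (illustrative split)
ids      <- unique(dat$ID.CODE_1)
test_ids <- sample(ids, size = ceiling(0.2 * length(ids)))
train    <- dat[!dat$ID.CODE_1 %in% test_ids, ]
test     <- dat[ dat$ID.CODE_1 %in% test_ids, ]

# Fit the candidate model on the training individuals only
fit <- glmer(Used ~ V1 + V2 + V3 + (1 | ID.CODE_1),
             data = train, family = binomial(link = "logit"))

# Predict on the held-out individuals; allow.new.levels = TRUE uses the
# population-level (fixed effects only) prediction for unseen individuals
p <- predict(fit, newdata = test, type = "response", allow.new.levels = TRUE)

# Mean log-loss on the held-out data (lower is better)
eps <- 1e-15
mean(-(test$Used * log(pmax(p, eps)) + (1 - test$Used) * log(pmax(1 - p, eps))))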
A good starting point is to draw a path diagram to hypothesize the associations, and directions of causality, according to theory. This can help you build a model while avoiding common problems such as over-adjustment for confounding or including variables that should not be in the model (for example, because they lie on the causal path between an exposure and the outcome). Although the current theory may not be well-developed (developing the theory is presumably one of your goals), a path diagram may help you rule out some possible models. DAGitty is a very user-friendly web-based graphical tool that can assist with this.
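For example, a hypothesized causal structure can be written down and queried with the dagitty R package (the variables and arrows below are purely illustrative, not a suggestion of the correct structure for your system):

library(dagitty)

# Purely illustrative DAG: elevation drives land cover; land cover drives both
# distance to water and use; elevation and distance to water also affect use
g <- dagitty("dag {
  elevation  -> landcover
  elevation  -> use
  landcover  -> use
  landcover  -> dist_water
  dist_water -> use
}")

# Which variables must be adjusted for to estimate the effect of dist_water on use?
adjustmentSets(g, exposure = "dist_water", outcome = "use")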
You have measured 14 variables, presumably choosing these for some good reasons. Excluding some solely on the basis of high bivariate correlation without understanding the relations between them is dangerous. If they are essentially measuring the same thing, or one is derived implicitly (or explicitly) from another, this may be valid, but if one is a cause of the other, then you need to think carefully about which one to exclude. A path diagram will be very useful for this.
You have repeated measures on individual subjects, so a priori it is a good idea to account for this with random intercepts for subject, because measurements on the same individual are likely to be more similar to each other than to measurements on different individuals. I would strongly urge you not to rely on p-values in general, but even more so in this case. If you really want to test the significance of the random intercept, then a bootstrap estimate is perhaps the most robust way to proceed.
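If you do go down that road, a parametric bootstrap of the likelihood ratio between the GLM (no random effect) and the GLMM (random intercept) is one way to do it; a sketch, with placeholder variable names:

library(lme4)

# Null model without the random intercept, and the GLMM with it (V1-V3 are placeholders)
m0 <- glm(Used ~ V1 + V2 + V3, data = dat, family = binomial)
m1 <- glmer(Used ~ V1 + V2 + V3 + (1 | ID.CODE_1), data = dat, family = binomial)

# Observed likelihood ratio statistic
lrt_obs <- as.numeric(2 * (logLik(m1) - logLik(m0)))

# Parametric bootstrap: simulate responses from the null (GLM) model,
# refit both models, and collect the null distribution of the statistic
nsim <- 200   # use more in a real analysis
lrt_null <- replicate(nsim, {
  d      <- dat
  d$Used <- simulate(m0)[[1]]
  f0 <- glm(Used ~ V1 + V2 + V3, data = d, family = binomial)
  f1 <- glmer(Used ~ V1 + V2 + V3 + (1 | ID.CODE_1), data = d, family = binomial)
  as.numeric(2 * (logLik(f1) - logLik(f0)))
})

# Bootstrap p-value for the random-intercept variance
mean(lrt_null >= lrt_obs)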
As for your question about low weights, I don't know exactly what these weights represent, but I assume they must all add up to 1, so if there are a lot of models to choose from, there will inevitably be many models with small weights; and if the "best" models are very similar to one another, then even the "best" ones will necessarily have small weights. Note that the difference in AICc between the "best" model and the worst shown (the 6th best, since only 6 are displayed) is just 1.69, which tells you that there is very little difference between any of these models.
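If they are Akaike weights (which is what MuMIn reports by default), they are just the exp(-delta/2) values rescaled to sum to 1 over the whole candidate set, which makes it clear why 512 candidate models must spread the weight thinly; a small sketch:

# Akaike weights: exp(-delta/2), rescaled to sum to 1 over all candidate models
akaike_weights <- function(aicc) {
  delta <- aicc - min(aicc)
  w <- exp(-delta / 2)
  w / sum(w)
}

# The six AICc values shown in [1]; the weights reported there are smaller because
# they are normalised over all 512 candidate models, not just these six
akaike_weights(c(8635.89, 8636.44, 8637.04, 8637.50, 8637.57, 8637.58))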