I'm new to mixed modelling and I'm confused as to whether it's appropriate to use a random effect in an analysis I'm doing. Any advice would be appreciated.

My study is testing how well a newly developed index of mammal abundance can predict the value of an established but more labour-intensive index. I've been measuring these indices in multiple forest patches, with multiple plots in each forest patch.

Because I'm not directly interested in the effect of forest patches, and because my sample plots are nested within forest patches, I've been using forest patch as a random effect. However, I've got a couple of questions about this:

First, I know that random effects allow you to generalise your results across all possible levels of the random factor, not just the ones you sampled. But it seems to me that to make this kind of inference your levels would have to be randomly sampled. My forest patches were not randomly sampled, so can I still use them as a random effect?

Second, I've read that you can test whether it is necessary to have a random effect by doing, e.g., a likelihood ratio test (LRT) to compare models with and without the effect. I've done this, and it suggests that the random effect model does not explain the data any better than a fixed-effects-only model. My issue with this is that my plots are still nested within forest patches, and so presumably not independent. So, can I use this LRT approach to justify excluding the random effect, or do I still need to include it to account for nestedness? And if I do end up removing the random effect, is there a way to verify that plots within forest patches can be considered independent?

Thanks for your help!

Jay

#### Best Answer

As I understand it, you have a simple nested observational design (plots within patches) and your interest is in a correlation/regression between two continuous variables (the two indices). Your sample size is m patches × n plots = N pairs of observations (or the appropriate sum over patches if the design is unbalanced). No proper randomization was involved, but maybe you can/should/want to consider that (1) the patches were "randomly" selected from all the patches of this kind or in some area, and then (2) the plots were "randomly" selected within each patch.

If you ignore the random factor Patch, you may be pseudoreplicating by considering that you have randomly selected N plots "freely", without constraining them to be (in number or type) in those (previously) selected patches.
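To make the two options concrete, here is a sketch of the model with Patch as a random factor. The notation is my own assumption, not something from the question: $y_{ij}$ is the established index and $x_{ij}$ the new index in plot $j$ of patch $i$.

$$
y_{ij} = \beta_0 + \beta_1 x_{ij} + u_i + \varepsilon_{ij}, \qquad u_i \sim N(0, \sigma_u^2), \quad \varepsilon_{ij} \sim N(0, \sigma_\varepsilon^2)
$$

Ignoring Patch amounts to fixing $\sigma_u^2 = 0$ and treating all N residuals as independent. With $u_i$ in the model, two plots from the same patch share the same $u_i$ and so are correlated, with correlation $\sigma_u^2 / (\sigma_u^2 + \sigma_\varepsilon^2)$.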

So, your first question: yes, that is what a random factor allows. The validity of such inference depends on the validity of the assumption that haphazard selection is equivalent to random selection of patches (e.g., that your results would not be different if a different set of forest patches had been selected). That also puts a limit on your space of inference: the kind of forest or geographical area to which your results extend depends on the largest (imaginary) population of patches from which your sample is a credible "random" sample. Maybe your observations are a "reasonable random" sample of the mammals of the forest patches in your region but would be a suspiciously aggregated sample of the mammals of the whole continent.

The second one: the test will depend on "the degree of pseudoreplication", or the evidence in your sample that plots "belong" to patches. That is, how much variation there is among patches and among plots within patches (search for intraclass correlation). At one extreme, only variation among patches is present (plots within a patch are all the same) and you have "pure pseudoreplication": your N should be the number of patches, and sampling one or many plots from each of them does not provide new information. At the other extreme, all variation happens between plots, and there is no extra variation explained by knowing to which forest patch each plot belongs (and then the model without the random factor would appear more parsimonious); you have "independent" plots. Neither extreme is very likely to happen… particularly for biological variables observed on the ground, if only because of spatial autocorrelation and geographical distributions of the mammals. I personally prefer to keep factors by design anyway (e.g., even when patch is not a relevant source of variation IN THIS SAMPLE) to sustain the "experimental-observational" analogy explained above; remember: not having evidence in your sample to reject the null hypothesis that variation among patches is zero does not mean that variation is zero in the population.
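As a rough illustration of the two extremes, here is a minimal Python sketch that estimates the intraclass correlation with the classical one-way ANOVA estimator and converts it into an approximate effective sample size. It assumes a balanced design (the same number of plots in every patch); the function names and the toy numbers are hypothetical, and in your case the values might be, for instance, the residuals from the fixed-effects-only regression of one index on the other.

```python
from statistics import mean


def icc_oneway(groups):
    """One-way ANOVA estimate of the intraclass correlation (balanced design).

    groups: one list of plot values per patch, all of equal length n >= 2.
    The estimate can be negative when patches differ less than chance predicts.
    """
    m = len(groups)          # number of patches
    n = len(groups[0])       # plots per patch
    if any(len(g) != n for g in groups):
        raise ValueError("this sketch assumes the same number of plots per patch")
    grand = mean(v for g in groups for v in g)
    patch_means = [mean(g) for g in groups]
    # Mean square between patches and mean square within patches
    msb = n * sum((pm - grand) ** 2 for pm in patch_means) / (m - 1)
    msw = sum((v - pm) ** 2
              for g, pm in zip(groups, patch_means)
              for v in g) / (m * (n - 1))
    return (msb - msw) / (msb + (n - 1) * msw)


def effective_sample_size(m, n, icc):
    """Approximate number of independent observations among m * n plots."""
    icc = max(icc, 0.0)      # treat a negative estimate as "no clustering"
    return m * n / (1 + (n - 1) * icc)


# Hypothetical toy data: 3 patches, 3 plots each, plots identical within a patch
pure_pseudoreplication = [[1.0, 1.0, 1.0], [2.0, 2.0, 2.0], [3.0, 3.0, 3.0]]
icc = icc_oneway(pure_pseudoreplication)
print(icc)                               # 1.0
print(effective_sample_size(3, 3, icc))  # 3.0, i.e. the number of patches
```

With an intraclass correlation of 0 the same formula returns m × n, i.e. every plot counts as an independent observation. Real data will usually land somewhere in between.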
