# Solved – Is it appropriate to use a multilevel model with little data?

Let's say I have two variables, `v1` and `v2`, with 400 data points for each. I expect them to be negatively correlated, so a simple approach would be to calculate the correlation.

However, the 400 pairs of data points came from 40 different people, each of whom contributed only 10 pairs.

One way I could approach this would be to predict `v1` from `v2` and additionally treat `person` as a random effect, with something like `v1 ~ v2 + (1|person)`.

I'm concerned about only having 10 data points per person. On the one hand, a straight correlation doesn't consider that the data are likely clustered, since multiple rows come from the same individual. On the other hand, I only have 10 data points per individual, which may skew the correlation value for any given individual.

Is that a sufficient amount of data per level to run a multilevel model like this? Or does the N per level not matter, with the total N (400) being the only important factor?


Yes, it is reasonable to fit a multilevel model with this amount of data. Further, the single-level correlation you describe can be considered a special case of a multilevel model (one that assumes exactly zero person-to-person variance in both the v1 intercept and the slope relating v1 to v2).

Before you fit this model, it may be worth asking: Do you expect the relationship between v1 and v2 to be driven by between-person variance, or by within-person variance? That is, does your theory suggest that people who are high in v1 will be low in v2, or that within a given person, occasions on which v1 is high will tend to be occasions on which v2 is low?

Perhaps your theory doesn't distinguish between these possibilities, but in principle there is no reason to expect that the within-person relationship will be the same as the between-person relationship (see Simpson's paradox and the ecological fallacy).
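To make the distinction concrete, here is a small simulated sketch in base R. All numbers are made up for illustration (a between-person slope of +1 and a within-person slope of -1 are assumptions chosen to make the contrast obvious), and the variable names other than `v1`, `v2`, and `person` are hypothetical.

```r
# Simulated illustration (invented numbers): 40 people, 10 observations each
set.seed(1)
n_person <- 40
n_obs <- 10
person <- rep(seq_len(n_person), each = n_obs)

# Person-level means of v2, and occasion-level deviations around those means
v2_mean <- rnorm(n_person, mean = 0, sd = 2)
v2_dev <- rnorm(n_person * n_obs, mean = 0, sd = 1)
v2 <- v2_mean[person] + v2_dev

# Assumed data-generating process: people with higher mean v2 have higher v1
# (between slope +1), but within a person higher v2 goes with lower v1
# (within slope -1)
v1 <- 1 * v2_mean[person] - 1 * v2_dev + rnorm(n_person * n_obs, sd = 0.5)

# Pooled correlation across all 400 rows, ignoring who each row came from
pooled_r <- cor(v1, v2)

# Within-person correlation: subtract each person's mean from both variables
within_r <- cor(v1 - ave(v1, person), v2 - ave(v2, person))

# Between-person correlation: correlate the 40 person means
between_r <- cor(tapply(v1, person, mean), tapply(v2, person, mean))

print(c(pooled = pooled_r, within = within_r, between = between_r))
```

In this setup the pooled correlation follows the between-person pattern and has the opposite sign from the within-person correlation, which is exactly the situation a single correlation over all 400 rows would hide.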

To estimate both effects within the same model, you would compute means of v2 by person (v2pm), as well as instance-to-instance deflections from these means, or person-centered scores (v2pc).

`data <- within(data, {v2pm <- ave(v2, person, FUN = function(x) mean(x, na.rm = TRUE))})`

`data$v2pc <- data$v2 - data$v2pm`
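As a quick sanity check on this centering step, here is a minimal sketch on a toy data frame. The values and the name `toy` are invented for illustration; only `person`, `v2`, `v2pm`, and `v2pc` correspond to names used above.

```r
# Toy data (hypothetical values): 3 people, 3 observations each
toy <- data.frame(
  person = rep(c("a", "b", "c"), each = 3),
  v2 = c(1, 2, 3, 4, 6, 8, 10, 10, 13)
)

# Person mean of v2, repeated on every row belonging to that person
toy$v2pm <- ave(toy$v2, toy$person, FUN = function(x) mean(x, na.rm = TRUE))

# Person-centered score: each observation's deviation from its person mean
toy$v2pc <- toy$v2 - toy$v2pm

# The centered scores sum to zero within each person
within_sums <- tapply(toy$v2pc, toy$person, sum)

# Consequently v2pm and v2pc are uncorrelated by construction
pm_pc_cor <- cor(toy$v2pm, toy$v2pc)

print(toy)
```

Because the two predictors are uncorrelated by construction, their coefficients can be read as separate between-person and within-person effects.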

You'd then fit a model such as:

`lme4::lmer(v1 ~ 1 + v2pm + v2pc + (1 + v2pc | person), data = data)`
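Written out as equations, this formula corresponds to a two-level model along the following lines. The symbols are notation introduced here for illustration: $i$ indexes observations within person $j$, $\beta_B$ and $\beta_W$ are the between- and within-person slopes, and $u_{0j}$, $u_{1j}$ are the person-specific deviations in intercept and within-person slope.

```latex
\begin{aligned}
\mathrm{v1}_{ij} &= \beta_0 + \beta_B\,\mathrm{v2pm}_{j} + \beta_W\,\mathrm{v2pc}_{ij}
                   + u_{0j} + u_{1j}\,\mathrm{v2pc}_{ij} + \varepsilon_{ij}, \\
(u_{0j},\, u_{1j})^{\top} &\sim \mathcal{N}(\mathbf{0},\, \Sigma), \qquad
\varepsilon_{ij} \sim \mathcal{N}(0,\, \sigma^2)
\end{aligned}
```

Setting $\Sigma = 0$ and $\beta_B = \beta_W$ collapses this to an ordinary single-level regression of v1 on v2.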

If you primarily care about the effect of person-to-person differences in mean v2, you have 40 observations of these means. If you instead care about the effect of instance-to-instance differences in v2, you have 40 observations of such an effect (each based on 10 observations within a person).

If you fit the model without separating the v2 person means from the person-centered scores, the model will arrive at some weighted average of the between-person and within-person effects.
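The weighted-average idea is easiest to see in ordinary least squares, where the weights have a closed form: the pooled slope equals the between and within slopes weighted by the between and within sums of squares of the predictor. The sketch below checks that identity on invented toy data using `lm`. It is an analogy rather than what `lmer` computes, since in a mixed model the weights also depend on the estimated variance components.

```r
# Toy data (hypothetical values): 3 people, 3 observations each
toy <- data.frame(
  person = rep(c("a", "b", "c"), each = 3),
  v2 = c(1, 2, 3, 4, 6, 8, 10, 10, 13),
  v1 = c(5, 4, 2, 9, 7, 6, 12, 13, 10)
)
toy$v2pm <- ave(toy$v2, toy$person)  # person means (ave defaults to mean)
toy$v2pc <- toy$v2 - toy$v2pm        # person-centered scores

# Pooled slope: v2 entered without decomposition
b_pooled <- unname(coef(lm(v1 ~ v2, data = toy))["v2"])

# Separate between-person and within-person slopes
b_split <- coef(lm(v1 ~ v2pm + v2pc, data = toy))
b_between <- unname(b_split["v2pm"])
b_within <- unname(b_split["v2pc"])

# OLS weights: share of the total v2 sum of squares that lies between people
ss_between <- sum((toy$v2pm - mean(toy$v2))^2)
ss_within <- sum(toy$v2pc^2)
w <- ss_between / (ss_between + ss_within)

# The pooled slope is the weighted average of the two component slopes
identity_holds <- isTRUE(all.equal(b_pooled, w * b_between + (1 - w) * b_within))
print(c(pooled = b_pooled, between = b_between, within = b_within, weight = w))
```

Here the between slope is positive and the within slope is negative, so the pooled slope on its own says little about either.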

For more information on these kinds of models, see Bolger & Laurenceau 2013 and Gelman & Hill 2007.
