I am currently working on a problem where I try to explain within-subject variance in an outcome using multiple other variables. In R, setting up the model looks like this:
lmFull<-lmer(outcome ~ (1|subject) + pred1 + pred2 +pred3 ... pred40)
The problem is that there is some mild missingness in each predictor variable. With listwise deletion (the default behaviour of lmer), many cases are lost: with this many predictors, chances are high that at least one of them is missing in any given row.
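To get a feel for the scale of the loss, here is a small base-R sketch. The 5% missingness rate per predictor, the number of rows, and the assumption that missingness is independent across predictors are all illustrative assumptions, not figures from the actual data.

```r
# Illustrative assumptions: 40 predictors, 5% of values missing in each,
# missingness independent across predictors.
set.seed(1)
n_obs  <- 10000
n_pred <- 40
p_miss <- 0.05

# Expected share of rows with no missing predictor under independence
expected_complete <- (1 - p_miss)^n_pred

# Simulate a predictor matrix and set values to NA at random
X <- matrix(rnorm(n_obs * n_pred), nrow = n_obs, ncol = n_pred)
X[matrix(runif(n_obs * n_pred) < p_miss, nrow = n_obs)] <- NA

# Share of rows that listwise deletion would keep
observed_complete <- mean(complete.cases(X))

round(c(expected = expected_complete, simulated = observed_complete), 3)
```

Under these assumptions only about 13% of the rows (0.95^40) survive listwise deletion, even though each individual predictor is 95% complete.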
I have googled this problem but can only find examples claiming that this is not an issue in mixed modelling since it uses the long format. This, however, is only true if there are few predictors.
One obvious solution would be to use FIML. However, some of my predictors are ordinal and thus not normally distributed. Setting up a valid joint distribution for all my predictors will be very cumbersome.
One quick and dirty solution might be to impute the missing values with the within-subject means.
Any thoughts? Is there a recommended approach?
Best Answer
Imputation using within subject means isn't a great idea because it will result in biased (too small) standard errors and possibly biased estimates.
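A minimal base-R sketch of why this happens, using simulated data (the number of subjects, the number of measurements and the 20% missingness rate are made-up values for illustration). Filling in the subject mean adds observations that carry no within-subject variation, so the variability looks smaller and the sample looks larger than it really is.

```r
# Hypothetical data: 50 subjects, 10 repeated measures each, 20% of x missing
set.seed(2)
subject <- rep(1:50, each = 10)
x_true  <- rnorm(50)[subject] + rnorm(500)   # subject level + within-subject noise
x_obs   <- x_true
x_obs[sample(500, 100)] <- NA

# Replace each missing value with that subject's observed mean
subj_mean <- ave(x_obs, subject, FUN = function(v) mean(v, na.rm = TRUE))
x_imp     <- ifelse(is.na(x_obs), subj_mean, x_obs)

# Pooled within-subject variance: squared deviations from the subject means,
# divided by (number of values - number of subjects)
within_var <- function(x, g) {
  ok  <- !is.na(x)
  dev <- x[ok] - ave(x[ok], g[ok])
  sum(dev^2) / (sum(ok) - length(unique(g[ok])))
}

c(observed = within_var(x_obs, subject), imputed = within_var(x_imp, subject))
```

The imputed values sit exactly on the subject means, so they add nothing to the sum of squares while increasing the apparent number of observations. The within-subject variance is therefore understated after imputation.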
Assuming that the data are missing at random, a much better idea is to use multiple imputation. The mice package in R can impute continuous variables in a mixed-effects framework with a single random effect (grouping variable): specify 2l.norm as the imputation method and flag the grouping variable in the predictor matrix. For example, suppose our analysis model is
> require(mice)
> require(lme4)
> m0 <- lmer(teachpop~sex+texp+popular + (1|school), data=popmis)
> confint(m0)
                  2.5 %     97.5 %
.sig01       0.44905533 0.62574295
.sigma       0.54368549 0.59259188
(Intercept)  2.03118933 2.67864796
sex         -0.07108881 0.09183821
texp         0.03024598 0.06505065
popular      0.22257646 0.32572600
Because of the missingness in the predictor popular, the estimates from this model may be biased. So we will use multiple imputation:
> ini <- mice(popmis, maxit=0)
> (pred <- ini$pred)
         pupil school popular sex texp const teachpop
pupil        0      0       0   0    0     0        0
school       0      0       0   0    0     0        0
popular      1      1       0   1    1     0        1
sex          0      0       0   0    0     0        0
texp         0      0       0   0    0     0        0
const        0      0       0   0    0     0        0
teachpop     0      0       0   0    0     0        0
This is the default predictor matrix for the imputation model. Only popular has missing values, and we are going to impute them using a mixed model in which school is the grouping factor and the other variables are predictors. To do this, we use -2 to tell mice that school is the grouping variable, and 2 for the other predictors (in 2l.norm, a 2 gives the predictor both a fixed and a random effect, whereas a 1 would give it a fixed effect only):
> pred["popular",] <- c(0, -2, 0, 2, 2, 2, 0)
> (pred)
So now we have:
         pupil school popular sex texp const teachpop
pupil        0      0       0   0    0     0        0
school       0      0       0   0    0     0        0
popular      0     -2       0   2    2     2        0
sex          0      0       0   0    0     0        0
texp         0      0       0   0    0     0        0
const        0      0       0   0    0     0        0
teachpop     0      0       0   0    0     0        0
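The positional vector is easy to get wrong if the column order ever changes, so the same row can also be filled in by column name. The sketch below uses only base R and builds a stand-in all-zero matrix rather than calling mice, so it illustrates the indexing only. With real data you would start from ini$pred and clear the popular row first.

```r
# Column names as they appear in the predictor matrix shown above
vars <- c("pupil", "school", "popular", "sex", "texp", "const", "teachpop")

# Stand-in for ini$pred with every entry set to 0 (an assumption for this sketch)
pred <- matrix(0, nrow = length(vars), ncol = length(vars),
               dimnames = list(vars, vars))

pred["popular", "school"] <- -2                    # grouping (class) variable
pred["popular", c("sex", "texp", "const")] <- 2    # predictors in the imputation model

pred["popular", ]
```

This yields the same popular row as the positional assignment, and it keeps working if columns are added or reordered.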
We have set up the predictor matrix, so we can now create 10 multiply imputed datasets, using the 2l.norm method to impute values for popular:
> imp <- mice(popmis, meth = c("","","2l.norm","","","",""), pred = pred, maxit=10, m = 10)
Now we run the mixed model on each of the imputed datasets:
> fit <- with(imp, lmer(teachpop~sex+texp+popular + (1|school)))
…and pool the results:
> summary(pool(fit))
                   est          se         t        df     Pr(>|t|)      lo 95      hi 95 nmis
(Intercept) 2.73951576 0.165053863 16.597708 1991.5874 0.000000e+00 2.41581941 3.06321211   NA
sex         0.08620420 0.031042794  2.776947  915.1865 5.599307e-03 0.02528087 0.14712753    0
texp        0.05682495 0.009713717  5.849970 1991.4452 5.733929e-09 0.03777484 0.07587506    0
popular     0.16696926 0.018760706  8.899945 1980.9159 0.000000e+00 0.13017647 0.20376205  848
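For readers curious about what pooling does for a single coefficient, Rubin's rules can be sketched by hand in base R. The five estimates and standard errors below are made-up numbers for illustration, not output from the model above, and the sketch leaves out the degrees-of-freedom calculation that pool() also performs.

```r
# Made-up per-imputation estimates and standard errors for one coefficient
est <- c(0.16, 0.18, 0.17, 0.15, 0.19)
se  <- c(0.018, 0.019, 0.018, 0.020, 0.019)
m   <- length(est)

pooled_est  <- mean(est)        # pooled point estimate
within_var  <- mean(se^2)       # average within-imputation variance
between_var <- var(est)         # between-imputation variance
total_var   <- within_var + (1 + 1 / m) * between_var
pooled_se   <- sqrt(total_var)

c(estimate = pooled_est, se = pooled_se)
```

The pooled standard error is larger than the typical single-imputation standard error because the between-imputation term carries the uncertainty due to the missing values. That extra uncertainty is exactly what single imputation with subject means ignores.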