Solved – Diagnosing why MICE is crashing R when attempting to impute multilevel data

I have never had problems with R crashing before.

I am using the mice package (mice 2.13) to perform multiple imputation. The code works fine on some subsets of the data, but when I run it on other subsets, R crashes (not immediately – after some time). From the output in R just before it crashes, I believe it is using the 2l.pan method of imputation (from the pan package). I have already run update.packages().

How can I diagnose this problem?

Problem signature:

    Problem Event Name:       APPCRASH
    Application Name:         Rgui.exe
    Application Version:      2.151.59607.0
    Application Timestamp:    4fe47a63
    Fault Module Name:        R.dll
    Fault Module Version:     2.151.59607.0
    Fault Module Timestamp:   4fe47a4e
    Exception Code:           c0000005
    Exception Offset:         0000000000032ec8
    OS Version:               6.1.7601.2.1.0.256.4
    Locale ID:                2057
    Additional Information 1: 7782
    Additional Information 2: 77823beb5887f451c3dd7ae4fe931995
    Additional Information 3: 4491
    Additional Information 4: 4491b41bf90894717964f5eef2cccd84

Update

I have managed to create a reproducible example, with data:

    require(foreign)
    require(mice)
    require(pan)

    dt.fail <- read.csv("http://goo.gl/pg8um")
    dt.fail$X <- NULL

    # Convert the categorical variables to factors
    dt.fail$out <- as.factor(dt.fail$out)
    dt.fail$grp <- as.factor(dt.fail$grp)
    dt.fail$v1  <- as.factor(dt.fail$v1)
    dt.fail$v2  <- as.factor(dt.fail$v2)
    dt.fail$v3  <- as.factor(dt.fail$v3)
    dt.fail$v7  <- as.factor(dt.fail$v7)
    dt.fail$v8  <- as.factor(dt.fail$v8)
    dt.fail$v9  <- as.factor(dt.fail$v9)
    dt.fail$v11 <- as.factor(dt.fail$v11)
    dt.fail$v12 <- as.factor(dt.fail$v12)

    PredMatrix <- quickpred(dt.fail)
    PredMatrix['CTP', ] <- c(1, -2, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 2)

    impute = mice(
      data = dt.fail,
      m = 1,
      maxit = 1,
      imputationMethod = c(
        "logreg",   # out
        "",         # grp   ----> cluster grouping factor
        "pmm",      # v1
        "polyreg",  # v2
        "logreg",   # v3
        "pmm",      # v4
        "logreg",   # v5
        "logreg",   # v6
        "polyreg",  # v7    ----> auxiliary
        "polyreg",  # v8    ----> auxiliary
        "polyreg",  # v9    ----> auxiliary
        "polyreg",  # v10   ----> auxiliary
        "",         # v11   ----> complete
        "",         # v12   ----> complete
        "2l.pan",   # CTP   ----> multilevel imputation
        ""          # const ----> needed for multilevel imputation
      ),
      predictorMatrix = PredMatrix,
      seed = 101
    )

And for completeness, here is the predictor matrix I was using:

          out grp v1 v2 v3 v4 v5 v6 v7 v8 v9 v10 v11 v12 CTP const
    out     0   0  0  0  0  0  1  1  0  0  0   0   0   0   0     0
    grp     0   0  0  0  0  0  0  0  0  0  0   0   0   1   0     0
    v1      0   0  0  0  0  0  0  0  0  0  0   0   1   1   0     0
    v2      0   0  0  0  0  1  1  1  0  1  0   0   1   1   1     0
    v3      0   0  0  0  0  1  1  1  0  1  1   0   1   1   1     0
    v4      0   0  0  1  1  0  1  1  0  1  1   0   1   1   1     0
    v5      1   1  0  0  0  0  0  1  0  1  0   0   1   0   0     0
    v6      1   1  0  1  0  1  1  0  0  1  0   0   1   0   0     0
    v7      0   0  0  0  0  0  1  1  0  1  0   0   0   1   0     0
    v8      0   0  0  0  0  0  1  1  0  0  0   0   1   1   0     0
    v9      0   0  0  0  1  1  1  1  0  1  0   0   1   1   1     0
    v10     0   0  0  0  0  0  1  1  0  1  0   0   1   1   0     0
    v11     0   0  0  0  0  0  0  0  0  0  0   0   0   0   0     0
    v12     0   0  0  0  0  0  0  0  0  0  0   0   0   0   0     0
    CTP     1  -2  0  0  0  0  0  0  0  0  1   0   1   1   0     2
    const   0   0  0  0  0  0  0  0  0  0  0   0   0   0   0     0

I occasionally have problems with the 2l methods for large data, but have never seen R itself crash on it. My guess would be that they are related to sparse data (very small clusters). How many predictors do you have relative to cluster size?
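To answer the cluster-size question, it may help to tabulate how many rows, and how many observed values of the multilevel target, each cluster contributes. The following is only a sketch in base R: cluster_summary is a hypothetical helper, and the toy data frame merely borrows the grp and CTP column names from the question.

```r
# Hypothetical helper: rows and observed target values per cluster.
cluster_summary <- function(data, cluster, target) {
  g <- as.factor(data[[cluster]])
  observed <- !is.na(data[[target]])
  data.frame(
    cluster = levels(g),
    n = as.vector(table(g)),                          # rows per cluster
    n_observed = as.vector(tapply(observed, g, sum)), # non-missing target values
    stringsAsFactors = FALSE
  )
}

# Toy data standing in for dt.fail (values are made up).
toy <- data.frame(
  grp = c("a", "a", "a", "b", "b", "c"),
  CTP = c(1.2, NA, 0.8, NA, NA, 2.1)
)

sizes <- cluster_summary(toy, "grp", "CTP")
print(sizes)

# Clusters with fewer than two observed values are candidates for trouble.
print(sizes[sizes$n_observed < 2, ])
```

Running the same summary on a subset that works and on one that crashes would show whether the failing subsets are the ones with very small or entirely unobserved clusters.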

Some suggestions:

In your data, you have several covariates that have incomplete data but that are not imputed. Please check whether mice removes them before imputation by setting maxit = 0 and inspecting imp$log. If you want to use these as predictors, you should specify an imputation method for them.
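A minimal sketch of that check, on made-up data rather than dt.fail: the base R helper unimputed_incomplete is hypothetical and simply lists columns that contain missing values but have an empty method. The dry run afterwards is guarded so the block still runs where mice is not installed; it reads the loggedEvents component of the returned object, which I assume is what imp$log abbreviates.

```r
# Hypothetical helper: incomplete columns whose imputation method is "".
unimputed_incomplete <- function(data, method) {
  stopifnot(length(method) == ncol(data))
  has_missing <- colSums(is.na(data)) > 0
  names(data)[has_missing & method == ""]
}

# Toy data: x and y are incomplete, z is complete.
toy <- data.frame(
  x = c(1, 2, NA, 4, 5, 6),
  y = c(NA, 1, 0, 1, 0, 1),
  z = c(3, 1, 4, 1, 5, 9)
)
meth <- c("pmm", "", "")  # y is incomplete but has no method

flagged <- unimputed_incomplete(toy, meth)
print(flagged)

# Dry run: maxit = 0 sets up the imputation without iterating.
if (requireNamespace("mice", quietly = TRUE)) {
  imp0 <- mice::mice(toy, method = meth, maxit = 0, printFlag = FALSE)
  print(imp0$loggedEvents)  # NULL when nothing was logged
}
```

For the data in the question, the same helper could be called with dt.fail and the method vector passed to mice.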

The mice package does not include any Fortran or C code of its own, but pan may (I don't know). If you are really determined to find the source of the problem, I suggest that you consult the book by Matloff, which contains a chapter on advanced debugging techniques.
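One low-tech way to narrow things down before reaching for a debugger is to run each imputation in a separate R process, so that a crash takes down only the child process and leaves an exit status behind. This is a sketch under that assumption, not something taken from the book; run_isolated is a hypothetical helper built on base R's system2.

```r
# Hypothetical helper: evaluate R code in a fresh Rscript process and
# return the exit status of that process (0 means it finished normally).
run_isolated <- function(expr_text) {
  rscript <- file.path(R.home("bin"), "Rscript")
  system2(
    rscript,
    args = c("--vanilla", "-e", shQuote(expr_text)),
    stdout = FALSE,
    stderr = FALSE
  )
}

ok_status  <- run_isolated("invisible(1 + 1)")  # a call that finishes normally
bad_status <- run_isolated("quit(status = 3)")  # stand-in for a failing run
print(c(ok = ok_status, failed = bad_status))
```

Passing code that sources a script imputing one subset at a time (the script itself is left to the reader) would then give one status per subset, which makes it easier to see which subsets trigger the crash.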

The obvious other route is to try to simplify the model. Remove superfluous predictors, use a flat-file method (e.g. pmm) with cluster allocation as a fixed factor, and check whether the intra-class correlations of the observed and imputed data are similar.
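For the intra-class correlation comparison, one option is the one-way ANOVA estimator, sketched below in base R. icc_oneway is a hypothetical helper, the toy data are made up, and other ICC estimators (for example from a mixed model) would serve equally well.

```r
# Hypothetical helper: one-way ANOVA estimate of the intra-class correlation,
# (MSB - MSW) / (MSB + (n0 - 1) * MSW), with n0 the adjusted mean cluster size.
icc_oneway <- function(y, g) {
  keep <- !is.na(y) & !is.na(g)
  y <- y[keep]
  g <- droplevels(as.factor(g[keep]))
  n_j <- as.numeric(table(g))
  n0 <- (sum(n_j) - sum(n_j^2) / sum(n_j)) / (length(n_j) - 1)
  tab <- anova(lm(y ~ g))
  msb <- tab["g", "Mean Sq"]          # between-cluster mean square
  msw <- tab["Residuals", "Mean Sq"]  # within-cluster mean square
  (msb - msw) / (msb + (n0 - 1) * msw)
}

# Toy data: three clusters of two observations each.
toy <- data.frame(
  grp = c("a", "a", "b", "b", "c", "c"),
  CTP = c(1, 2, 3, 4, 5, 6)
)
icc_observed <- icc_oneway(toy$CTP, toy$grp)
print(icc_observed)
```

With a fitted imputation object, the same function could be applied to the observed CTP values and to a completed data set obtained with mice::complete(); a large gap between the two values would suggest that the flat-file imputation is distorting the cluster structure.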

The intercept term is added automatically by mice.impute.2l.pan, so you do not need to add it yourself (the const column in your data).

Hope this helps.
