I have never had problems with R crashing before. I am using the mice package (mice 2.13) to perform multiple imputation. The code works fine on some subsets of the data, but on other subsets R crashes (not immediately, but after some time). From the output R prints just before it crashes, I believe it is using the 2l.pan imputation method (which relies on the pan package). I have already run update.packages().
How can I diagnose this problem?
    Problem signature:
      Problem Event Name:        APPCRASH
      Application Name:          Rgui.exe
      Application Version:       2.151.59607.0
      Application Timestamp:     4fe47a63
      Fault Module Name:         R.dll
      Fault Module Version:      2.151.59607.0
      Fault Module Timestamp:    4fe47a4e
      Exception Code:            c0000005
      Exception Offset:          0000000000032ec8
      OS Version:                6.1.7601.2.1.0.256.4
      Locale ID:                 2057
      Additional Information 1:  7782
      Additional Information 2:  77823beb5887f451c3dd7ae4fe931995
      Additional Information 3:  4491
      Additional Information 4:  4491b41bf90894717964f5eef2cccd84
Update
I have managed to create a reproducible example, with data:
    library(foreign)
    library(mice)
    library(pan)

    dt.fail <- read.csv("http://goo.gl/pg8um")
    dt.fail$X <- NULL

    # convert the categorical variables to factors
    fac <- c("out", "grp", "v1", "v2", "v3", "v7", "v8", "v9", "v11", "v12")
    dt.fail[fac] <- lapply(dt.fail[fac], as.factor)

    PredMatrix <- quickpred(dt.fail)
    PredMatrix["CTP", ] <- c(1, -2, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 2)

    impute <- mice(
      data = dt.fail,
      m = 1,
      maxit = 1,
      # 'imputationMethod' is the mice 2.x argument; newer mice releases call it 'method'
      imputationMethod = c(
        "logreg",   # out
        "",         # grp   ----> cluster grouping factor
        "pmm",      # v1
        "polyreg",  # v2
        "logreg",   # v3
        "pmm",      # v4
        "logreg",   # v5
        "logreg",   # v6
        "polyreg",  # v7    ----> auxiliary
        "polyreg",  # v8    ----> auxiliary
        "polyreg",  # v9    ----> auxiliary
        "polyreg",  # v10   ----> auxiliary
        "",         # v11   ----> complete
        "",         # v12   ----> complete
        "2l.pan",   # CTP   ----> multilevel imputation
        ""          # const ----> needed for multilevel imputation
      ),
      predictorMatrix = PredMatrix,
      seed = 101
    )
And for completeness, here is the predictor matrix I was using:
          out grp v1 v2 v3 v4 v5 v6 v7 v8 v9 v10 v11 v12 CTP const
    out     0   0  0  0  0  0  1  1  0  0  0   0   0   0   0     0
    grp     0   0  0  0  0  0  0  0  0  0  0   0   0   1   0     0
    v1      0   0  0  0  0  0  0  0  0  0  0   0   1   1   0     0
    v2      0   0  0  0  0  1  1  1  0  1  0   0   1   1   1     0
    v3      0   0  0  0  0  1  1  1  0  1  1   0   1   1   1     0
    v4      0   0  0  1  1  0  1  1  0  1  1   0   1   1   1     0
    v5      1   1  0  0  0  0  0  1  0  1  0   0   1   0   0     0
    v6      1   1  0  1  0  1  1  0  0  1  0   0   1   0   0     0
    v7      0   0  0  0  0  0  1  1  0  1  0   0   0   1   0     0
    v8      0   0  0  0  0  0  1  1  0  0  0   0   1   1   0     0
    v9      0   0  0  0  1  1  1  1  0  1  0   0   1   1   1     0
    v10     0   0  0  0  0  0  1  1  0  1  0   0   1   1   0     0
    v11     0   0  0  0  0  0  0  0  0  0  0   0   0   0   0     0
    v12     0   0  0  0  0  0  0  0  0  0  0   0   0   0   0     0
    CTP     1  -2  0  0  0  0  0  0  0  0  1   0   1   1   0     2
    const   0   0  0  0  0  0  0  0  0  0  0   0   0   0   0     0
Best Answer
I occasionally have problems with the 2l methods on large data, but I have never seen R itself crash because of them. My guess is that the problems are related to sparse data (very small clusters). How many predictors do you have relative to the cluster size?
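One way to answer that question is to tabulate the cluster sizes and count the predictors used for the multilevel variable. The sketch below is base R only; `cluster_check`, `toy` and `pred_row` are hypothetical names, and the toy data are made up. With the real data you would pass `dt.fail$grp` and `PredMatrix["CTP", ]` instead (the `pred_row` below copies the nonzero entries of that CTP row).

```r
# Hypothetical helper: compare cluster sizes with the number of predictors
# used in a 2l imputation model.
cluster_check <- function(cluster, pred_row) {
  sizes <- as.vector(table(cluster))
  # In the mice predictor matrix, -2 marks the cluster variable and 0 means
  # "not used"; every other nonzero code enters the model as a predictor.
  n_pred <- sum(pred_row != 0 & pred_row != -2)
  list(
    n_clusters   = length(sizes),
    min_size     = min(sizes),
    median_size  = median(sizes),
    n_predictors = n_pred,
    n_small      = sum(sizes <= n_pred)  # clusters no larger than the predictor count
  )
}

# Made-up data: four clusters of sizes 4, 2, 1 and 3.
toy <- data.frame(grp = factor(c(1, 1, 1, 1, 2, 2, 3, 4, 4, 4)))
pred_row <- c(out = 1, grp = -2, v9 = 1, v11 = 1, v12 = 1, const = 2)
res <- cluster_check(toy$grp, pred_row)
str(res)
```

If many clusters are no larger than the number of predictors, the cluster-level estimates in the 2l model rest on very little information, which is the sparse-data situation suspected above.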
Some suggestions:
In your data, several covariates have missing values but are not imputed. Please check whether mice removes them before imputation by running it with maxit = 0 and inspecting imp$log. If you want to use these variables as predictors, you should specify an imputation method for them.
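The dry run described above would be `imp <- mice(dt.fail, maxit = 0)` followed by a look at `imp$log`. The incomplete-but-not-imputed variables can also be listed directly by comparing the missing-value counts with the method vector. This is a base-R sketch; `incomplete_not_imputed` is a hypothetical helper, and the `toy` data frame is made up.

```r
# Hypothetical helper: list variables that contain missing values but have
# an empty ("") imputation method, i.e. incomplete variables mice will not impute.
incomplete_not_imputed <- function(data, method) {
  stopifnot(length(method) == ncol(data))
  has_na <- colSums(is.na(data)) > 0
  names(data)[has_na & method == ""]
}

# Made-up example: 'b' and 'c' are incomplete, but only 'b' gets a method.
toy <- data.frame(
  a = c(1, 2, 3, 4),
  b = c(1, NA, 3, 4),
  c = c(NA, "x", "y", "x"),
  d = c(5, 6, 7, 8)
)
incomplete_not_imputed(toy, method = c("", "pmm", "", ""))
# returns "c"
```

With the question's data, passing `dt.fail` and the same 16-element method vector given to mice would flag any of grp, v11, v12 or const that actually contain missing values.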
The mice package does not use any Fortran or C code of its own, but pan may (I don't know). If you are really determined to find the source of the problem, I suggest that you consult the book by Matloff, which contains a chapter on advanced debugging techniques.
The obvious other route is to simplify the model. Remove superfluous predictors, use a single-level (flat) method (e.g. pmm) with the cluster allocation as a fixed factor, and check whether the intra-class correlations of the observed and imputed data are similar.
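The intra-class correlation check can be done with the one-way ANOVA estimator ICC(1). Below is a base-R sketch; `icc1` is a hypothetical name, not a mice function. With the question's data one would compare `icc1(dt.fail$CTP, dt.fail$grp)` (missing values are dropped, so this uses the observed data only) with the same call on `complete(impute)`.

```r
# ICC(1) from a one-way random-effects ANOVA:
#   (MSB - MSW) / (MSB + (k0 - 1) * MSW),
# where k0 is the adjusted mean cluster size for unbalanced clusters.
icc1 <- function(y, cluster) {
  ok <- !is.na(y) & !is.na(cluster)           # keep complete pairs only
  y <- y[ok]
  cluster <- factor(cluster[ok])              # drops clusters with no observed y
  n_j <- as.vector(table(cluster))            # cluster sizes, in level order
  J <- length(n_j)
  N <- length(y)
  means <- as.vector(tapply(y, cluster, mean))  # cluster means, in level order
  msb <- sum(n_j * (means - mean(y))^2) / (J - 1)
  msw <- sum((y - means[as.integer(cluster)])^2) / (N - J)
  k0 <- (N - sum(n_j^2) / N) / (J - 1)
  (msb - msw) / (msb + (k0 - 1) * msw)
}

icc1(y = c(1, 3, 5, 7), cluster = c("a", "a", "b", "b"))
# 14/18, about 0.778
```

A noticeably lower ICC in the completed data than in the observed data would suggest that the flat model is washing out the clustering.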
The intercept term is added automatically by mice.impute.2l.pan, so you do not need the const column.
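In practice this means dropping `const` from the data (`dt.fail$const <- NULL`) and from the predictor matrix. The sketch below shows the matrix side on a small made-up matrix laid out like `PredMatrix`; `drop_const` is a hypothetical helper. The comments use the mice coding for 2l methods: -2 for the cluster variable, 1 for a fixed effect, 2 for a random effect.

```r
# Hypothetical helper: remove the 'const' row and column from a mice
# predictor matrix, since mice.impute.2l.pan adds the intercept itself.
drop_const <- function(pred) {
  keep <- setdiff(colnames(pred), "const")
  pred[keep, keep, drop = FALSE]
}

# Small made-up predictor matrix in the same layout as PredMatrix.
vars <- c("out", "grp", "v9", "CTP", "const")
pred <- matrix(0, length(vars), length(vars), dimnames = list(vars, vars))
# -2 = cluster variable, 1 = fixed effect, 2 = random effect
# (the 2 on 'const' is the random intercept set up by hand in the question).
pred["CTP", ] <- c(1, -2, 1, 0, 2)

pred2 <- drop_const(pred)
pred2["CTP", ]
# out = 1, grp = -2, v9 = 1, CTP = 0
```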
Hope this helps.