I am trying to impute missing values in my data using the mice function (from the mice package) in R. However, every time I run the function, it freezes or lags.
I have tried running it overnight, and it still does not finish. I am working with 17,000 observations across 32 variables. I can't even stop the function once it starts; I have to force-quit the program.
Here is my code:
imputed.data = mice(data, m = 1, diagnostics = F)
In my experience, mice is relatively slow (especially on big datasets), and a single imputation cannot be parallelized, which makes sense because each step of the chained-equations algorithm depends on the previous one. In previous cases I've been able to speed it up by specifying a predictorMatrix so that not every variable is used to predict every other variable, which simplifies the intermediate model fitting. However, depending on your variables and their (known or unknown) relations, this may be undesirable.
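A minimal sketch of restricting the predictors, under a few assumptions: the small nhanes dataset that ships with mice stands in for your data, and the choice to drop hyp as a predictor is purely illustrative. In the matrix, rows are the variables being imputed and columns are the variables used to predict them.

```r
library(mice)

# nhanes ships with mice; it stands in for the real data here (assumption)
data <- nhanes

# Default matrix: every variable predicts every other variable
pred <- make.predictorMatrix(data)

# Illustrative choice only: do not use 'hyp' as a predictor for any variable
pred[, "hyp"] <- 0

# Alternative: let mice pick predictors by a minimum correlation threshold
# (0.3 is an arbitrary value for this sketch)
pred_quick <- quickpred(data, mincor = 0.3)

# Run a single imputation with the reduced predictor matrix
imputed <- mice(data, m = 1, predictorMatrix = pred,
                printFlag = FALSE, seed = 1)
completed <- complete(imputed)
```

With 32 variables, which predictors to drop depends on what you know about the data; quickpred is one way to get a sparser matrix when you have no strong prior knowledge.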
To rule out other problems, you may want to run mice on a smaller subset of your data with relatively little missing data (e.g. 200 rows with at most 5% missing), or run it on a simulated dataset.
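As a rough sketch of such a sanity check, the following simulates 200 rows with 5% missing values per column and times a short mice run. The variable names, the number of columns, and maxit = 5 are all assumptions made for illustration.

```r
library(mice)

set.seed(1)

# Simulated data: 200 rows, 5 numeric variables (hypothetical names x1..x5)
n <- 200
sim <- as.data.frame(matrix(rnorm(n * 5), ncol = 5))
names(sim) <- paste0("x", 1:5)

# Set 10 random values per column to NA, i.e. 5% missing in each column
for (v in names(sim)) {
  sim[sample(n, size = 10), v] <- NA
}

# Time a single imputation with a small number of iterations
timing <- system.time(
  imp <- mice(sim, m = 1, maxit = 5, printFlag = FALSE, seed = 1)
)
print(timing)

# For real data, a random subset could be drawn instead, e.g.:
# small <- data[sample(nrow(data), 200), ]
```

If this finishes quickly, the installation is probably fine and the slowdown comes from the size or structure of the full dataset. Setting printFlag = TRUE on the real run prints each iteration, which helps distinguish a slow run from a frozen one.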