Solved – data imputation of missing values in non-normally distributed explanatory variables

I have been told that mean imputation of missing values is inappropriate when the variable's underlying distribution is non-normal. My variable is continuous (but bounded above at 100) and most observations are either 98 or 99 (with the odd few in the low 90s), hence the distribution is highly skewed. How would I best impute the missing values?

The typical modern approach in this sort of situation is to use some form of multiple imputation.

The general idea is that instead of imputing a single 'best' value for each missing entry, you repeat a randomized imputation process many times, drawing each imputed value from what is known about the variable's distribution (including any relationship with other variables that are not missing). This generates multiple distinct completed copies of the data. You then run your analysis on each imputed copy and finally pool the results, so that the uncertainty caused by the missing values is carried through to your final estimates.
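The impute, analyze, pool cycle described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not what MICE does internally: it assumes the values are missing completely at random, it ignores other variables, it fills gaps with an approximate Bayesian bootstrap (draws from the observed values, so imputations stay within the observed range and keep the skew), and it pools the mean with Rubin's rules. The data and the function names abb_impute and pool_means are made up for the example.

```python
import random
import statistics


def abb_impute(values, rng):
    """Fill None entries using an approximate Bayesian bootstrap (assumes MCAR)."""
    observed = [v for v in values if v is not None]
    # Step 1: resample the observed values with replacement, so that
    # uncertainty about the observed distribution is reflected.
    donors = rng.choices(observed, k=len(observed))
    # Step 2: replace each missing entry with a random draw from the donors.
    return [v if v is not None else rng.choice(donors) for v in values]


def pool_means(values, m=20, seed=0):
    """Impute m times, estimate the mean each time, pool with Rubin's rules."""
    rng = random.Random(seed)
    estimates, variances = [], []
    for _ in range(m):
        completed = abb_impute(values, rng)
        estimates.append(statistics.mean(completed))
        # Squared standard error of the mean within this completed data set.
        variances.append(statistics.variance(completed) / len(completed))
    pooled_estimate = statistics.mean(estimates)
    within = statistics.mean(variances)      # average within-imputation variance
    between = statistics.variance(estimates)  # variance across imputations
    total_variance = within + (1 + 1 / m) * between
    return pooled_estimate, total_variance


# Hypothetical scores: mostly 98 or 99, a few in the low 90s, three missing.
scores = [99, 98, 99, None, 98, 99, 93, None, 99, 98, 91, 99, None, 98, 99]
pooled_mean, pooled_var = pool_means(scores)
print("pooled mean:", pooled_mean)
print("pooled standard error:", pooled_var ** 0.5)
```

Because every imputed value is drawn from the observed ones, nothing above 100 or otherwise implausible can be generated, which is the main weakness of normal-theory imputation for a variable like this. In a real analysis you would usually want an imputation model that also uses the other variables in the data set.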

MICE (multivariate imputation by chained equations), available as the mice package, is a popular implementation in R.

Other major statistical packages also support multiple imputation; Stata, for example, provides it through its mi commands.
