EM algorithm to impute missing values for one variable

This is from Robert Hogg's Introduction to Mathematical Statistics (6th edition), Exercise 6.6.5, p. 366. It says:
Suppose $X_1, X_2, \ldots, X_{n_1}$ are a random sample from a $N(\theta, 1)$ distribution. Suppose $Z_1, Z_2, \ldots, Z_{n_2}$ are missing observations. Show that the first-step EM estimate is

$$\hat\theta^{(1)} = \frac{n_1 \bar{x} + n_2 \hat\theta^{(0)}}{n},$$

where $\hat\theta^{(0)}$ is an initial estimate of $\theta$ and $n = n_1 + n_2$. Note that if $\hat\theta^{(0)} = \bar{x}$, then $\hat\theta^{(k)} = \bar{x}$ for all $k$.
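One way to derive the first-step EM estimate, assuming the missing $Z_j$ are iid $N(\theta, 1)$ like the observed values and independent of them ($\log L^c$ denotes the complete-data log-likelihood, and the expectation in the E-step is taken under $Z_j \sim N(\hat\theta^{(0)}, 1)$):

$$
\begin{aligned}
\log L^c(\theta \mid \mathbf{x}, \mathbf{z})
  &= \text{const} - \tfrac12 \sum_{i=1}^{n_1} (x_i - \theta)^2 - \tfrac12 \sum_{j=1}^{n_2} (z_j - \theta)^2 \\
Q(\theta \mid \hat\theta^{(0)})
  &= E_{\hat\theta^{(0)}}\!\left[\log L^c(\theta \mid \mathbf{x}, \mathbf{Z}) \mid \mathbf{x}\right] \\
  &= \text{const} - \tfrac12 \sum_{i=1}^{n_1} (x_i - \theta)^2 - \tfrac12 \sum_{j=1}^{n_2} \left[ 1 + (\hat\theta^{(0)} - \theta)^2 \right] \\
\frac{\partial Q}{\partial \theta}
  &= \sum_{i=1}^{n_1} (x_i - \theta) + n_2 (\hat\theta^{(0)} - \theta) = 0 \\
\Longrightarrow \quad \hat\theta^{(1)}
  &= \frac{n_1 \bar{x} + n_2 \hat\theta^{(0)}}{n}, \qquad n = n_1 + n_2 .
\end{aligned}
$$

Iterating gives $\hat\theta^{(k+1)} - \bar{x} = \frac{n_2}{n}\left(\hat\theta^{(k)} - \bar{x}\right)$, so because $n_2/n < 1$ the iterates converge to $\bar{x}$ from any starting value.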

I can solve this problem and get $\hat\theta = \bar{x}$.

My question, then, is: what are the imputed data? Should they be:

$x_1, x_2, \ldots, x_{n_1}, \bar{x}, \bar{x}, \ldots, \bar{x}$ (with $n_2$ copies of $\bar{x}$)?
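A minimal sketch in Python (standard library only) of the EM iteration for this exercise, assuming the missing $Z_j$ follow the same $N(\theta, 1)$ model as the observed values. The E-step replaces each missing value by its conditional expectation under the current estimate, which is simply $\hat\theta^{(k)}$; the M-step averages the completed data. The function name `em_normal_mean` and the example data are hypothetical.

```python
from statistics import fmean


def em_normal_mean(x_obs, n_missing, theta0, tol=1e-12, max_iter=10_000):
    """EM estimate of theta for N(theta, 1) data with n_missing unobserved values.

    Returns the final estimate and the completed data set
    (observed values followed by the imputed values).
    """
    n1 = len(x_obs)
    n = n1 + n_missing
    xbar = fmean(x_obs)
    theta = theta0
    for _ in range(max_iter):
        # E-step: under N(theta, 1), E[Z_j | x, theta] = theta,
        # so each missing value is effectively filled with the current theta.
        # M-step: maximising the expected complete-data log-likelihood
        # gives the mean of the completed data.
        new_theta = (n1 * xbar + n_missing * theta) / n
        converged = abs(new_theta - theta) < tol
        theta = new_theta
        if converged:
            break
    # Single imputation: fill the missing slots with the final estimate.
    completed = list(x_obs) + [theta] * n_missing
    return theta, completed


# Hypothetical data: five observed values, three missing, deliberately poor start.
x = [1.2, 0.4, 2.1, 1.7, 0.9]
theta_hat, completed = em_normal_mean(x, n_missing=3, theta0=10.0)
print(theta_hat)        # close to fmean(x) = 1.26
print(completed[-3:])   # three imputed values, each close to 1.26
```

Under this one-variable model with no other covariates, the completed data are the observed $x_i$ followed by $n_2$ copies of the converged estimate, i.e. of $\bar{x}$, which amounts to mean imputation.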

In fact, this question is related to a recently published paper in The Lancet:

Prestmo, A., et al. (2015). "Comprehensive geriatric care for patients with hip fractures: a prospective, randomised, controlled trial." Lancet 385(9978): 1623-1633.

In the statistical analysis section, the authors state that:

We used single imputation with the expectation maximisation algorithm for individual missing items on questionnaires and performance tests, with scores from the same timepoint as predictors.

Should the authors tell readers what distribution they assumed for the single imputation? Otherwise, I think, the imputed data could differ considerably depending on which distribution is assumed.

Thank you very much.

Without more details about what they assumed, you couldn't reproduce what they did. So in that sense at least (i.e., for their work to be reproducible), yes, you'd need information such as the assumed model.

Whether that constitutes a "should" depends on which normative criteria we apply (for example, are we taking the journal's editorial standards as our criterion for "should"?).

From your quote, it sounds as though they may have used some kind of regression model for their imputation (with "scores from the same timepoint as predictors"), but the description is not very precise.
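To illustrate how much the assumed model matters, here is a sketch in Python (standard library only) of EM single imputation under an assumed bivariate normal model for a fully observed predictor X and a partly missing score Y. This is one possible reading of "scores from the same timepoint as predictors", not a reconstruction of what Prestmo et al. actually did; the function name `em_bivariate_impute` and the data are hypothetical. At convergence each missing Y is replaced by its estimated conditional mean E[Y | X = x_i], i.e. a regression prediction, rather than by the observed mean of Y.

```python
from statistics import fmean


def em_bivariate_impute(x, y, n_iter=200):
    """Single imputation of Y by EM under an assumed bivariate normal model for (X, Y).

    x: fully observed predictor values.
    y: outcome values of the same length, with None where Y is missing.
    Returns y with each missing entry replaced by the estimated E[Y | X = x_i].
    """
    mu_x = fmean(x)
    s_xx = fmean([(xi - mu_x) ** 2 for xi in x])  # X is complete, so this stays fixed
    observed = [yi for yi in y if yi is not None]
    # Arbitrary starting values: complete-case mean and variance of Y, zero covariance.
    mu_y = fmean(observed)
    s_yy = fmean([(yi - mu_y) ** 2 for yi in observed])
    s_xy = 0.0
    for _ in range(n_iter):
        beta = s_xy / s_xx              # slope of the regression of Y on X
        resid_var = s_yy - beta * s_xy  # Var(Y | X) under the current estimate
        # E-step: expected Y and Y^2 for the missing entries, given x_i.
        ey = [yi if yi is not None else mu_y + beta * (xi - mu_x)
              for xi, yi in zip(x, y)]
        ey2 = [yi ** 2 if yi is not None else e ** 2 + resid_var
               for yi, e in zip(y, ey)]
        # M-step: re-estimate the moments involving Y from the expected statistics.
        mu_y = fmean(ey)
        s_xy = fmean([xi * e for xi, e in zip(x, ey)]) - mu_x * mu_y
        s_yy = fmean(ey2) - mu_y ** 2
    beta = s_xy / s_xx
    return [yi if yi is not None else mu_y + beta * (xi - mu_x)
            for xi, yi in zip(x, y)]


# Hypothetical data: X is another score from the same timepoint, Y has two missing items.
x = [1, 2, 3, 4, 5, 6]
y = [2.0, 4.1, None, 8.0, None, 12.1]
print(em_bivariate_impute(x, y))  # missing entries come out near 6.05 and 10.07
```

With these data the two missing entries converge to the complete-case least-squares predictions (about 6.05 and 10.07), whereas a model for Y alone, as in the Hogg exercise, would fill both with the observed mean of Y (6.55).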
