# Solved – EM algorithm to impute missing value for one variable

This is from Robert Hogg's Introduction to Mathematical Statistics, 6th edition, exercise 6.6.5, p. 366. It says:
Suppose \$X_1\$, \$X_2\$, …, \$X_{n_1}\$ are a random sample from a \$N(\theta,1)\$ distribution. Suppose \$Z_1\$, \$Z_2\$, …, \$Z_{n_2}\$ are missing observations. Show that the first-step EM estimate is:

\$\hat\theta^{(1)}=\frac{n_1\bar{x}+n_2\hat\theta^{(0)}}{n}\$

where \$\hat\theta^{(0)}\$ is an initial estimate of \$\theta\$ and \$n=n_1+n_2\$. Note that if \$\hat\theta^{(0)}=\bar{x}\$, then \$\hat\theta^{(k)}=\bar{x}\$ for all \$k\$.

I can solve this problem and get \$\hat\theta=\bar{x}\$.
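A sketch of the derivation, assuming the missing \$Z_j\$ are independent of the observed \$X_i\$ and come from the same \$N(\theta,1)\$ distribution (the symbols \$\ell^c\$ and \$Q\$ are notation introduced here, not taken from the exercise):

```latex
\begin{align*}
% Complete-data log-likelihood
\ell^c(\theta) &= -\frac{n}{2}\log(2\pi)
  - \frac{1}{2}\sum_{i=1}^{n_1}(x_i-\theta)^2
  - \frac{1}{2}\sum_{j=1}^{n_2}(z_j-\theta)^2 \\
% E-step: expectation over Z_j ~ N(\hat\theta^{(0)}, 1)
Q(\theta \mid \hat\theta^{(0)}) &= \text{const}
  - \frac{1}{2}\sum_{i=1}^{n_1}(x_i-\theta)^2
  - \frac{n_2}{2}\left[1 + (\hat\theta^{(0)}-\theta)^2\right] \\
% M-step: set the derivative with respect to \theta to zero
\frac{\partial Q}{\partial\theta} &= \sum_{i=1}^{n_1}(x_i-\theta)
  + n_2(\hat\theta^{(0)}-\theta) = 0
\;\Longrightarrow\;
\hat\theta^{(1)} = \frac{n_1\bar{x}+n_2\hat\theta^{(0)}}{n}
\end{align*}
```

Repeating the step gives \$\hat\theta^{(k+1)}-\bar{x}=\frac{n_2}{n}(\hat\theta^{(k)}-\bar{x})\$, so the iterates converge to \$\bar{x}\$ from any starting value.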

My question is then: what are the imputed data? Should they be:

\$x_1\$, \$x_2\$, …, \$x_{n_1}\$, \$\bar{x}\$, \$\bar{x}\$, …, \$\bar{x}\$ (with \$n_2\$ copies of \$\bar{x}\$)?
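A small numerical check of this reading, as a sketch only. It assumes the exercise's model, in which the E-step replaces each missing value by its conditional expectation \$E[Z_j \mid \hat\theta^{(k)}]=\hat\theta^{(k)}\$, so the filled-in values at convergence equal \$\bar{x}\$. The function name `em_impute_normal_mean` and the toy data are hypothetical, chosen for illustration.

```python
import numpy as np

def em_impute_normal_mean(x, n_missing, theta0, tol=1e-12, max_iter=10_000):
    """EM for the mean of N(theta, 1) with n_missing unobserved values.

    Assumes the missing values are independent of the observed ones,
    so E[Z_j | x, theta] = theta in the E-step.
    """
    x = np.asarray(x, dtype=float)
    n1 = x.size
    n = n1 + n_missing
    xbar = x.mean()
    theta = float(theta0)
    for _ in range(max_iter):
        # E-step: fill each missing value with its expectation, theta.
        # M-step: the new estimate is the mean of the completed sample.
        theta_new = (n1 * xbar + n_missing * theta) / n
        converged = abs(theta_new - theta) < tol
        theta = theta_new
        if converged:
            break
    # Completed data: observed values followed by the imputed ones.
    completed = np.concatenate([x, np.full(n_missing, theta)])
    return theta, completed

x = [1.2, 0.7, 2.1, 1.6]  # toy observed sample
theta_hat, completed = em_impute_normal_mean(x, n_missing=3, theta0=0.0)
print(theta_hat)
print(completed)
```

In this sketch the last three entries of `completed` agree with the sample mean of `x` up to the tolerance. Note that filling in a single expected value like this understates the variability of the missing observations.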

In fact, this question is related to a recently published paper in The Lancet:

Prestmo, A., et al. (2015). "Comprehensive geriatric care for patients with hip fractures: a prospective, randomised, controlled trial." Lancet 385(9978): 1623-1633.

In the statistical analysis section, the authors state that:

We used single imputation with the expectation
maximisation algorithm for individual missing items on
questionnaires and performance tests, with scores from
the same timepoint as predictors.

Should the authors tell readers what distributions they assumed for the single imputation? Otherwise, I think different assumed distributions could lead to many different imputed data sets.

Thank you very much.
