This is from Robert Hogg's *Introduction to Mathematical Statistics*, 6th edition, Exercise 6.6.5 (p. 366). It says:

Suppose $X_1, X_2, \ldots, X_{n_1}$ are a random sample from a $N(\theta,1)$ distribution. Suppose $Z_1, Z_2, \ldots, Z_{n_2}$ are missing observations. Show that the first step EM estimate is:

$\hat\theta^{(1)}=\frac{n_1\bar{x}+n_2\hat\theta^{(0)}}{n}$

where $\hat\theta^{(0)}$ is an initial estimate of $\theta$ and $n=n_1+n_2$. Note that if $\hat\theta^{(0)}=\bar{x}$ then $\hat\theta^{(k)}=\bar{x}$ for all $k$.

I can solve this problem and get $\hat\theta=\bar{x}$.
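
For reference, here is a sketch of one way to reach that update. It assumes the missing $Z_j$ are independent of the observed data and also $N(\theta,1)$, which is how the exercise reads. Up to an additive constant, the complete-data log-likelihood is

$\ell(\theta)=-\frac{1}{2}\sum_{i=1}^{n_1}(x_i-\theta)^2-\frac{1}{2}\sum_{j=1}^{n_2}(Z_j-\theta)^2$

E-step: under the current estimate, $Z_j\sim N(\hat\theta^{(0)},1)$, so $E[(Z_j-\theta)^2]=1+(\hat\theta^{(0)}-\theta)^2$ and

$Q(\theta\mid\hat\theta^{(0)})=-\frac{1}{2}\sum_{i=1}^{n_1}(x_i-\theta)^2-\frac{n_2}{2}\left[1+(\hat\theta^{(0)}-\theta)^2\right]$

M-step: setting the derivative with respect to $\theta$ to zero,

$\sum_{i=1}^{n_1}(x_i-\theta)+n_2(\hat\theta^{(0)}-\theta)=0 \quad\Rightarrow\quad \hat\theta^{(1)}=\frac{n_1\bar{x}+n_2\hat\theta^{(0)}}{n}$

A fixed point $\hat\theta$ of this update satisfies $n\hat\theta=n_1\bar{x}+n_2\hat\theta$, i.e. $\hat\theta=\bar{x}$.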

My question is then: what are the imputed data? Should they be:

$x_1, x_2, \ldots, x_{n_1}, \bar{x}, \bar{x}, \ldots, \bar{x}$ (with $n_2$ copies of $\bar{x}$)?
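
To make the question concrete, here is a minimal numerical sketch of the iteration from the exercise. The function name `em_normal_mean`, its parameters, and the toy data are hypothetical and only for illustration. It assumes the $N(\theta,1)$ model above and that the missing values are independent of the observed ones, so the E-step expectation of each missing value is simply the current estimate of $\theta$.

```python
import numpy as np


def em_normal_mean(x_obs, n_missing, theta0, n_iter=50):
    """Iterate the EM update for the mean of N(theta, 1) with n_missing
    unobserved values (illustrative sketch, not a general EM routine)."""
    x_obs = np.asarray(x_obs, dtype=float)
    n = x_obs.size + n_missing
    theta = float(theta0)
    history = [theta]
    # Expected values of the missing observations under the current theta.
    z_expected = np.full(n_missing, theta)
    for _ in range(n_iter):
        # E-step: each missing Z_j has conditional mean equal to theta.
        z_expected = np.full(n_missing, theta)
        # M-step: average the observed data and the expected missing data.
        theta = (x_obs.sum() + z_expected.sum()) / n
        history.append(theta)
    return theta, z_expected, history


# Toy data (made up): sample mean is 3.0, two values missing, start at 0.
theta_hat, z_hat, history = em_normal_mean([1.0, 2.0, 3.0, 6.0], 2, theta0=0.0)
print(history[1])   # first-step estimate (n1 * xbar + n2 * theta0) / n
print(theta_hat)    # estimate after 50 iterations
print(z_hat)        # E-step expectations of the missing values
```

In this sketch the E-step expectations of the missing values approach $\bar{x}$ as the iteration converges, which is what the list above would correspond to if "imputed data" means those conditional expectations.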

In fact, this question is related to a recently published paper in the Lancet:

Prestmo, A., et al. (2015). "Comprehensive geriatric care for patients with hip fractures: a prospective, randomised, controlled trial." Lancet 385(9978): 1623-1633.

In the statistical analysis, the authors state that:

*We used single imputation with the expectation maximation algorithm for individual missing items on questionnaires and performance tests, with scores from the same timepoint as predictors.*

Should the authors tell readers what distribution they assumed for the single imputation? Otherwise, I think the imputed data could differ considerably depending on which distribution was assumed.

Thank you very much.


#### Best Answer

Without more details about what they assumed, you couldn't reproduce what they did. So in that sense at least (i.e. for their work to be reproducible), yes, you'd need to know things like the assumed model.

Whether that constitutes a "should" is really dependent on what normative criteria we're applying (are we addressing the editorial standards of the journal as our criterion for 'should', for example?).

From your quote it sounds like they might have applied some kind of regression model for their imputation, but that's not very precise.
