# Solved – MCMC and data augmentation

I have been looking at an MCMC data augmentation question; the general form of the question is as follows:

Suppose data gathered on a process suggests $X_{i} \sim \text{Pois}(\lambda)$ and a prior for the rate parameter is suggested as $\lambda \sim \text{Exp}(\lambda_{0})$.
The data is recorded and presented in a typical form (i.e. the number of occurrences of each value of $X_{i}$ from $0$ to $n$); however, the data gathered does not discriminate in instances where $X_{i} \leq 1$ (i.e. all occurrences where $X_{i} = 0$ and $X_{i} = 1$ are grouped into one category).
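For concreteness, here is a minimal sketch of how such merged data might arise (Python; the values of $\lambda$ and $n$ are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)
lam_true, n = 2.0, 200              # hypothetical rate and sample size
x = rng.poisson(lam_true, size=n)   # complete (unobserved) Poisson data
x_star = np.where(x <= 1, 1, x)     # recorded data: 0s and 1s pooled into one category
```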

Given the data, the likelihood and the prior described above, the question asks for:

• The posterior form of $\lambda$,

• The number of occurrences where $X_{i} = 0$.

I'm not really sure how to answer this question, but I am aware that Gibbs Sampling can be used in data augmentation.
Does anybody have any information on how this could be done?

EDIT:

I should specify that it's primarily the second part (the number of occurrences where $X_{i} = 0$) that I'm unsure about. For the first part (the posterior form of $\lambda$), given the likelihood and the prior suggested, I have reasoned (although I'm happy to be corrected):

Given:

$$ \pi(\lambda|\vec{x}) \propto p(\vec{x}|\lambda) \times p(\lambda) $$

So, for the model given above:

$$ \pi(\lambda|\vec{x}) \propto \frac{\lambda^{\sum_{i=1}^{n}x_{i}}}{\prod_{i=1}^{n}x_{i}!}e^{-n\lambda} \times \lambda_{0}e^{-\lambda \lambda_{0}} $$

Simplifying yields:

$$ \pi(\lambda|\vec{x}) \propto \frac{\lambda^{\sum_{i=1}^{n}x_{i}}}{\prod_{i=1}^{n}x_{i}!}e^{-\lambda(n+\lambda_{0})}\lambda_{0} $$

Dropping the factors that do not depend on $\lambda$, the posterior form is given by:

$$ \pi(\lambda|\vec{x}) \propto \lambda^{\sum_{i=1}^{n}x_{i}}e^{-\lambda(n+\lambda_{0})}, $$

which is the kernel of a $\mathcal{G}\left(\sum_{i=1}^{n}x_{i} + 1,\, n+\lambda_{0}\right)$ (Gamma) distribution.
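As a sanity check, here is a minimal sketch comparing the kernel above against that Gamma density (the data and prior rate are hypothetical):

```python
import numpy as np
from scipy.integrate import trapezoid
from scipy.stats import gamma

lam0 = 1.0                        # hypothetical prior rate
x = np.array([0, 2, 1, 3, 0, 4])  # hypothetical complete data
grid = np.linspace(0.01, 10, 500)

# Unnormalized posterior kernel from the derivation above
kernel = grid ** x.sum() * np.exp(-grid * (len(x) + lam0))
kernel /= trapezoid(kernel, grid)  # normalize numerically

# Gamma(sum(x) + 1, n + lam0) density; rate parameterisation, so scale = 1/rate
ref = gamma.pdf(grid, a=x.sum() + 1, scale=1.0 / (len(x) + lam0))

print(np.max(np.abs(kernel - ref)))  # small, up to quadrature error
```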

## Best Answer

Your answer does not account for the fact that the observations equal to zero and to one are merged together: what you computed is the posterior for the complete Poisson data, $(X_1,\ldots,X_n)$, rather than for the aggregated or merged data, $(X_1^*,\ldots,X^*_n)$.

If we take the convention that the observation $X_i^*=1$ corresponds to $X_i=1$ or $X_i=0$, and the observation $X_i^*=k>1$ to $X_i=k$, the density of the observed vector $(X_1^*,\ldots,X^*_n)$ leads (after some algebra and factorisation) to the posterior
$$ \pi(\lambda|x_1^*,\ldots,x^*_n) \propto \lambda^{\sum_{i=1}^n x_i^*\mathbb{I}(x_i^*>1)} \exp\{-\lambda(\lambda_0+n)\} \times \{1+\lambda\}^{n_1}, $$
where $n_1$ is the number of times the $x_i^*$'s are equal to one. The last term in brackets, together with the corresponding factor $e^{-\lambda}$, is the probability $e^{-\lambda}(1+\lambda)$ of getting $0$ or $1$ in a Poisson draw.
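For reference, here is a minimal sketch of evaluating this unnormalized log-posterior (the function name and the coding of the merged category as $x_i^* = 1$ are assumptions of mine):

```python
import numpy as np

def log_post_merged(lam, x_star, lam0):
    """Unnormalized log-posterior of lambda given the merged data x*."""
    x_star = np.asarray(x_star)
    n = len(x_star)
    n1 = np.sum(x_star == 1)      # count of merged {0, 1} observations
    s = x_star[x_star > 1].sum()  # sum over the fully observed values
    return s * np.log(lam) - lam * (lam0 + n) + n1 * np.log1p(lam)
```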

So this is your true/observed posterior. From there, you can implement a Gibbs sampler by the two steps below (a full sketch of the sampler follows the list):

1. Generating the "missing observations" given $\lambda$ and the observations, namely simulating from $p(x_i|\lambda,x_i^*=1)$, which is given by $$ \mathbb{P}(x_i=0|\lambda,x_i^*=1)=1-\mathbb{P}(x_i=1|\lambda,x_i^*=1)=\dfrac{1}{1+\lambda}\,. $$
2. Generating $\lambda$ given the "completed data", which amounts to $$ \lambda|x_1,\ldots,x_n \sim \mathcal{G}\left(\sum_i x_i + 1,\, n+\lambda_0\right) $$ as you already computed.
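Here is a minimal Python sketch of this two-step data-augmentation Gibbs sampler (the function and variable names are mine, and the merged category is again assumed to be coded as $x_i^*=1$):

```python
import numpy as np

def gibbs_poisson_merged(x_star, lam0, n_iter=5000, seed=0):
    """Gibbs sampler alternating imputation of merged cells and lambda draws."""
    rng = np.random.default_rng(seed)
    x_star = np.asarray(x_star)
    n = len(x_star)
    n1 = np.sum(x_star == 1)           # merged {0, 1} observations
    s_rest = x_star[x_star > 1].sum()  # contribution of the unmerged values
    lam = 1.0                          # arbitrary starting value
    lam_draws = np.empty(n_iter)
    zero_draws = np.empty(n_iter, dtype=int)
    for t in range(n_iter):
        # Step 1: each merged cell is 0 with probability 1/(1+lam), else 1
        n_zeros = rng.binomial(n1, 1.0 / (1.0 + lam))
        s = s_rest + (n1 - n_zeros)    # completed-data sum of the x_i
        # Step 2: Gamma full conditional (rate n + lam0, so scale = 1/(n + lam0))
        lam = rng.gamma(shape=s + 1, scale=1.0 / (n + lam0))
        lam_draws[t] = lam
        zero_draws[t] = n_zeros
    return lam_draws, zero_draws
```

Averaging `zero_draws` after a burn-in period then gives a posterior estimate of the number of occurrences where $X_i = 0$, which is the second quantity the question asks for.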

(If you want more details, Example 9.7, p. 346, in my *Monte Carlo Statistical Methods* book with George Casella covers exactly this setting.)
