I am trying to use a Metropolis-within-Gibbs type algorithm to sample $\theta$ and $x$ from the following model. Starting with Bayes' theorem, I can write:
$$
P(\theta, x | y) = \frac{P(y | x, \theta) P(x | \theta) P(\theta)}{P(y)}
$$
I can easily draw samples from $P(\theta)$ and $P(x | \theta)$, and since $y$ depends on $(x, \theta)$ only through $x$, I can replace $P(y | x, \theta)$ with $P(y | x)$.
Bayes' theorem then becomes
$$
P(\theta, x | y) = \frac{P(y | x) P(x | \theta) P(\theta)}{P(y)}
$$
- $P(y | x)$ is multivariate Gaussian.
- $P(x | \theta)$ is multivariate t.
- $P(\theta)$ is uniform.
My sampling steps:
1. Start with some value of $\theta$, say $\theta_1$.
2. Gibbs step: draw a sample $x_1$ from $P(x | \theta_1)$.
3. Plug $x_1$ into $P(y | x)$ and evaluate it ($y$ is the observed data).
4. Metropolis step: accept $\theta_1$ and $x_1$ if $P(y | x_1) > P(y | x_0)$. If not, accept them with probability $a = \frac{P(y | x_1)}{P(y | x_0)}$.
5. Go back to step 2, keeping $\theta_1$ if it was accepted; otherwise, draw a new $\theta$ from the proposal first.
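In code, my loop looks roughly like this (the three callables are placeholders standing in for my actual model pieces, not code I am posting verbatim):

```python
import numpy as np

def run_chain(theta_init, n_iter, sample_x_given_theta, log_p_y_given_x,
              propose_theta, seed=0):
    """Steps 1-5 above, with the model pieces passed in as callables."""
    rng = np.random.default_rng(seed)
    theta = theta_init                               # step 1
    x = sample_x_given_theta(theta, rng)             # step 2: x ~ P(x | theta)
    loglik = log_p_y_given_x(x)                      # step 3: evaluate P(y | x)
    samples = []
    for _ in range(n_iter):
        x_new = sample_x_given_theta(theta, rng)     # step 2 again
        loglik_new = log_p_y_given_x(x_new)          # step 3
        # step 4: accept if better, otherwise with probability a
        if np.log(rng.uniform()) < loglik_new - loglik:
            x, loglik = x_new, loglik_new            # accepted: keep this theta
        else:
            theta = propose_theta(theta, rng)        # rejected: draw a new theta
        samples.append((theta, x))                   # step 5: back to step 2
    return samples
```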
The problem is that $\theta$ gets stuck. Sometimes I get a draw of $x$ from $P(x | \theta)$ that is nearly identical to $y$, which makes $P(y | x)$ exceptionally high. Once that happens, essentially all my steps are rejected: $a = \frac{P(y | x_1)}{P(y | x_0)}$ is a very small number for pretty much every subsequent $\theta$ step.
Here is why I think this happens: the likelihood $P(y | x)$ is very sharply peaked. The distribution $P(x | \theta)$, on the other hand, is very broad in comparison. When $\theta$ changes by a small amount, $P(x | \theta)$ is essentially unchanged.
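A toy illustration of the scale mismatch, with made-up numbers (not my actual model):

```python
import numpy as np
from scipy.stats import multivariate_normal

# Made-up scales: the likelihood has sd 0.1 per component, draws of x are order 1.
rng = np.random.default_rng(0)
dim, noise_sd = 5, 0.1
y = rng.normal(size=dim)                          # stand-in for the observed data

def log_lik(x):                                   # log P(y | x), Gaussian
    return multivariate_normal.logpdf(y, mean=x, cov=noise_sd**2 * np.eye(dim))

lucky_x = y + 0.01 * rng.normal(size=dim)         # a draw that lands almost on y
typical_x = y + rng.standard_t(df=4, size=dim)    # a draw at the prior's scale

a = np.exp(log_lik(typical_x) - log_lik(lucky_x)) # acceptance ratio after the lucky draw
print(a)                                          # ~0: every later step is rejected
```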
Would a different sampling strategy work better in such a situation? Let me know if further info is required.
Best Answer
In step 4, you don't have to reject the proposal $(x, \theta)$ every time its new likelihood is lower; if you do, you are doing a sort of optimization instead of sampling from the posterior distribution.
Instead, if the proposal is worse, you still accept it with some acceptance probability $a$.
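In code, that step is just a comparison against a uniform draw; a tiny snippet with dummy log-density values:

```python
import numpy as np

rng = np.random.default_rng()
log_p_old, log_p_new = -3.1, -5.2          # dummy log target densities
log_a = min(0.0, log_p_new - log_p_old)    # log acceptance probability
accept = np.log(rng.uniform()) < log_a     # a "worse" proposal can still be accepted
```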
With pure Gibbs sampling, the general strategy to sample this would be:
Gibbs
Iteratively sample:
\begin{align}
p(x | \theta, y) &\propto p(y | x)\, p(x | \theta) \\
p(\theta | x, y) &\propto p(x | \theta)\, p(\theta)
\end{align}
Gibbs with Metropolis steps for non-conjugate cases:
If some of the conditionals above are not familiar distributions (because you are multiplying non-conjugate densities; this is your case), you can sample them with Metropolis-Hastings:
From the current $x$, generate some proposal, e.g.:
$$
x^* \sim \mathcal{N}(x, \sigma)
$$
Accept $x^*$ with probability [1]:
$$
a = \min\left(1, \frac{p(x^*)}{p(x)}\right) = \min\left(1, \frac{p(x^* | \theta, y)}{p(x | \theta, y)}\right)
$$
[1] If the proposal distribution is not symmetric, there is an additional multiplying factor (the ratio of proposal densities, $q(x | x^*)/q(x^* | x)$).
Appendix:
$$
p(\theta | x, y) = \frac{p(y | x)\, p(x | \theta)\, p(\theta)}{\int p(y | x)\, p(x | \theta)\, p(\theta)\, \text{d}\theta} = \frac{p(x | \theta)\, p(\theta)}{\int p(x | \theta)\, p(\theta)\, \text{d}\theta} \propto p(x | \theta)\, p(\theta)
$$
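To make the recipe above concrete, here is a toy, self-contained Python sketch of the two Metropolis-within-Gibbs updates, using the distributional forms from the question (Gaussian likelihood, multivariate-t prior on $x$, uniform prior on $\theta$). Everything specific (treating $\theta$ as the scale of the t prior, the dimension, degrees of freedom, noise level, bounds, and step sizes) is an assumption for illustration only.

```python
import numpy as np
from scipy.stats import multivariate_normal, multivariate_t

# Toy sketch of the two Metropolis-within-Gibbs updates; all model choices
# below are made up for illustration.
rng = np.random.default_rng(1)
dim, df, noise_sd = 5, 4, 0.1
theta_lo, theta_hi = 0.1, 10.0                        # uniform prior on theta
y = rng.normal(size=dim)                              # stand-in for observed data

def log_p_y_given_x(x):                               # Gaussian likelihood
    return multivariate_normal.logpdf(y, mean=x, cov=noise_sd**2 * np.eye(dim))

def log_p_x_given_theta(x, theta):                    # multivariate-t prior on x
    return multivariate_t.logpdf(x, loc=np.zeros(dim),
                                 shape=theta**2 * np.eye(dim), df=df)

def metropolis_within_gibbs(n_iter, step_x=0.05, step_theta=0.2):
    theta, x = 1.0, np.zeros(dim)
    chain = []
    for _ in range(n_iter):
        # MH update of x, targeting p(x | theta, y) ~ p(y | x) p(x | theta)
        x_prop = x + step_x * rng.normal(size=dim)
        log_a = (log_p_y_given_x(x_prop) + log_p_x_given_theta(x_prop, theta)
                 - log_p_y_given_x(x) - log_p_x_given_theta(x, theta))
        if np.log(rng.uniform()) < log_a:
            x = x_prop
        # MH update of theta, targeting p(theta | x, y) ~ p(x | theta) p(theta);
        # the flat prior cancels, so only the support check remains
        theta_prop = theta + step_theta * rng.normal()
        if theta_lo < theta_prop < theta_hi:
            log_a = (log_p_x_given_theta(x, theta_prop)
                     - log_p_x_given_theta(x, theta))
            if np.log(rng.uniform()) < log_a:
                theta = theta_prop
        chain.append((theta, x.copy()))
    return chain

chain = metropolis_within_gibbs(2000)
```

Because $p(y | x)$ enters only the $x$ update, a sharply peaked likelihood no longer blocks the $\theta$ moves: per the appendix, $\theta$ is updated against $p(x | \theta)\, p(\theta)$ with $x$ held fixed.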