Say I have a bunch of data from a Poisson distribution and I want to find out my posterior i.e. I'm data fitting:
$p(\lambda | X) \sim p(X|\lambda)p(\lambda)$
where $p(X|\lambda) = \frac{\exp(-\lambda)\lambda^x}{x!}$ so that my log-likelihood looks like:
$\log \mathcal{L}(\lambda|X) \sim x \log\lambda - \lambda$
Now as $\lambda > 0$, I transform my coordinates to be $\alpha = \log \lambda$. My new distribution looks like:
$p(X|\alpha) = \frac{\exp(-\exp(\alpha))\exp(\alpha x)}{x!}\cdot \bf{\exp(\alpha)}$
where the final $\exp(\alpha)$ comes from the Jacobian of the transformation.
This makes:
$\log \mathcal{L}(\lambda|X) \sim -\exp(\alpha) + \alpha x + \bf{\alpha}$
where the final $\bf{\alpha}$ in the new log-likelihood is from the earlier Jacobian.
The problem I'm having is that if I include that new $\alpha$ then my Metropolis-Hastings MCMC gives me a result that is incorrect. If I use a log-likelihood that excludes it:
$\log \mathcal{L}(\lambda|X) \sim -\exp(\alpha) + \alpha x$
then I get correct results.
My question is:
Why does the Metropolis-Hastings algorithm not care about the Jacobian?
Best Answer
You do not need the extra $\alpha$ term, since $\alpha$ is a parameter. The change of variables formula applies to the variable with respect to which you are "integrating". It is $x$ in your case. So MH is right to demand that you remove the excess factor.
So what you really have is:
$$ p(X|\alpha) = \frac{\exp(-\exp(\alpha))\exp(\alpha x)}{x!} $$
Had you applied some transformation to your $x$ variable, then the change of variables formula would have to be used.
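A quick numerical check of this point, as a rough sketch: for fixed $\alpha$ the expression above already sums to 1 over $x$, so it is a valid pmf as it stands, while multiplying by $\exp(\alpha)$ breaks the normalisation. The function names, the truncation point `x_max` and the value $\lambda = 3$ are illustrative assumptions, not anything from the question.

```python
import math


def poisson_pmf_alpha(x, alpha):
    """Poisson pmf written in terms of alpha = log(lambda), with no Jacobian factor."""
    # Work in logs to avoid overflow in x!:
    # log p = -exp(alpha) + alpha * x - log(x!)
    return math.exp(-math.exp(alpha) + alpha * x - math.lgamma(x + 1))


def total_mass(alpha, with_jacobian=False, x_max=200):
    """Sum the pmf over x = 0..x_max, optionally times the extra exp(alpha) factor."""
    total = sum(poisson_pmf_alpha(x, alpha) for x in range(x_max + 1))
    return total * math.exp(alpha) if with_jacobian else total


alpha = math.log(3.0)  # hypothetical value, i.e. lambda = 3
print(total_mass(alpha))                      # close to 1
print(total_mass(alpha, with_jacobian=True))  # close to exp(alpha) = 3
```

The sum is truncated at `x_max = 200`, which is far into the tail for a rate of 3, so the truncation error is negligible here.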
EDIT To understand what's going on, think of a normal RV $X \sim \mathcal{N}(\mu, \sigma^2)$. So $p(X|\mu,\sigma^2)$ is the density. If you transform $\mu$ with any transformation $f$, the new variable is $Y \sim \mathcal{N}(f(\mu), \sigma^2)$ and no Jacobian is necessary. I hope you agree (if not, I'll have to write more in tex…).
If you want $\mathcal{P}(X \in A|\alpha)$ you'd integrate over $x$ and keep $\alpha$ fixed; that's what I mean when I say "integrate". Probability is all about integration, after all.
So in the end you have $p(x|\alpha)$ with no extra Jacobian term. Then proceed as usual with Bayes' rule etc. and you'll get the "right" density.
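For concreteness, here is a minimal random-walk Metropolis sketch that samples $\alpha$ using the log-likelihood without the extra term. It rests on assumptions that are not in the question: a flat prior on $\alpha$, made-up count data, and an arbitrary step size and chain length. Under that flat prior, the implied posterior for $\lambda = \exp(\alpha)$ is a Gamma with shape $\sum_i x_i$ and rate $n$, so its mean is the sample mean, which gives something to compare the chain against.

```python
import math
import random


def log_target(alpha, n, total):
    """Unnormalised log posterior of alpha: Poisson log-likelihood for n
    observations summing to `total`, plus a flat prior on alpha (assumption)."""
    return -n * math.exp(alpha) + alpha * total


def metropolis_alpha(data, n_steps=20000, step=0.3, seed=0):
    """Random-walk Metropolis on alpha = log(lambda); returns the chain."""
    rng = random.Random(seed)
    n, total = len(data), sum(data)
    alpha = math.log(max(total, 1) / n)  # start near the log of the sample mean
    current = log_target(alpha, n, total)
    samples = []
    for _ in range(n_steps):
        proposal = alpha + rng.gauss(0.0, step)
        candidate = log_target(proposal, n, total)
        # Symmetric proposal, so accept with probability
        # min(1, exp(candidate - current)); 1 - random() lies in (0, 1].
        if math.log(1.0 - rng.random()) < candidate - current:
            alpha, current = proposal, candidate
        samples.append(alpha)
    return samples


data = [3, 5, 2, 4, 6, 3, 4, 5, 2, 6]  # hypothetical counts, sample mean 4
chain = metropolis_alpha(data)
lam = [math.exp(a) for a in chain[2000:]]  # drop burn-in, map back to lambda
print(sum(lam) / len(lam))  # should land near the sample mean
```

Which prior is appropriate is a modelling choice; the sketch only shows the sampler running on the $\alpha$ scale with the likelihood from the answer.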