# Solved – Why is a uniform prior on log(x) the same as a 1/x prior on x?

I'm trying to understand Jeffreys prior. One application is to 'scale' variables such as the standard deviation $$\sigma$$ (or its square, the variance $$\sigma^2$$) of a Gaussian distribution. It is often said that a uniform prior over $$\sigma$$ is not really non-informative, and that one should instead either:

1. Use $$\ln\sigma$$ as the free parameter, with a uniform prior on it (this is often called a log-uniform prior), or

2. Keep $$\sigma$$ as the free parameter but use $$1/\sigma$$ as the prior density (which is not uniform).

Why are the above two methods/priors equivalent? I feel it has something to do with the fact that the derivative of $$\ln\sigma$$ is $$1/\sigma$$, but I can't take the next step.

Also, why does this even matter, in simple language with minimal jargon? I see all these complicated explanations online involving the Fisher information matrix, but in the end all I see is that the log-uniform or $$1/\sigma$$ priors weight lower values of $$\sigma$$ more heavily. Why? If possible, a simple analytic example or Python snippet would be very helpful.


When transforming a uniform distribution on $$\log(\sigma)$$ to a distribution on $$\sigma$$, you need to take into account the Jacobian of the transformation. This corresponds, as you correctly intuited, to $$1/\sigma$$.
Writing this a little more clearly: let $$X=\log(\sigma)$$, so $$p(X)\propto 1$$. The transformation we're after is $$Y=T(X)=e^{X}=\sigma$$, which has inverse transformation $$T^{-1}(Y)=\log(Y)$$. The Jacobian is then $$\left|\frac{\partial X}{\partial Y}\right|=1/Y$$. So, by the change-of-variables formula, the induced density for $$\sigma$$ is $$p(Y)=\left|\frac{\partial X}{\partial Y}\right|\,p(\log(Y))\propto \frac{1}{Y}.$$
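Since you asked for a Python snippet: here is a minimal numerical check of the derivation above using NumPy. The interval endpoints `a` and `b` are arbitrary choices I made so that the uniform prior on $$\log(\sigma)$$ is proper and can be sampled. We draw $$\log(\sigma)$$ uniformly, exponentiate, and verify that the mass falling in each bin matches the $$1/\sigma$$ density (whose bin probabilities are $$\log(\text{edge}_{hi}/\text{edge}_{lo})/\log(b/a)$$).

```python
import numpy as np

rng = np.random.default_rng(0)

# Uniform prior on log(sigma), restricted to [log(a), log(b)] so it is proper.
a, b = 0.1, 10.0
log_sigma = rng.uniform(np.log(a), np.log(b), size=1_000_000)

# Transform to sigma; this is T(X) = e^X from the derivation above.
sigma = np.exp(log_sigma)

# Empirical probability of sigma landing in each bin.
edges = np.linspace(a, b, 51)
counts, edges = np.histogram(sigma, bins=edges)
empirical = counts / counts.sum()

# Theoretical bin probabilities under p(sigma) = 1/(sigma * log(b/a)):
# integrating 1/sigma over a bin gives log(edge_hi / edge_lo).
theory = np.log(edges[1:] / edges[:-1]) / np.log(b / a)

# Agreement up to Monte Carlo noise confirms the 1/sigma density.
max_abs_err = np.max(np.abs(empirical - theory))
print(f"max |empirical - theory| = {max_abs_err:.5f}")
```

Note that the empirical histogram piles up at small $$\sigma$$ exactly as the question anticipated: equal amounts of prior mass per decade (e.g. $$[0.1,1]$$ and $$[1,10]$$) means far higher density at the low end.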