Difference between non-informative and improper priors

I wonder what the difference is between these two kinds of priors:

  • Non-informative
  • Improper

Improper priors are $\sigma$-finite non-negative measures $\text{d}\pi$ on the parameter space $\Theta$ such that $$\int_\Theta \text{d}\pi(\theta) = +\infty$$ As such they generalise the notion of a prior distribution, which is a probability distribution on the parameter space $\Theta$ such that $$\int_\Theta \text{d}\pi(\theta) = 1$$ They are useful in several ways to characterise

  1. the set of limits of proper Bayesian procedures, which are not all proper Bayesian procedures;
  2. frequentist optimal procedures as in (admissibility) complete class theorems such as Wald's;
  3. frequentist best invariant estimators (since they can be expressed as Bayes estimates under the corresponding right Haar measure, usually improper);
  4. priors derived from the shape of the likelihood function, such as non-informative priors (e.g., Jeffreys').
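
To illustrate item 1, here is a minimal sketch, assuming a normal model $x_i \sim N(\theta, \sigma^2)$ with known $\sigma$ and a proper prior $\theta \sim N(0, \tau^2)$. The model, the numbers and the function name are choices made for this example only. As $\tau$ grows, the proper Bayes estimate approaches the sample mean, which is the estimate formally associated with the flat (improper) prior.

```python
def posterior_mean_normal_prior(xbar, n, sigma, tau):
    """Posterior mean of theta when x_i ~ N(theta, sigma^2) and theta ~ N(0, tau^2).

    Hypothetical helper for illustration; uses the standard conjugate formula
    (precision-weighted average of the prior mean 0 and the sample mean).
    """
    precision_data = n / sigma**2      # information carried by the sample
    precision_prior = 1.0 / tau**2     # information carried by the prior
    return precision_data * xbar / (precision_data + precision_prior)


# Illustrative numbers (assumed, not from any data set).
xbar, n, sigma = 1.7, 10, 2.0
for tau in (0.5, 5.0, 50.0, 500.0):
    # The estimate moves from the prior mean 0 towards xbar as tau increases.
    print(tau, posterior_mean_normal_prior(xbar, n, sigma, tau))
```

No proper prior in this family returns the sample mean exactly; it only appears as the limit $\tau \to \infty$.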

Because they do not integrate to a finite number, they do not allow for a probabilistic interpretation but nonetheless can be used in statistical inference if the marginal likelihood is finite $$\int_\Theta \ell(\theta|x)\text{d}\pi(\theta) < +\infty$$ since the posterior distribution $$\dfrac{\ell(\theta|x)\text{d}\pi(\theta)}{\int_\Theta \ell(\theta|x)\text{d}\pi(\theta)}$$ is then well-defined. This means it can be used in exactly the same way a posterior distribution derived from a proper prior is used, to derive posterior quantities for estimation like posterior means or posterior credible intervals.
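
A small numerical check of this condition, as a sketch under stated assumptions: a single observation $x \sim N(\theta, 1)$ and the flat prior $\text{d}\pi(\theta) = \text{d}\theta$ (Lebesgue measure, hence improper). The integration window of $\pm 12$ around $x$ and the helper names are choices made for this example. The marginal likelihood comes out finite, so the posterior and its mean are well-defined.

```python
import math


def normal_likelihood(theta, x, sigma=1.0):
    """Likelihood of theta for one observation x ~ N(theta, sigma^2)."""
    z = (x - theta) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def trapezoid(f, a, b, steps=20000):
    """Plain trapezoidal rule on [a, b]."""
    h = (b - a) / steps
    total = 0.5 * (f(a) + f(b))
    for i in range(1, steps):
        total += f(a + i * h)
    return total * h


x = 1.3                          # assumed observation
lo, hi = x - 12.0, x + 12.0      # window wide enough that the tails are negligible

# Marginal likelihood under the flat prior: integral of the likelihood in theta.
marginal = trapezoid(lambda t: normal_likelihood(t, x), lo, hi)

# Posterior mean: integral of theta * likelihood, divided by the marginal.
post_mean = trapezoid(lambda t: t * normal_likelihood(t, x), lo, hi) / marginal

print(marginal, post_mean)
```

Here the prior has infinite total mass, yet the normalising integral is finite (numerically close to 1) and the posterior mean is close to the observation $x$.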

Warning: One branch of Bayesian inference does not cope very well with improper priors, namely when testing sharp hypotheses. Indeed those hypotheses require the construction of two prior distributions, one under the null and one under the alternative, that are orthogonal. If one of these priors is improper, it cannot be normalised and the resulting Bayes factor is undetermined.
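
The indeterminacy can be seen in a minimal sketch, assuming one observation $x \sim N(\theta, 1)$, a point null $\theta = \theta_0$, and the improper prior $c\,\text{d}\theta$ under the alternative, where $c > 0$ is an arbitrary constant. The setting and the function names are assumptions made for this example. Since an improper prior has no normalisation that would fix $c$, the Bayes factor can be made to take any positive value.

```python
import math


def normal_pdf(x, theta, sigma=1.0):
    """Density of N(theta, sigma^2) evaluated at x."""
    z = (x - theta) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def bayes_factor_point_null(x, theta0, c):
    """Bayes factor of H0: theta = theta0 against H1: prior c * d(theta).

    Under H1 the marginal is c times the integral of N(x | theta, 1) over
    theta, and that integral equals 1, so the marginal is simply c.
    """
    marginal_h0 = normal_pdf(x, theta0)
    marginal_h1 = c * 1.0
    return marginal_h0 / marginal_h1


x, theta0 = 0.8, 0.0             # assumed values for illustration
bf_c1 = bayes_factor_point_null(x, theta0, c=1.0)
bf_c10 = bayes_factor_point_null(x, theta0, c=10.0)
print(bf_c1, bf_c10)             # same data, same "flat" prior, different answers
```

Both calls use the same flat measure up to a constant, and the posterior under the alternative is identical in both cases, but the Bayes factor changes by the factor $c$.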

In Bayesian decision theory, when seeking an optimal decision procedure $\delta$ under the loss function $L(d,\theta)$ an improper prior $\text{d}\pi$ is useful in cases when the minimisation problem $$\arg\min_d \int_\Theta L(d,\theta)\ell(\theta|x)\text{d}\pi(\theta)$$ allows for a non-trivial solution (even when the posterior distribution is not defined). The reason for this distinction is that the decision only depends on the product $L(d,\theta)\text{d}\pi(\theta)$, which means that it is invariant under changes of the prior by multiplicative terms $\varpi(\theta)$ provided the loss function is divided by the same multiplicative terms $\varpi(\theta)$, $$L(d,\theta)\text{d}\pi(\theta)=\dfrac{L(d,\theta)}{\varpi(\theta)}\times\varpi(\theta)\text{d}\pi(\theta)$$

Non-informative priors are classes of (proper or improper) prior distributions that are determined in terms of a certain informational criterion that relates to the likelihood function, like

  1. Laplace's insufficient reason flat prior;
  2. Jeffreys (1939) invariant priors;
  3. maximum entropy (or MaxEnt) priors (Jaynes, 1957);
  4. minimum description length priors (Rissanen, 1987; Grünwald, 2005);
  5. reference priors (Bernardo, 1979, 1981; Berger & Bernardo, 1992; Bernardo & Sun, 2012);
  6. probability matching priors (Welch & Peers, 1963; Scricciolo, 1999; Datta, 2005)

and further classes, some of which are described in Kass & Wasserman (1995). The name non-informative is a misnomer in that no prior is ever completely non-informative. See my discussion on this forum. Or Larry Wasserman's diatribe. (Non-informative priors are most often improper.)
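
That a non-informative prior need not be improper can be checked on a standard example, sketched here under the assumption of a Bernoulli model with success probability $p$, for which the Jeffreys prior is proportional to $p^{-1/2}(1-p)^{-1/2}$. Its total mass is the Beta function $B(1/2, 1/2)$; the helper name below is hypothetical.

```python
import math


def beta_function(a, b):
    """Beta function B(a, b) computed from the Gamma function."""
    return math.gamma(a) * math.gamma(b) / math.gamma(a + b)


# Total mass of the unnormalised Jeffreys prior p^(-1/2) * (1 - p)^(-1/2) on (0, 1).
jeffreys_mass = beta_function(0.5, 0.5)
print(jeffreys_mass, math.pi)    # the mass is finite, so the prior is proper
```

The mass is finite (it equals $\pi$), so in this model the Jeffreys prior normalises to the proper Beta(1/2, 1/2) distribution, whereas for a normal location parameter the Jeffreys prior is the flat, improper one.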
