I was referring to this http://en.wikipedia.org/wiki/Maximum_a_posteriori_estimation. There it is mentioned that estimating the parameter $\theta$ amounts to finding the mode of the posterior distribution. I don't see why this is true.


#### Best Answer

It is high time to convert my comment to an answer.

Bayesian inference has a theoretical foundation, namely *Bayesian decision theory*. The key tools of Bayesian decision theory are *loss functions*.

A Bayesian estimate is derived as follows from Bayesian decision theory. Define a "distance" $d(\theta,\theta')$ on the parameter space (I will come back to this point in the last paragraph of my answer). This is a loss function for the estimation problem: the Bayesian estimate associated with this loss function is the value $\theta_0$ minimizing the *expected posterior loss* $$\int d(\theta_0,\theta)\, \pi(\theta \mid x)\, \mathrm{d}\theta$$ where, in accordance with the usual notation, $\pi(\theta \mid x)$ denotes the posterior distribution.

When $d(\theta,\theta')={(\theta-\theta')}^2$ is the squared loss function, the corresponding Bayesian estimate turns out to be the mean of the posterior distribution. When $d(\theta,\theta')=|\theta-\theta'|$ is the absolute deviation loss function, the Bayesian estimate is the median of the posterior distribution. I do not remember exactly which loss function yields the posterior mode as the Bayesian estimate; see the excellent book *The Bayesian Choice* by Christian Robert. So the choice of the Bayesian estimate is driven by the choice of the loss function, in other words by the question: "what do you want to minimize?".
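These two facts can be checked numerically. The sketch below (the Gamma posterior is my own assumption, chosen only because its mean and median are easy to verify) minimizes the expected posterior loss on a grid of candidate values $\theta_0$ and recovers the posterior mean under squared loss and the posterior median under absolute loss:

```python
import numpy as np

# Grid over the parameter space (assumed example, not from the original post).
theta = np.linspace(1e-6, 40.0, 20001)
dt = theta[1] - theta[0]

# Hypothetical posterior: Gamma(shape=3, scale=2), normalized on the grid.
# Its mean is 6 and its median is about 5.35.
post = theta**2 * np.exp(-theta / 2.0)
post /= post.sum() * dt

def bayes_estimate(loss):
    """Return the theta_0 minimizing  integral of loss(theta_0, theta) * pi(theta|x) dtheta."""
    candidates = theta[::20]
    risks = [(loss(t0, theta) * post).sum() * dt for t0 in candidates]
    return candidates[int(np.argmin(risks))]

posterior_mean = (theta * post).sum() * dt

est_squared  = bayes_estimate(lambda t0, t: (t0 - t) ** 2)  # -> posterior mean
est_absolute = bayes_estimate(lambda t0, t: np.abs(t0 - t))  # -> posterior median
```

For this right-skewed posterior the two estimates differ (median below mean), which makes concrete the point that the "Bayesian estimate" depends on the loss function.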

Now let me note that choosing a distance $d(\theta,\theta')$ between the possible values of the parameter does not sound like a sensible idea. For example, it is clear that the difference between the Poisson distributions ${\cal P}(\theta)$ and ${\cal P}(\theta')$ is far less pronounced for $\theta=1000$ and $\theta'=1010$ than for $\theta=1$ and $\theta'=11$. Thus it is more sensible to use a "distance" between the sampling distributions $p_\theta$ and $p_{\theta'}$ rather than between the parameters $\theta$ and $\theta'$. In my paper I studied the Bayesian inference based on such a "distance" (the intrinsic discrepancy loss function) for a very simple model. Other possible choices of an "intrinsic loss function" are given in Robert's paper, which is cited in the references.
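The Poisson example above can be made quantitative. A minimal sketch, using the intrinsic discrepancy in the form of the smaller of the two Kullback–Leibler divergences between the sampling distributions (the closed form for the Poisson KL divergence is standard; the numbers match the pairs of $\theta$ values mentioned above):

```python
import math

def kl_poisson(a, b):
    """KL(Poisson(a) || Poisson(b)) = b - a + a * log(a / b), in nats."""
    return b - a + a * math.log(a / b)

def intrinsic_discrepancy(theta, theta_prime):
    """Smaller of the two directed KL divergences between the sampling models."""
    return min(kl_poisson(theta, theta_prime), kl_poisson(theta_prime, theta))

# Both pairs differ by |theta - theta'| = 10, yet the discrepancy between
# the sampling distributions is orders of magnitude apart.
d_large_means = intrinsic_discrepancy(1000, 1010)  # about 0.05
d_small_means = intrinsic_discrepancy(1, 11)       # about 7.6
```

This is exactly the asymmetry that a plain distance on the parameter space, such as $|\theta-\theta'|$, cannot capture.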
