What is the significance of finding the posterior mean, posterior mode and posterior variance in Dirichlet-multinomial conjugate-pair Bayesian estimation? Are all of them equally important when estimating a posterior probability? Which of the three should be chosen?

#### Best Answer

I will try to explain this in general terms, without focusing on the Dirichlet-multinomial case, using some notions from decision theory. The Bayes estimator $\hat{\theta}$ is the rule that minimizes the expected posterior loss $$ \mathbb{E}_{\theta|x}[L(\theta,\hat{\theta})]= \int_{-\infty}^{\infty}L(\theta,\hat{\theta})\, \pi(\theta|x)\,d\theta $$

where $\pi(\theta|x)$ is the posterior distribution and $L(\theta,\hat{\theta})$ is the loss function. The loss function measures how much we "pay" when we choose an "action" $\hat{\theta}$ and the true value is $\theta$. For example, the quadratic loss function is given by $$ L(\theta,\hat{\theta}) = (\hat{\theta}-\theta)^2. $$ So if you choose the posterior mean $\hat{\theta}=\mathbb{E}_{\theta|x}[\theta]$ as a point estimator, it minimizes the expected posterior loss when the quadratic loss function is used.
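The claim about quadratic loss can be checked numerically. The following is a minimal sketch, not part of the original answer: it assumes a hypothetical Beta(3, 7) posterior (the two-category special case of a Dirichlet), approximates the expected posterior loss by Monte Carlo, and searches a grid of candidate estimates. The names `samples`, `candidates` and `expected_quadratic_loss` are illustrative only.

```python
import numpy as np

# Assumed posterior for illustration: Beta(3, 7), the two-category Dirichlet case.
a, b = 3.0, 7.0
rng = np.random.default_rng(0)
samples = rng.beta(a, b, size=50_000)  # draws from the posterior

# Candidate point estimates on a grid with step 0.001.
candidates = np.linspace(0.01, 0.99, 981)

def expected_quadratic_loss(theta_hat):
    """Monte Carlo estimate of E[(theta_hat - theta)^2 | x]."""
    return np.mean((theta_hat - samples) ** 2)

losses = np.array([expected_quadratic_loss(c) for c in candidates])
best = candidates[np.argmin(losses)]

posterior_mean = a / (a + b)
posterior_variance = a * b / ((a + b) ** 2 * (a + b + 1))

print("grid minimizer:", best)
print("posterior mean:", posterior_mean)
print("minimum loss:", losses.min())
print("posterior variance:", posterior_variance)
```

Under these assumptions the grid minimizer should land close to the posterior mean, and the minimum loss should be close to the posterior variance. This suggests one way to read the three quantities in the question: the variance measures the uncertainty around the mean rather than competing with it as a point estimate.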

The median of the posterior distribution minimizes the expected posterior loss under the absolute loss function $$ L(\theta,\hat{\theta})=c|\hat{\theta}-\theta|, \quad c>0. $$
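The same kind of numerical check works for the absolute loss. This is a hedged sketch under the assumption of a hypothetical Beta(3, 7) posterior with $c = 1$; the variable names are illustrative and not taken from the original answer.

```python
import numpy as np

# Assumed posterior for illustration: Beta(3, 7); loss scale c = 1.
a, b = 3.0, 7.0
rng = np.random.default_rng(1)
samples = rng.beta(a, b, size=50_000)  # draws from the posterior

# Candidate point estimates on a grid with step 0.001.
candidates = np.linspace(0.01, 0.99, 981)

def expected_absolute_loss(theta_hat):
    """Monte Carlo estimate of E[|theta_hat - theta| | x]."""
    return np.mean(np.abs(theta_hat - samples))

losses = np.array([expected_absolute_loss(c) for c in candidates])
best = candidates[np.argmin(losses)]

sample_median = np.median(samples)
sample_mean = np.mean(samples)

print("grid minimizer:", best)
print("posterior median (from samples):", sample_median)
print("posterior mean (from samples):", sample_mean)
```

Because this posterior is right-skewed, the median sits below the mean, so the two loss functions lead to visibly different point estimates.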

The choice of estimator depends on the application. For example, if the posterior distribution is multimodal, the posterior mean can fall in a region of low posterior density, so it is not a reasonable estimate and one should take the posterior mode instead. Taking the mode corresponds to the $0$-$1$ loss function $$ L(\theta,\hat{\theta}) = \begin{cases} 0, & |\hat{\theta}-\theta|<\varepsilon \\ 1, & |\hat{\theta}-\theta|\geq \varepsilon \end{cases} $$ in the limit $\varepsilon \to 0$: the Bayes estimator under this loss converges to the posterior mode.
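To connect this back to the Dirichlet-multinomial case in the question, the posterior is again a Dirichlet distribution, so the mean, mode and variance are available in closed form. The sketch below uses the standard Dirichlet formulas; the prior and the counts are made-up values for illustration, and the mode formula is valid only when every posterior parameter exceeds 1.

```python
import numpy as np

# Hypothetical inputs: a uniform Dirichlet prior and made-up category counts.
prior = np.array([1.0, 1.0, 1.0])
counts = np.array([5, 3, 2])

# Conjugacy: Dirichlet prior + multinomial counts -> Dirichlet posterior.
alpha = prior + counts
alpha0 = alpha.sum()
k = len(alpha)

# Posterior mean: Bayes estimate under quadratic loss.
mean = alpha / alpha0

# Posterior mode (MAP estimate); requires all alpha_i > 1.
assert np.all(alpha > 1), "mode formula needs every alpha_i > 1"
mode = (alpha - 1) / (alpha0 - k)

# Marginal posterior variance of each category probability.
variance = alpha * (alpha0 - alpha) / (alpha0**2 * (alpha0 + 1))

print("posterior mean:    ", mean)
print("posterior mode:    ", mode)
print("posterior variance:", variance)
```

With a uniform prior the mode coincides with the observed relative frequencies, while the mean is pulled slightly toward equal probabilities. The variance is not an alternative estimate but a summary of how uncertain each component still is.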
