I am interested in estimating the density of a continuous random variable $X$. One way of doing this that I have learnt is Kernel Density Estimation.

But now I am interested in a Bayesian approach along the following lines. I initially believe that $X$ follows a distribution $F$. I take $n$ readings of $X$. Is there some approach to update $F$ based on my new readings?

I know I sound like I am contradicting myself: If I believe solely in $F$ as my prior distribution, then no data should convince me otherwise. However, suppose $F$ were $Unif[0,1]$ and my data points were like $(0.3, 0.5, 0.9, 1.7)$. Seeing $1.7$, I obviously cannot stick to my prior, but how should I update it?

**Update:** Based on the suggestions in the comments, I have started looking at the Dirichlet process. Let me use the following notation:

$G \sim DP(\alpha, H)$

$\theta_i \mid G \sim G$

$x_i \mid \theta_i \sim N(\theta_i, \sigma^2)$

After framing my original problem in this language, I guess I am interested in the following: $\theta_{n+1} \mid x_1, \dots, x_n$. How does one do this?

In this set of notes (page 2), the author did an example of $\theta_{n+1} \mid \theta_1, \dots, \theta_n$ (Pólya urn scheme). I am not sure if this is relevant.

**Update 2:** I also wish to ask (after seeing the notes): how do people choose $\alpha$ for the DP? It seems like an arbitrary choice. In addition, how do people choose a prior $H$ for the DP? Should I just use a prior for $\theta$ as my prior for $H$?


#### Best Answer

Since you want a Bayesian approach, you need to assume some prior knowledge about the thing you want to estimate. This will be in the form of a distribution.

Now, there's the issue that this is a distribution over distributions. However, this is no problem if you assume that the candidate distributions come from some parameterized class of distributions, because a prior over the parameters then induces a prior over the distributions.

For example, if you want to assume the data is Gaussian distributed with unknown mean but known variance, then all you need is a prior over the mean.

MAP estimation of the unknown parameter (call it $\theta$) could proceed by assuming that all the observations / data points are conditionally independent given the unknown parameter. Then, the MAP estimate is

$\hat{\theta} = \arg\max_\theta \left( \text{Pr}[x_1, x_2, \dots, x_n, \theta] \right)$,

where

$\text{Pr}[x_1, x_2, \dots, x_n, \theta] = \text{Pr}[x_1, x_2, \dots, x_n \mid \theta] \, \text{Pr}[\theta] = \text{Pr}[\theta] \prod_{i=1}^n \text{Pr}[x_i \mid \theta]$.
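To make the recipe concrete, here is a minimal sketch for the Gaussian case with known variance. It assumes a Gaussian prior $N(\mu_0, \tau_0^2)$ on $\theta$, which is my choice for illustration and not something fixed by the question. The values of `sigma`, `mu0` and `tau0` are hypothetical, and the data points are the ones from the question. It simply evaluates the log of the joint above on a grid and picks the maximizer.

```python
import numpy as np

def log_joint(theta, x, sigma, mu0, tau0):
    """log Pr[theta] + sum_i log Pr[x_i | theta], up to an additive constant.

    theta is a 1-D array of candidate values; x is a 1-D array of data.
    """
    log_prior = -0.5 * ((theta - mu0) / tau0) ** 2
    # Rows index data points, columns index candidate theta values.
    log_lik = -0.5 * np.sum(((x[:, None] - theta[None, :]) / sigma) ** 2, axis=0)
    return log_prior + log_lik

def map_estimate_grid(x, sigma, mu0, tau0, grid):
    """Return the grid point that maximizes the log joint."""
    return grid[np.argmax(log_joint(grid, x, sigma, mu0, tau0))]

x = np.array([0.3, 0.5, 0.9, 1.7])   # data points from the question
sigma, mu0, tau0 = 0.5, 0.5, 1.0     # assumed values, for illustration only
grid = np.linspace(-2.0, 3.0, 50001)

theta_map = map_estimate_grid(x, sigma, mu0, tau0, grid)
print(theta_map)
```

A grid search is used only because it mirrors the $\arg\max$ literally; for this particular model the maximizer is also available in closed form as a precision-weighted average of the prior mean and the data.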

Note that particular combinations of the prior $\text{Pr}[\theta]$ and the candidate distributions $\text{Pr}[x \mid \theta]$, known as conjugate pairs, give rise to easy (closed-form) updates as more data points are received.
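One such combination is a Gaussian prior on the mean of a Gaussian with known variance. The sketch below shows what a closed-form update could look like in that case, processing one data point at a time. The function name `update_normal_mean` and the starting values are hypothetical, chosen only for illustration.

```python
def update_normal_mean(mu, tau2, x, sigma2):
    """Update a N(mu, tau2) belief about theta after one observation x ~ N(theta, sigma2).

    Returns the posterior mean and posterior variance of theta.
    """
    post_precision = 1.0 / tau2 + 1.0 / sigma2
    post_mu = (mu / tau2 + x / sigma2) / post_precision
    return post_mu, 1.0 / post_precision

mu, tau2 = 0.5, 1.0   # assumed prior mean and variance for theta
sigma2 = 0.25         # assumed known observation variance

for x in [0.3, 0.5, 0.9, 1.7]:  # data points from the question
    mu, tau2 = update_normal_mean(mu, tau2, x, sigma2)
    print(f"after x={x}: mean={mu:.4f}, variance={tau2:.4f}")
```

Because the posterior is again Gaussian, each posterior serves as the prior for the next observation, and the posterior mean coincides with the MAP estimate of $\theta$ at every step.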