I am having a little trouble understanding the concept and derivation of the likelihood of truncated data.
For example, if I want to find the likelihood function based on a sample from a distribution, but when taking a sample from the distribution, I observe the truncated values (where there is a cut-off of $M$, i.e. any $x_{i}>M$ is recorded as $M$):
$ x_{1}, x_{2}, M, x_{3}, M, x_{4}, x_{5}, …, x_{10}$
where the number of $M$ values is $m$. Then, the likelihood is supposedly given by:
$L(x;\theta) = \prod_{i=1}^{10}f(x_{i};\theta)\cdot[P(X>M)]^{m}$
I would very much appreciate an explanation/proof of why this is so, importantly why the second factor is as it is. Intuitively and mathematically if possible. Thanks very much in advance.
Best Answer
What you describe needs special treatment; it is not what we usually mean by "truncated random variables" (the situation in your question is more commonly called censoring). What we usually mean by truncation is that the random variable does not range outside the truncated support, so there is no concentration of probability mass at the point of truncation. To contrast the two cases:
A) "Usual" meaning of a truncated rv
Whenever we truncate the support of a distribution, we must "correct" its density so that it integrates to unity over the truncated support. If the variable has support in $[a,b]$, $-\infty < a < b < \infty$, and we truncate at $M$ with $a < M < b$, then (pdf $f$, cdf $F$)
$$\int_a^b f_X(x)\,dx = \int_a^M f_X(x)\,dx+\int_M^b f_X(x)\,dx = \int_a^M f_X(x)\,dx + \left[1-F_X(M)\right]=1 $$
$$\Rightarrow \int_a^M f_X(x)\,dx = F_X(M)$$
Since the LHS is the integral over the truncated support, we see that the density of the truncated r.v., call it $\tilde X$, must be
$$f_{\tilde X}(\tilde x) = f_{X}(x\mid X\le M)=f_X(x)\cdot \left[F_X(M)\right]^{-1} $$ so that it integrates to unity over $[a, M]$. The middle term in the above expression makes us think of this situation (rightfully) as a form of conditioning, not on another random variable but on the possible values the r.v. itself can take. Here the joint density/likelihood function of a collection of $n$ truncated i.i.d. r.v.'s would be the product of $n$ such densities, $\prod_{i=1}^n f_X(x_i)\cdot\left[F_X(M)\right]^{-n}$, as usual.
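To make case A concrete, here is a minimal sketch that assumes, purely for illustration, that $X$ is exponential with rate `rate` (so $a=0$ and $b=\infty$; the finite upper bound is not needed for the argument). The function names `truncated_exp_pdf` and `truncated_exp_loglik` are hypothetical and not part of any library.

```python
import math

def truncated_exp_pdf(x, rate, M):
    """Density of an Exp(rate) variable truncated to [0, M]: f_X(x) / F_X(M)."""
    if x < 0 or x > M:
        return 0.0  # no mass outside the truncated support
    f_x = rate * math.exp(-rate * x)      # untruncated density f_X(x)
    F_M = 1.0 - math.exp(-rate * M)       # F_X(M), the renormalising constant
    return f_x / F_M

def truncated_exp_loglik(rate, xs, M):
    """Log-likelihood of an i.i.d. truncated sample: sum of log f_X(x_i) - n * log F_X(M)."""
    if rate <= 0:
        raise ValueError("rate must be positive")
    if any(x < 0 or x > M for x in xs):
        raise ValueError("a truncated sample cannot contain values outside [0, M]")
    log_f = sum(math.log(rate) - rate * x for x in xs)
    log_F_M = math.log1p(-math.exp(-rate * M))  # log(1 - exp(-rate * M))
    return log_f - len(xs) * log_F_M
```

Note that every observation carries the same correction factor $\left[F_X(M)\right]^{-1}$, and no observation can sit exactly "at the cut-off" with positive probability.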
B) Probability mass concentration
Here, which is what you describe in the question, things are different. The point $M$ concentrates all the probability mass that corresponds to the part of the support above $M$. This creates a point of discontinuity in the density and gives it two branches:
$$\begin{align} f_{X^*}(x^*) &= f_X(x^*) \qquad x^*<M\\ f_{X^*}(x^*) &= P(X^* \ge M) \qquad x^*\ge M\\ \end{align}$$
Informally, the second branch is "like a discrete r.v.", where each point in the probability mass function represents an actual probability. Now assume that we have $n$ such i.i.d. random variables, and we want to form their joint density/likelihood function. Before looking at the actual sample, which branch should we choose? We cannot make that decision, so we have to somehow include both. To do this we use indicator functions: denote by $I\{x^*\ge M\}\equiv I_{\ge M}(x^*)$ the indicator function that takes the value $1$ when $x^*\ge M$, and $0$ otherwise. The density of such an r.v. can be written
$$f_{X^*}(x^*) = f_X(x^*)\cdot \left[1-I_{\ge M}(x^*)\right]+P(X^* \ge M)\cdot I_{\ge M}(x^*) $$ and therefore the joint density function of $n$ such i.i.d. variables is
$$f_{X^*}(\mathbf X^*\mid \theta) = \prod_{i=1}^n\Big[f_X(x^*_i)\cdot \left[1-I_{\ge M}(x^*_i)\right]+P(X^*_i \ge M)\cdot I_{\ge M}(x^*_i)\Big]$$
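As a quick check on the single-variable density used in this product, and assuming as in case A that $X$ is continuous with support $[a,b]$ so that $P(X^* \ge M) = P(X \ge M) = 1-F_X(M)$, the continuous branch and the point mass at $M$ together account for all the probability:

$$\int_a^M f_X(x)\,dx + P(X^* \ge M) = F_X(M) + \left[1-F_X(M)\right] = 1$$

So, unlike case A, no renormalisation by $\left[F_X(M)\right]^{-1}$ is needed: the mass above $M$ has not been discarded, only relocated to the point $M$.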
Now, when this joint density is viewed as a likelihood function, the actual sample consisting of realizations of these $n$ random variables comes into play. In this sample, some observed realizations will be lower than the threshold $M$ and some equal to it. Denote by $m$ the number of realizations in the sample that equal $M$, and by $v$ the number of all the rest, so that $m+v=n$. It is immediate that for the $m$ realizations, the part of the density that remains in the likelihood is the $P(X^*_i \ge M)$ part, while for the $v$ realizations it is the $f_X(x^*_i)$ part. Then
$$\begin{align} L(\theta\mid \{x_i^*;\,i=1,\ldots,n\})&= \prod_{i=1}^v\Big[f_X(x^*_i)\Big]\cdot \prod_{j=1}^m\Big[P(X^*_j \ge M)\Big] \\& = \prod_{i=1}^v\Big[f_X(x^*_i)\Big]\cdot \Big[P(X^* \ge M)\Big]^m\\ \end{align}$$
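The final likelihood can be sketched in code. This is only an illustration under an assumption not made in the question: $X$ is taken to be exponential with rate $\theta$, so $f_X(x)=\theta e^{-\theta x}$ and $P(X \ge M)=e^{-\theta M}$. The names `censored_exp_loglik` and `censored_exp_mle` are hypothetical.

```python
import math

def censored_exp_loglik(rate, recorded, M):
    """log L = sum over the v values below M of log f_X(x_i), plus m * log P(X >= M)."""
    if rate <= 0:
        raise ValueError("rate must be positive")
    below = [x for x in recorded if x < M]       # the v realizations lower than M
    m = sum(1 for x in recorded if x >= M)       # the m realizations recorded as M
    log_f = sum(math.log(rate) - rate * x for x in below)
    log_tail = -rate * M                         # log P(X >= M) = log exp(-rate * M)
    return log_f + m * log_tail

def censored_exp_mle(recorded, M):
    """Maximiser of the log-likelihood above for the exponential case.

    Setting d/d(rate) [v * log(rate) - rate * (sum of x_i below M + m * M)] = 0
    gives rate = v / (sum of all recorded values).
    """
    v = sum(1 for x in recorded if x < M)
    if v == 0:
        raise ValueError("no observation below M; the maximiser lies on the boundary rate = 0")
    total = sum(min(x, M) for x in recorded)     # values at the cut-off each contribute M
    return v / total
```

For example, `censored_exp_mle([0.5, 1.0, 2.0, 2.0], 2.0)` counts only the two values below $M=2$ in the numerator but all four recorded values in the denominator, which is exactly the effect of the factor $\big[P(X^* \ge M)\big]^m$.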