Is the mixture of Gaussians model (an example of latent class analysis) guaranteed to converge on a viable solution, even on unimodal data, when the Expectation-Maximization (EM) algorithm is used to estimate the classes?

EM is guaranteed to find a local optimum, but what does that necessarily mean for the parameters?

E.g. if I have (univariate or multivariate) Gaussian data and I try to fit a mixture model of two Gaussians to it, what is theoretically supposed to happen?


#### Best Answer

Assuming no restrictions on the parameters $\mu_k$ and $\sigma_k$ (the means and standard deviations of the separate components of the mixture model), the EM algorithm need not converge to a local maximum, because the likelihood function is unbounded in this case.

To illustrate, suppose we observe data points $x_1, \dots, x_n$ and we fit a two-component Gaussian mixture model. The log-likelihood can then be written as

$\displaystyle \sum_{i = 1}^n \log\left( p_1 \times f(x_i \mid \mu_1, \sigma_1) + p_2 \times f(x_i \mid \mu_2, \sigma_2)\right)$

where $f(x \mid \mu, \sigma)$ is the pdf of the normal distribution with mean $\mu$ and standard deviation $\sigma$. Consider now the case where we set $\mu_1 = \bar x$ (the mean of the observed data) and $\sigma_1 = \hat \sigma$ (the standard deviation of the observed data), with $p_1 = p_2 = 0.5$ (side note: these values are somewhat arbitrary and are used only to get a lower bound). Then the contribution to the likelihood function for each point will be at least

$\log(0.5 \times f(x_i \mid \bar x, \hat \sigma))$

i.e. the contribution for each observation can be bounded from below for all values of $\mu_2$ and $\sigma_2$. Now consider setting $\mu_2 = x_1$. As $\sigma_2 \rightarrow 0$, $f(x_1 \mid x_1, \sigma_2) \rightarrow \infty$. Of course, this implies

$\log(0.5 \times f(x_1 \mid x_1, \sigma_2) + 0.5 \times f(x_1 \mid \bar x, \hat \sigma)) \rightarrow \infty$

as well. Since the contributions of all the other observations are bounded from below by $\log(0.5 \times f(x_i \mid \bar x, \hat \sigma))$, this implies that the likelihood function is unbounded.
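This unboundedness is easy to see numerically. The sketch below (my own illustration, not from the original answer) fixes component 1 at the sample moments, pins $\mu_2$ on the first observation, and shrinks $\sigma_2$; the log-likelihood grows without bound as $\sigma_2 \rightarrow 0$:

```python
import numpy as np

def loglik(x, p1, mu1, s1, mu2, s2):
    """Log-likelihood of a two-component univariate Gaussian mixture."""
    def normal_pdf(x, mu, s):
        return np.exp(-0.5 * ((x - mu) / s) ** 2) / (s * np.sqrt(2.0 * np.pi))
    return np.sum(np.log(p1 * normal_pdf(x, mu1, s1)
                         + (1.0 - p1) * normal_pdf(x, mu2, s2)))

rng = np.random.default_rng(0)
x = rng.normal(size=100)        # unimodal N(0, 1) data
mu1, s1 = x.mean(), x.std()     # component 1 pinned at the sample moments
# Pin component 2's mean on the first observation and shrink its sd;
# the x_1 term contributes roughly -log(sigma_2), which diverges.
for s2 in [1e-4, 1e-8, 1e-50, 1e-100]:
    ll = loglik(x, 0.5, mu1, s1, x[0], s2)
    print(f"sigma_2 = {s2:g}  log-likelihood = {ll:.1f}")
```

Each halving of the exponent adds roughly a constant to the log-likelihood, exactly as the $\log(0.5 \times f(x_1 \mid x_1, \sigma_2)) \approx -\log \sigma_2$ term predicts.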

Since the likelihood function is unbounded, if the EM algorithm approaches this infinite "peak", it will not converge to a meaningful solution. However, this likelihood typically *does* have non-degenerate local maxima, assuming $n$ is sufficiently large relative to $k$ (the number of components). So the EM algorithm may find these non-degenerate local maxima, which are undoubtedly better estimators than the degenerate solution. In fact, if $n$ is large relative to $k$, it is very likely to find a non-degenerate local maximum rather than to approach the degenerate solution.
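To see the typical behavior on the question's scenario, here is a minimal, numerically naive EM sketch (names like `em_two_gaussians` and the quartile-based starting values are my own choices, not part of the original answer). Run on unimodal $N(0,1)$ data from moderate starting values, it settles on a non-degenerate configuration of two heavily overlapping components rather than collapsing onto a single point:

```python
import numpy as np

def em_two_gaussians(x, n_iter=200):
    """Plain EM for a two-component univariate Gaussian mixture (a sketch)."""
    # Moderate starting values: components centered at the quartiles.
    mu = np.array([np.quantile(x, 0.25), np.quantile(x, 0.75)])
    sd = np.array([x.std(), x.std()])
    p = np.array([0.5, 0.5])
    for _ in range(n_iter):
        # E-step: responsibilities r[i, k] = P(component k | x_i)
        dens = (np.exp(-0.5 * ((x[:, None] - mu) / sd) ** 2)
                / (sd * np.sqrt(2.0 * np.pi)))
        r = p * dens
        r /= r.sum(axis=1, keepdims=True)
        # M-step: weighted moment updates
        nk = r.sum(axis=0)
        p = nk / len(x)
        mu = (r * x[:, None]).sum(axis=0) / nk
        sd = np.sqrt((r * (x[:, None] - mu) ** 2).sum(axis=0) / nk)
    return p, mu, sd

rng = np.random.default_rng(0)
x = rng.normal(size=500)        # unimodal N(0, 1) data
p, mu, sd = em_two_gaussians(x)
print(p, mu, sd)                # both sd's stay well away from zero
```

Practical implementations (e.g. scikit-learn's `GaussianMixture` with its `reg_covar` regularization) additionally guard the covariances away from zero, precisely to block the degenerate spike described above.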
