# Convergence of EM for Mixture of Gaussians

Is the mixture of Gaussians model (an example of latent class analysis) guaranteed to converge on a viable solution, even on unimodal data, when the Expectation Maximization algorithm is used to estimate the classes?

EM is guaranteed to find a local optimum, but what does that mean for the parameter estimates?

For example, if I have (univariate or multivariate) Gaussian data and I try to fit a mixture model of two Gaussians to it, what is theoretically supposed to happen?


Assuming no restrictions on the parameters $\mu_k$ and $\sigma_k$ (the means and standard deviations of the separate components of the mixture model), the EM algorithm may fail to converge to a local maximum, because the likelihood function is unbounded in this case.

To illustrate, suppose we observe data points $x_1, \ldots, x_n$ and we fit a two-component Gaussian mixture model. Then the log likelihood can be written as

$$\sum_{i = 1}^n \log\left( p_1 \times f(x_i \mid \mu_1, \sigma_1) + p_2 \times f(x_i \mid \mu_2, \sigma_2)\right)$$

where $f(x \mid \mu, \sigma)$ is the pdf of the normal distribution with mean $\mu$ and standard deviation $\sigma$. Now consider the case where we set $\mu_1 = \bar x$ (the mean of the observed data) and $\sigma_1 = \hat \sigma$ (the standard deviation of the observed data), with $p_1 = p_2 = 0.5$ (side note: these values are somewhat arbitrary and are used only to get a lower bound). Then, since $f \geq 0$, the contribution to the log likelihood from each point is at least

$$\log(0.5 \times f(x_i \mid \bar x, \hat \sigma))$$

i.e., the contribution of each observation is bounded from below, uniformly over all values of $\mu_2$ and $\sigma_2$. Now set $\mu_2 = x_1$. As $\sigma_2 \rightarrow 0$, $f(x_1 \mid x_1, \sigma_2) \rightarrow \infty$. Of course, this implies

$$\log(0.5 \times f(x_1 \mid x_1, \sigma_2) + 0.5 \times f(x_1 \mid \bar x, \hat \sigma)) \rightarrow \infty$$

as well. Since the contributions of all the other observations are bounded from below by $\log(0.5 \times f(x_i \mid \bar x, \hat \sigma))$, this implies that the likelihood function is unbounded.
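This construction is easy to verify numerically. The sketch below (plain Python; the data set is a made-up example, and any sample works the same way) fixes the first component at the sample mean and standard deviation, centers the second component on $x_1$, and shrinks $\sigma_2$:

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

# Hypothetical sample; any data set behaves the same way.
xs = [-1.2, -0.4, 0.1, 0.7, 1.5]
n = len(xs)
xbar = sum(xs) / n
shat = math.sqrt(sum((x - xbar) ** 2 for x in xs) / n)

def log_lik(mu2, sigma2):
    # Two-component mixture log likelihood with p1 = p2 = 0.5 and the
    # first component fixed at the sample mean and standard deviation.
    return sum(math.log(0.5 * normal_pdf(x, xbar, shat)
                        + 0.5 * normal_pdf(x, mu2, sigma2))
               for x in xs)

# Center the second component on x_1 and let sigma_2 -> 0:
for s in (1.0, 0.1, 0.01, 0.001):
    print(s, log_lik(xs[0], s))
# The log likelihood grows without bound as sigma_2 shrinks.
```

The printed values increase without limit as $\sigma_2$ decreases, exactly as the argument above predicts: the $x_1$ term blows up while every other term stays above its lower bound.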

Since the likelihood function is unbounded, the EM algorithm will not converge if it approaches this infinite "peak". However, the likelihood typically does have non-degenerate local maxima, assuming $n$ is sufficiently large relative to $k$ (the number of components). So the EM algorithm may find these non-degenerate local maxima, which are undoubtedly better estimators than the degenerate solution. In fact, if $n$ is large relative to $k$, the algorithm is very likely to find a non-degenerate local maximum rather than approach the degenerate solution.
