Solved – Why is directly optimizing a mixture of Gaussians computationally hard?

Consider the log likelihood of a mixture of Gaussians:

$$l(S_n; \theta) = \sum^n_{t=1}\log f(x^{(t)}|\theta) = \sum^n_{t=1}\log\left\{\sum^k_{i=1}p_i\, f(x^{(t)}|\mu^{(i)}, \sigma^2_i)\right\}$$

Why is it computationally hard to maximize this expression directly? I am looking for either a clear, solid intuition for why it should obviously be hard, or a more rigorous explanation. Is this problem NP-complete, or do we just not know how to solve it yet? Is this why we resort to the EM (expectation-maximization) algorithm?


Notation:

$S_n$ = training data.

$x^{(t)}$ = data point.

$\theta$ = the set of parameters specifying the Gaussians: their means, standard deviations, and the probability of generating a point from each cluster/class/Gaussian.

$p_i$ = the probability of generating a point from cluster/class/Gaussian i.
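For concreteness, here is a minimal sketch (in Python/NumPy, with made-up parameter values; the function name and all numbers are illustrative, not part of the original question) of evaluating the log-likelihood $l(S_n; \theta)$ above for a one-dimensional mixture:

```python
import numpy as np
from scipy.stats import norm

def gmm_log_likelihood(x, p, mu, sigma):
    """Evaluate l(S_n; theta) = sum_t log( sum_i p_i N(x_t | mu_i, sigma_i^2) )."""
    # densities[t, i] = p_i * N(x_t | mu_i, sigma_i^2)
    densities = p * norm.pdf(x[:, None], loc=mu, scale=sigma)
    return np.sum(np.log(densities.sum(axis=1)))

# Illustrative values: k = 2 components, n = 5 one-dimensional points.
x = np.array([-1.2, 0.3, 0.5, 2.8, 3.1])
p = np.array([0.4, 0.6])        # mixing proportions, sum to 1
mu = np.array([0.0, 3.0])       # component means
sigma = np.array([1.0, 0.5])    # component standard deviations

print(gmm_log_likelihood(x, p, mu, sigma))
```

Note that the objective contains the log of a sum over components, which couples the components together and is what prevents the simple closed-form maximization you get for a single Gaussian.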

First, a GMM is a particular model for clustering, in which you try to find the optimal labelling of your $n$ observations. With $k$ possible classes, there are $k^n$ possible labellings of your training data, which is already huge for moderate values of $k$ and $n$; see the example below.
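For instance (a worked example of my own, not from the original answer), with only $k = 3$ classes and $n = 100$ observations there are

$$k^n = 3^{100} \approx 5 \times 10^{47}$$

possible labellings, far too many to enumerate.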

Second, the functional you are trying to minimize (the negative log-likelihood) is not convex, and this, together with the size of the problem, makes the optimization very hard. I only know that k-means (a GMM can be seen as a soft version of k-means) is NP-hard, but I am not aware of whether this has been proved for GMMs as well.

To see that the problem is not convex, consider the one-dimensional case: $$ L = \log\left(e^{-(x/\sigma_1)^2} + e^{-(x/\sigma_2)^2}\right) $$ and check that you cannot guarantee that $\frac{d^2L}{dx^2} > 0$ for all $x$.
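A quick numerical check of this (a sketch in Python with the arbitrary choice $\sigma_1 = 1$, $\sigma_2 = 10$; these values are not from the original answer) estimates $\frac{d^2L}{dx^2}$ by finite differences and shows that it changes sign, so $L$ is neither convex nor concave:

```python
import numpy as np

def L(x, sigma1=1.0, sigma2=10.0):
    """Log of a sum of two unnormalized Gaussians, as in the 1-D example."""
    return np.log(np.exp(-(x / sigma1) ** 2) + np.exp(-(x / sigma2) ** 2))

def second_derivative(f, x, h=1e-3):
    """Central finite-difference estimate of f''(x)."""
    return (f(x - h) - 2 * f(x) + f(x + h)) / h ** 2

# With sigma1 = 1, sigma2 = 10 the estimate is negative at x = 0,
# positive near x = 1, and negative again at x = 3: the sign changes.
for x in [0.0, 1.0, 3.0]:
    print(f"x = {x}: L'' ~ {second_derivative(L, x):+.3f}")
```

With these particular values the estimate comes out around $-1$ at $x = 0$ and around $+0.2$ at $x = 1$, so convexity fails.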

Having a non-convex problem means that you can get stuck in local minima. In general, you do not have the strong guarantees you have in convex optimization, and searching for a solution is also much harder.
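To see the practical consequence (an illustration of my own, not part of the original answer; it assumes scikit-learn is available), you can run EM from several random initializations and compare the per-sample log-likelihood each run converges to. On overlapping clusters the runs often end at different values, i.e. different local optima:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Overlapping 1-D clusters, where EM is prone to local optima.
X = np.concatenate([rng.normal(-1.0, 1.0, 200),
                    rng.normal(0.5, 0.3, 200),
                    rng.normal(1.5, 1.0, 200)]).reshape(-1, 1)

for seed in range(5):
    gm = GaussianMixture(n_components=3, n_init=1,
                         init_params="random", random_state=seed).fit(X)
    # Average per-sample log-likelihood at convergence; differing values
    # across seeds indicate convergence to different local optima.
    print(f"seed {seed}: mean log-likelihood = {gm.score(X):.4f}")
```

The usual practical remedy is to use more restarts (a larger `n_init`) or a smarter initialization such as k-means; this does not remove the local optima, it just tries more starting points.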
