I am reading about Gaussian mixture models from this slide
https://www.ics.uci.edu/~smyth/courses/cs274/notes/EMnotes.pdf
However, I am super confused by the very first line.
It says:
We have a dataset of data points $x_i$.
Each data point is assumed to be generated i.i.d. from an underlying
distribution. We assume that the underlying distribution is a mixture
of Gaussian distributions.
I do not understand why we assume that the underlying distribution of the data is a mixture of Gaussians.
This seems to me to be completely false.
The data distribution could be anything. We are only fitting a mixture-of-Gaussians model to whatever that underlying distribution is. We are maximizing the log-likelihood using EM to approximate that distribution with the GMM.
Why do people assume that the data themselves are generated through Gaussians?
Is my interpretation correct?
Best Answer
Strictly speaking, the GMM as a model does assume that the data are generated from a mixture of Gaussians. By accepting and using the model, you automatically adopt that assumption. What you actually believe in practice is that a GMM can represent your data approximately well enough, not that it is the literal generating process. Almost every algorithm comes with assumptions that you accept by using it, e.g. Naive Bayes assumes the features are conditionally independent given the class. Remember that almost all models are wrong.
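To make this concrete, here is a minimal sketch in plain Python (standard library only) that fits a two-component 1-D GMM by EM to data drawn from a uniform distribution, which is clearly not a mixture of Gaussians. The function name `fit_gmm_1d`, the sample size, the seeds, and the iteration count are illustrative choices, not something taken from the linked notes. EM runs without complaint and the log-likelihood does not decrease from one iteration to the next, even though the modelling assumption is false; what you get is a locally best approximation within the GMM family.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def fit_gmm_1d(data, k=2, n_iter=50, seed=0):
    """Fit a k-component 1-D GMM by EM (hypothetical helper for illustration).

    Returns (weights, means, variances, history), where history[i] is the
    log-likelihood of the parameters at the start of iteration i.
    """
    rng = random.Random(seed)
    n = len(data)
    overall_mean = sum(data) / n
    overall_var = sum((x - overall_mean) ** 2 for x in data) / n

    # Initialisation: equal weights, means at random data points,
    # every variance set to the overall sample variance.
    weights = [1.0 / k] * k
    means = rng.sample(data, k)
    variances = [overall_var] * k
    history = []

    for _ in range(n_iter):
        # E-step: responsibilities r[i][c] = P(component c | x_i).
        resp = []
        loglik = 0.0
        for x in data:
            joint = [w * normal_pdf(x, m, v)
                     for w, m, v in zip(weights, means, variances)]
            total = sum(joint)
            loglik += math.log(total)
            resp.append([j / total for j in joint])
        history.append(loglik)

        # M-step: re-estimate weights, means and variances from the
        # responsibility-weighted data.
        for c in range(k):
            nc = sum(r[c] for r in resp)
            weights[c] = nc / n
            means[c] = sum(r[c] * x for r, x in zip(resp, data)) / nc
            var_c = sum(r[c] * (x - means[c]) ** 2
                        for r, x in zip(resp, data)) / nc
            variances[c] = max(var_c, 1e-6)  # small floor to avoid collapse

    return weights, means, variances, history


# Data that is NOT generated by a mixture of Gaussians: Uniform(0, 1).
data_rng = random.Random(42)
data = [data_rng.random() for _ in range(500)]

weights, means, variances, history = fit_gmm_1d(data, k=2, n_iter=50)
print("weights:", weights)
print("means:", means)
print("variances:", variances)
print("log-likelihood, first and last iteration:", history[0], history[-1])
```

The fitted mixture is simply the model's best attempt at describing uniform data; whether that attempt is good enough is a question about your application, not something the algorithm checks for you. For real work you would normally use a library implementation such as scikit-learn's `GaussianMixture` rather than a hand-rolled loop like this.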