I'm trying to learn about LDA, so I'm gathering information from different places. One thing that strikes me is that some sources explain it in terms of $\pi_{k}f_{k}(X=x)$, where $f_{k}$ is taken to be a normal density, and state assumptions about the distribution of the data. From what I've seen, these sources don't really dive deep into the geometry of dimensionality reduction. In other cases LDA is presented as more of an exercise in dimensionality reduction, with very little explanation of the underlying distributional assumptions. What is the 'correct' approach? Does anyone have a source that explains both? Often I can only find an explanation detailing one approach.

#### Best Answer

I can't tell you much about LDA as a dimensionality reduction technique, since I didn't realize it could be used that way until I stumbled upon your question. I can, however, give a concise explanation of LDA applied to binary classification.

Let's start with the two concepts LDA rests on: multivariate Gaussian distributions and maximum likelihood estimation.

Suppose you have data from two classes, *survived* and *died*, and assume the features within each class are normally distributed. You encounter a new individual whose condition is described by known features $x$, and you want to assess the likelihood of survival or death. How might you estimate those probabilities?

Using known patient data, you fit two normal distributions, one for death and one for survival, estimating their parameters by maximum likelihood. Each class gets its own mean. In LDA specifically, the classes are assumed to share a single covariance matrix, which is estimated by pooling both classes; giving each class its own covariance yields quadratic discriminant analysis (QDA) instead.

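Finally, with the two models fitted on past patient data, you evaluate each class density $f_{k}$ at the new patient's features $x$, weight it by the class prior $\pi_{k}$ (the proportion of past patients in class $k$), and normalise so the two values sum to one: $P(k \mid x) \propto \pi_{k} f_{k}(x)$. The results are the estimated probabilities of survival and death.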
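The steps above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the data are synthetic and made up, the helper names `fit_lda` and `posterior` are hypothetical, and the covariance is pooled across both classes, which is the LDA assumption.

```python
import numpy as np

def fit_lda(X, y):
    """Maximum likelihood estimates: class priors, class means, one pooled covariance."""
    classes = np.unique(y)
    n, d = X.shape
    priors = np.array([np.mean(y == k) for k in classes])
    means = np.array([X[y == k].mean(axis=0) for k in classes])
    # Pooled covariance: sum the within-class scatter, divide by n (the MLE divisor).
    cov = np.zeros((d, d))
    for k, mu in zip(classes, means):
        centered = X[y == k] - mu
        cov += centered.T @ centered
    cov /= n
    return classes, priors, means, cov

def posterior(x, priors, means, cov):
    """P(class | x) by Bayes' rule, proportional to prior times Gaussian density."""
    diffs = x - means                      # shape (n_classes, n_features)
    inv = np.linalg.inv(cov)
    # Squared Mahalanobis distance from x to each class mean.
    maha = np.einsum("kd,de,ke->k", diffs, inv, diffs)
    # The Gaussian normalising constant is shared by all classes, so it cancels.
    log_post = np.log(priors) - 0.5 * maha
    log_post -= log_post.max()             # numerical stability before exponentiating
    p = np.exp(log_post)
    return p / p.sum()

# Synthetic stand-in for past patient data: two features per patient (an assumption).
rng = np.random.default_rng(0)
survived = rng.multivariate_normal([0.0, 0.0], np.eye(2), size=200)
died = rng.multivariate_normal([3.0, 3.0], np.eye(2), size=200)
X = np.vstack([survived, died])
y = np.array([0] * 200 + [1] * 200)       # 0 = survived, 1 = died

classes, priors, means, cov = fit_lda(X, y)
new_patient = np.array([0.2, -0.1])
probs = posterior(new_patient, priors, means, cov)
print(dict(zip(classes.tolist(), probs.tolist())))
```

Because the covariance is shared, the log-posterior is linear in $x$, which is where the "linear" in LDA comes from.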

Given how the model is constructed, you can see why you'd want to avoid applying LDA to data that are far from normally distributed within each class: the estimated probabilities come directly from the fitted Gaussian densities.