For some weeks now, I have been pursuing the question: "Under which circumstances will LDA achieve a higher classification accuracy than QDA, using the same training and test sets as well as the same prior probabilities?"
I have found the following links on this issue:
link1
link2
link3
link4
link5
Unfortunately, the answers there do not fully satisfy me. I would therefore like to ask the question again, in the hope of comprehensive responses with descriptive examples.
Best Answer
Consider a case where the data generating process (DGP) corresponds to LDA or can be closely approximated by it. LDA should then beat its competitors in large samples, and it should beat more flexible competitors (such as QDA) in small samples, too. QDA's added flexibility comes at the cost of higher estimation variance, because additional parameters (a separate covariance matrix per class) must be estimated. In relatively small samples, this cost outweighs any reduction in model bias that QDA gains from approximating the DGP more closely than LDA. James et al., "An Introduction to Statistical Learning" (2013), pp. 152-153, offer three scenarios in which LDA beats QDA (a small simulation sketch of Scenario 1 is given after the quoted scenarios):
Scenario 1: There were 20 training observations in each of two classes. The observations within each class were uncorrelated random normal variables with a different mean in each class. The left-hand panel of Figure 4.10 shows that LDA performed well in this setting, as one would expect since this is the model assumed by LDA. KNN performed poorly because it paid a price in terms of variance that was not offset by a reduction in bias. QDA also performed worse than LDA, since it fit a more flexible classifier than necessary. Since logistic regression assumes a linear decision boundary, its results were only slightly inferior to those of LDA.
Scenario 2: Details are as in Scenario 1, except that within each class, the two predictors had a correlation of $-0.5$. The center panel of Figure 4.10 indicates little change in the relative performances of the methods as compared to the previous scenario.
Scenario 3: We generated $X_1$ and $X_2$ from the $t$-distribution, with 50 observations per class. The $t$-distribution has a similar shape to the normal distribution, but it has a tendency to yield more extreme points — that is, more points that are far from the mean. In this setting, the decision boundary was still linear, and so fit into the logistic regression framework. The set-up violated the assumptions of LDA, since the observations were not drawn from a normal distribution. The right-hand panel of Figure 4.10 shows that logistic regression outperformed LDA, though both methods were superior to the other approaches. In particular, the QDA results deteriorated considerably as a consequence of non-normality.
(Emphasis is mine.)
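As a rough, self-contained illustration of the small-sample argument above, here is a minimal simulation sketch of Scenario 1 (my own, not from the original answer or the book), assuming scikit-learn is available: two classes with two uncorrelated standard-normal predictors, different class means, and 20 training observations per class. Since LDA's assumptions hold exactly in this DGP, QDA's extra per-class covariance parameters should typically cost it a little test accuracy at this sample size.

```python
import numpy as np
from sklearn.discriminant_analysis import (
    LinearDiscriminantAnalysis,
    QuadraticDiscriminantAnalysis,
)

rng = np.random.default_rng(0)

def make_data(n_per_class, mean_shift=1.0):
    """Two classes in 2D: N((0, 0), I) vs N((mean_shift, mean_shift), I)."""
    X0 = rng.normal(loc=0.0, size=(n_per_class, 2))
    X1 = rng.normal(loc=mean_shift, size=(n_per_class, 2))
    X = np.vstack([X0, X1])
    y = np.repeat([0, 1], n_per_class)
    return X, y

n_reps = 500
acc = {"LDA": [], "QDA": []}
for _ in range(n_reps):
    X_train, y_train = make_data(20)    # small training set, as in Scenario 1
    X_test, y_test = make_data(1000)    # large test set for a stable estimate
    for name, model in (
        ("LDA", LinearDiscriminantAnalysis()),
        ("QDA", QuadraticDiscriminantAnalysis()),
    ):
        model.fit(X_train, y_train)
        acc[name].append(model.score(X_test, y_test))

for name, scores in acc.items():
    print(f"{name}: mean test accuracy over {n_reps} replications = {np.mean(scores):.3f}")
```

Increasing `n_per_class` should shrink the LDA-QDA gap, since QDA's estimation variance vanishes in large samples; replacing the normal draws with shifted Student-t draws (e.g. `rng.standard_t(df=5, size=...)` plus the class mean) would give a crude analogue of Scenario 3's heavier tails.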