I have a high-dimensional dataset ($n \times p$: $30 \times 100$) that I want to use as a training dataset to build a two-group classifier (LDA or QDA). I've read that you can use PCA for dimension reduction to select the most important features, but I'm a bit confused about what exactly you use as the input to build the classifier. I'm familiar with PCA via the SVD and what it means.

Consider the following situation:

- I do an SVD of my dataset.
- I look at the scores of the first couple of principal components.
- When I label my scores by the group they come from, I see that the 3rd PC best separates my two groups (although it explains only 7% of the total variance).
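The SVD steps above can be sketched as follows; the data here are random and the shapes ($30 \times 100$, two groups of 15) are just an assumption to match the question:

```python
import numpy as np

# Hypothetical data: n=30 samples, p=100 features, two groups of 15
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 100))
labels = np.array([0] * 15 + [1] * 15)

# Center the data, then take the SVD (equivalent to PCA)
mean = X.mean(axis=0)
Xc = X - mean
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

scores = U * S                       # PC scores, shape (30, 30)
loadings = Vt                        # PC loadings as rows, shape (30, 100)
var_explained = S**2 / np.sum(S**2)  # fraction of variance per PC

# Inspect how well each of the first few PCs separates the two groups,
# e.g. via the gap between group means relative to the pooled spread
for j in range(5):
    g0, g1 = scores[labels == 0, j], scores[labels == 1, j]
    sep = abs(g0.mean() - g1.mean()) / np.sqrt(0.5 * (g0.var() + g1.var()))
    print(f"PC{j + 1}: {100 * var_explained[j]:.1f}% variance, separation {sep:.2f}")
```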

What do I do next?

- I transform the 3rd PC back to the original parameter space (scores × loadings × scale + mean) and build my classifier on that.
- I look at the loadings of the 3rd PC, try to decide which parameters in my original parameter space are important, and build a classifier using only those.
- …

Option 2 seems the most sensible to me, but I'm not entirely sure.

Also, if I see that only the 3rd PC is important for explaining the variance between my two groups, can I forget about the first two PCs in my further analysis?
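For what it's worth, the back-transform in option 1 (scores × loadings + mean; a scale factor would also apply if the data were standardized) would look roughly like this for a single PC, again on hypothetical random data:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(30, 100))

mean = X.mean(axis=0)
Xc = X - mean
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
scores = U * S

# Rank-1 reconstruction using only the 3rd PC (index 2):
# outer product of its scores with its loadings, then add the mean back
X_pc3 = np.outer(scores[:, 2], Vt[2, :]) + mean  # shape (30, 100)

# Sanity check: using all PCs recovers the original matrix exactly
X_full = scores @ Vt + mean
```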


#### Best Answer

If you're going to do LDA after PCA, I would keep the first k components. Don't try to figure out which of the components are important, and don't go back to the original parameter space. Feed the k-dimensional data into your LDA classifier and let it figure out what is important there.
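A minimal sketch of this advice using scikit-learn (the data, the group shift, and the choice k = 5 are assumptions for illustration):

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Hypothetical 30 x 100 dataset with two groups of 15
rng = np.random.default_rng(2)
X = rng.normal(size=(30, 100))
y = np.array([0] * 15 + [1] * 15)
X[y == 1] += 0.5  # shift group 1 so there is something to learn

# Keep the first k PCs, then hand the k-dimensional scores to LDA;
# k < n keeps the within-class covariance non-singular for LDA
k = 5
clf = make_pipeline(PCA(n_components=k), LinearDiscriminantAnalysis())
clf.fit(X, y)
print("training accuracy:", clf.score(X, y))
```

With only 30 samples, the training accuracy here is optimistic; cross-validation would give a more honest estimate and could also guide the choice of k.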
