Sometimes I encode categorical features as binary indicators – one feature per possible category level, set to 1 when the original value equals that level (i.e. the one-of-K scheme).
These indicator values are linearly dependent, since their sum is always 1.
Does this linear dependence matter for linear SVM, kernel SVM, logistic regression, etc.?
Where does it matter, so that I need to remove one of the features? Does it cause problems for ordinary linear regression?
For which methods does it not make a difference?
Based on my understanding, perfect collinearity affects the estimation of the weights: the solution is no longer unique, because adding a constant to all K dummy weights and subtracting it from the intercept leaves the fit unchanged. So if your goal is to inspect the weights and test their significance, you should drop one dummy and use only K-1 indicators. The dropped level then becomes the baseline, and its effect is absorbed into the intercept. Alternatively, you can keep all K indicators and fit without an intercept.
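A minimal NumPy sketch of this point, using a made-up 3-level feature (all variable names are my own): with an intercept plus all K dummy columns, the design matrix is rank deficient, because the dummies sum to the intercept column; dropping one dummy restores full rank.

```python
import numpy as np

# Hypothetical categorical feature with K = 3 levels.
cats = np.array([0, 1, 2, 0, 1, 2, 0, 1])
K = 3

# Full one-of-K encoding: one indicator column per level.
onehot = np.eye(K)[cats]  # shape (8, 3)

# Design matrices with an intercept column prepended.
X_full = np.column_stack([np.ones(len(cats)), onehot])         # intercept + K dummies
X_drop = np.column_stack([np.ones(len(cats)), onehot[:, 1:]])  # intercept + K-1 dummies

# The K dummy columns sum to the intercept column, so X_full is rank deficient:
print(np.linalg.matrix_rank(X_full))  # 3, although X_full has 4 columns
print(np.linalg.matrix_rank(X_drop))  # 3, equal to its number of columns
```

Because `X_full` has rank 3 but 4 columns, the normal equations have infinitely many solutions, which is exactly why the individual weights are not interpretable in that parameterization.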
But if your goal is a model with high predictive performance (i.e., what you really care about is the output), the encoding scheme does not matter. If you use all K indicators, you can either drop the intercept term or add a regularization term; regularization removes the impact of the collinearity by making the solution unique again.