Solved – Does correlation between variables and class label affect building a good classifier

If all variables that I tested have low correlation with the class variable, would it be possible to build a good classifier?

And if there is a high correlation between some variables and the class label, does that guarantee that we can build a good model?

A classifier will generally be stronger if the predictor variables are strongly correlated with the class, but there are a couple of other considerations. First, if there are many predictor variables and they are not highly intercorrelated among themselves, then it is possible to build a multivariable model (i.e., one that combines many predictors) that is much better than the typical individual predictor. So yes, low individual correlations do not rule out a good classifier. I'd say that many Naive Bayes models fit this mold, and NB models often work quite well.
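To make that concrete, here is a minimal sketch on simulated data (the sample size, the 51 predictors, and the 60% agreement rate are all assumptions chosen for illustration, not anything from the question). Each binary predictor is only weakly related to the class, and the predictors are independent of each other given the class. The combination rule is a plain majority vote, which is what a Naive Bayes classifier reduces to when the classes are balanced and every predictor is equally reliable.

```python
import random

random.seed(0)  # arbitrary seed, only for repeatability

N_SAMPLES = 2000
N_PREDICTORS = 51   # odd, so a majority vote never ties
P_AGREE = 0.6       # each predictor matches the class 60% of the time

labels = [random.randint(0, 1) for _ in range(N_SAMPLES)]
# Each predictor independently copies the label with probability P_AGREE.
data = [
    [y if random.random() < P_AGREE else 1 - y for _ in range(N_PREDICTORS)]
    for y in labels
]

def accuracy(predictions, truth):
    """Fraction of predictions that equal the true label."""
    return sum(p == t for p, t in zip(predictions, truth)) / len(truth)

# Accuracy of each predictor used on its own.
single_accuracies = [
    accuracy([row[j] for row in data], labels) for j in range(N_PREDICTORS)
]
best_single = max(single_accuracies)

# Combine all predictors: predict 1 when more than half of them say 1.
majority_vote = [1 if sum(row) > N_PREDICTORS / 2 else 0 for row in data]
combined = accuracy(majority_vote, labels)

print(f"best single predictor accuracy: {best_single:.3f}")
print(f"majority vote of all predictors: {combined:.3f}")
```

Under these assumptions every individual predictor sits near 60% accuracy, while the vote over all of them should do far better, because independent errors tend to cancel out.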

Second, if your model can capture interactions between variables (and you have a large enough sample to estimate them accurately), then you could potentially achieve much better results than the correlations of the individual predictors with the class would suggest.
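The textbook extreme of this is an XOR pattern. The sketch below uses a hypothetical toy data set in which the class is the XOR of two binary predictors: each predictor on its own has zero correlation with the class, yet their product (with the predictors coded as -1/+1) determines the class completely. The `pearson` helper is written out by hand only to keep the example self-contained.

```python
from math import sqrt

def pearson(a, b):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)

# Hypothetical toy data: all four combinations of two binary predictors,
# repeated 25 times; the class label is their XOR.
rows = [(0, 0), (0, 1), (1, 0), (1, 1)] * 25
x1 = [a for a, _ in rows]
x2 = [b for _, b in rows]
y = [a ^ b for a, b in rows]

corr_x1 = pearson(x1, y)
corr_x2 = pearson(x2, y)

# Interaction term: product of the predictors after recoding 0/1 as -1/+1.
interaction = [(2 * a - 1) * (2 * b - 1) for a, b in rows]
corr_interaction = pearson(interaction, y)

print(f"corr(x1, y)          = {corr_x1:.2f}")
print(f"corr(x2, y)          = {corr_x2:.2f}")
print(f"corr(interaction, y) = {corr_interaction:.2f}")
```

A model that only adds up main effects has nothing to work with here, while any model that can represent the interaction (a tree, a product term in a regression) can classify this data perfectly.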

As for your second question: with most modeling methods you can expect the fitted model to do at least as well as the best individual predictor on the training data, though this is not a strict guarantee on new data. And if your predictors are few or highly intercorrelated, your model may not be much better than the best individual predictor, because the additional variables carry little new information.
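Here is a rough sketch of the intercorrelated case, again on simulated data with assumed settings (51 predictors, a 60% agreement rate, a 5% flip rate). All predictors are noisy copies of one shared signal, so they are highly correlated with each other, and combining them by majority vote just recovers that one signal.

```python
import random

random.seed(2)  # arbitrary seed, only for repeatability

N_SAMPLES = 2000
N_COPIES = 51    # odd, so the vote never ties
P_AGREE = 0.6    # the shared signal matches the class 60% of the time
P_FLIP = 0.05    # each copy independently corrupts 5% of its values

labels = [random.randint(0, 1) for _ in range(N_SAMPLES)]
# One underlying signal that is moderately related to the class.
base = [y if random.random() < P_AGREE else 1 - y for y in labels]
# Every predictor is a slightly corrupted copy of that same signal.
data = [
    [1 - b if random.random() < P_FLIP else b for _ in range(N_COPIES)]
    for b in base
]

def accuracy(predictions, truth):
    """Fraction of predictions that equal the true label."""
    return sum(p == t for p, t in zip(predictions, truth)) / len(truth)

base_accuracy = accuracy(base, labels)
best_single = max(
    accuracy([row[j] for row in data], labels) for j in range(N_COPIES)
)
vote = [1 if sum(row) > N_COPIES / 2 else 0 for row in data]
combined = accuracy(vote, labels)

print(f"shared signal accuracy:  {base_accuracy:.3f}")
print(f"best single predictor:   {best_single:.3f}")
print(f"majority vote of copies: {combined:.3f}")
```

The vote ends up at essentially the accuracy of the shared signal, so having 51 predictors instead of one buys almost nothing when they all carry the same information.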

It is also possible to include so many low-quality predictors that the model overfits the training sample, to the detriment of its predictions on test samples (and in real use). So, in that sense, a few good predictors do not "guarantee" a good model.
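As an illustration of that last point, the sketch below simulates one informative predictor plus 50 pure-noise predictors and classifies with 1-nearest-neighbor. The choice of 1-NN is an assumption made here as a stand-in for a flexible model that is sensitive to irrelevant variables; the sample sizes and noise levels are likewise made up for the example.

```python
import random

random.seed(1)  # arbitrary seed, only for repeatability

def make_sample(n, n_noise):
    """One informative predictor followed by n_noise pure-noise predictors."""
    rows, labels = [], []
    for _ in range(n):
        y = random.randint(0, 1)
        good = (1.0 if y == 1 else -1.0) + random.gauss(0, 0.5)
        noise = [random.gauss(0, 1) for _ in range(n_noise)]
        rows.append([good] + noise)
        labels.append(y)
    return rows, labels

def predict_1nn(train_x, train_y, point, n_features):
    """Label of the closest training row, using the first n_features columns."""
    def sq_dist(row):
        return sum((row[j] - point[j]) ** 2 for j in range(n_features))
    nearest = min(range(len(train_x)), key=lambda i: sq_dist(train_x[i]))
    return train_y[nearest]

def accuracy(train_x, train_y, eval_x, eval_y, n_features):
    """Share of evaluation rows whose 1-NN prediction matches the label."""
    hits = sum(
        predict_1nn(train_x, train_y, p, n_features) == y
        for p, y in zip(eval_x, eval_y)
    )
    return hits / len(eval_y)

N_NOISE = 50
train_x, train_y = make_sample(100, N_NOISE)
test_x, test_y = make_sample(200, N_NOISE)

# Model A: only the informative predictor (column 0).
test_good_only = accuracy(train_x, train_y, test_x, test_y, 1)
# Model B: the informative predictor plus all the noise predictors.
train_with_noise = accuracy(train_x, train_y, train_x, train_y, 1 + N_NOISE)
test_with_noise = accuracy(train_x, train_y, test_x, test_y, 1 + N_NOISE)

print(f"good predictor only, test accuracy:  {test_good_only:.3f}")
print(f"with noise predictors, train accuracy: {train_with_noise:.3f}")
print(f"with noise predictors, test accuracy:  {test_with_noise:.3f}")
```

The model with the noise predictors looks perfect on the training sample (each training point is its own nearest neighbor) but does clearly worse on the test sample than the model that uses the one good predictor alone.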
