I have a big data matrix with 6000 rows (observations) and 45 columns: 44 predictor variables (categorical and continuous) and 1 binary response variable (0 or 1). I want to check the correlation/multicollinearity among the predictors in R. So far I have looked into cor() and a heat map, but it seems that for data this size I need something else. Please advise.
Best Answer
I also like VIFs, but another approach is to estimate the mutual information between/among the various predictors, because it is not limited to linear relationships. The idea is to keep only those covariates that have low mutual information with the others, since each of them is telling you something different. Check out the infotheo or entropy packages.
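To make the idea concrete, here is a minimal sketch in plain Python rather than R. It is not the infotheo or entropy implementation. It assumes that numeric columns are discretized into equal-frequency bins, that MI is the plug-in (empirical) estimate in nats, and that 8 bins is a reasonable default. The function names (equal_freq_bins, mutual_information, pearson, mi_matrix) are hypothetical and exist only for this illustration. The demo builds a predictor x2 = x1² whose Pearson correlation with x1 is essentially zero, which is exactly the kind of dependence cor() misses.

```python
"""Pairwise mutual information between predictors: a pure-Python sketch.

Assumptions (not from the original answer): numeric columns are discretized
into equal-frequency bins, and MI is the plug-in (empirical) estimate in nats.
"""
import math
import random
from collections import Counter


def equal_freq_bins(values, n_bins):
    """Assign each value a bin index 0..n_bins-1 by rank (ties may split)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * n_bins // len(values)
    return bins


def mutual_information(x, y):
    """Plug-in MI estimate, in nats, between two equal-length discrete sequences."""
    n = len(x)
    count_x, count_y = Counter(x), Counter(y)
    count_xy = Counter(zip(x, y))
    # Sum over observed cells of p(a,b) * log(p(a,b) / (p(a) * p(b))).
    return sum(c / n * math.log(c * n / (count_x[a] * count_y[b]))
               for (a, b), c in count_xy.items())


def pearson(x, y):
    """Sample Pearson correlation, included only for comparison."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mi_matrix(columns, n_bins=8):
    """MI for every pair of columns in a dict {name: list of values}.

    All-numeric columns are binned; anything else is treated as categorical.
    Integer-coded categoricals would be binned too, so pass them as strings.
    """
    disc = {}
    for name, col in columns.items():
        if all(isinstance(v, (int, float)) for v in col):
            disc[name] = equal_freq_bins(col, n_bins)
        else:
            disc[name] = list(col)
    names = list(disc)
    return {(a, b): mutual_information(disc[a], disc[b])
            for i, a in enumerate(names) for b in names[i + 1:]}


if __name__ == "__main__":
    random.seed(0)
    half = [random.uniform(-1, 1) for _ in range(1000)]
    x1 = half + [-u for u in half]                       # symmetric around 0
    x2 = [u * u for u in x1]                             # purely nonlinear in x1
    x3 = [random.uniform(-1, 1) for _ in range(2000)]    # independent of x1
    color = [random.choice("rgb") for _ in range(2000)]  # categorical
    cols = {"x1": x1, "x2": x2, "x3": x3, "color": color}
    print("pearson(x1, x2) =", round(pearson(x1, x2), 3))
    for pair, mi in sorted(mi_matrix(cols).items(), key=lambda kv: -kv[1]):
        print(pair, round(mi, 3))
```

In this demo the x1/x2 pair should have by far the highest MI even though its Pearson correlation is near zero, so one of the two would be a candidate to drop. The number of bins is a tuning choice. The plug-in estimator is biased upward when there are many bins relative to the sample size, so compare pairs on the same binning rather than reading the raw values as absolute.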
Similar Posts:
- Solved – Variable Selection Method for Multinominal Model
- Solved – Why do we need to take the transpose of the data for PCA
- Solved – Does correlation imply mutual information
- Solved – What are the advantages of using mutual information over Pearson correlation in network inference
- Solved – How to calculate mutual information between a feature and target variable