I am trying to analyse a dataset with at least 50 explanatory variables coded as 0 and 1 for presence/absence and a binary response variable (case/control). The goal is to see how well the variables predict the separation between case and control.
As there are more variables than observations, I applied a partial least squares discriminant analysis (PLS-DA) using the package mixOmics in R. However, when I test the significance of the analysis with PLSDA.test (package RVAideMemoire), I get a lot of warnings like this one:
1: In pls(X, ind.mat, ncomp = ncomp, mode = "regression", ... : Zero- or near-zero variance predictors. Reset predictors matrix to not near-zero variance predictors. See $nzv for problematic predictors.
I guess the near-zero variance problem results from the 0/1 coding of the predictor variables. I tried converting the variables to factors, but this doesn't help. Is there a different analysis that would be more suitable? How can I deal with presence/absence variables as predictors?
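My experience with this package leads me to believe that the 0/1 coding itself is not the problem; rather, some of these variables do not vary enough across observations. The program should tell you which ones: the warning points you to $nzv for the problematic predictors. Perhaps nearly all observations, in both groups, have the same value on a variable (mostly 1's, or mostly 0's).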
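To see which variables are affected, you can check by hand how unbalanced each 0/1 column is. Below is a minimal base-R sketch of that check. The toy data frame, the column names v1 to v4, and the 10% cutoff are all made up for illustration. It is a simplified stand-in, not the criterion mixOmics applies internally (mixOmics and caret each provide a nearZeroVar() function with their own rules).

```r
# Toy presence/absence data (hypothetical): 20 observations, 4 predictors.
X <- data.frame(
  v1 = rep(c(0, 1), times = 10),   # balanced: 10 zeros, 10 ones
  v2 = c(rep(1, 19), 0),           # present in all but one observation
  v3 = rep(1, 20),                 # constant: present everywhere
  v4 = c(rep(0, 12), rep(1, 8))    # reasonably mixed
)

# For each 0/1 column, the share of observations in the rarer category.
minority_share <- sapply(X, function(col) {
  p <- mean(col == 1)
  min(p, 1 - p)
})

# Assumed cutoff for illustration: flag a column when fewer than 10% of
# the observations fall in its rarer category.
cutoff <- 0.10
flagged <- names(minority_share)[minority_share < cutoff]

print(round(minority_share, 2))
print(flagged)

# Keep only the predictors that were not flagged.
X_kept <- X[, !(names(X) %in% flagged), drop = FALSE]
```

With your real data, replace X with your predictor matrix. Columns flagged this way carry almost no information for separating case from control, so dropping them before fitting is usually harmless, but the cutoff is a judgment call.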