I fit a partial least squares regression on one categorical factor (2 levels – be or nottobe) with the pls package in R. I use the round() function on the predicted values to decide whether the result is the first or the second level of my factor. Does this approach sound correct?
require(pls)

# Artificial data
T <- as.factor(sort(rep(c("be", "nottobe"), 100)))
y1 <- c(rnorm(100, 1, 0.1),  rnorm(100, 1, 0.1))
y2 <- c(rnorm(100, 10, 0.3), rnorm(100, 10, 0.6))
y3 <- c(rnorm(100, 10, 2.3), rnorm(100, 11, 2.6))
y4 <- c(rnorm(100, 5, 0.5),  rnorm(100, 7, 0.5))
y5 <- c(rnorm(100, 0, 0.1),  rnorm(100, 0, 0.1))

# Create the data frame
avaliacao <- as.numeric(T)
espectro  <- cbind(y1, y2, y3, y4, y5)
dados <- data.frame(avaliacao = I(as.matrix(avaliacao)),
                    bands     = I(as.matrix(espectro)))

# PLS regression
taumato <- plsr(avaliacao ~ bands, ncomp = 5, validation = "LOO", data = dados)
summary(taumato)

# Component analysis
plot(taumato, plottype = "scores", comps = 1:5)

# Cross-validation
taumato.cv <- crossval(taumato, segments = 10)
plot(MSEP(taumato.cv), legendpos = "topright")
summary(taumato.cv, what = "validation")
plot(taumato, xlab = "medição", ylab = "predição", ncomp = 3,
     asp = 1, main = " ", line = TRUE)

# Prediction for 3 components on new data (50 cases per class;
# kept in T2 so the training labels in T stay intact)
T2 <- as.factor(sort(rep(c("be", "nottobe"), 50)))
y1 <- c(rnorm(50, 1, 0.1),  rnorm(50, 1, 0.1))
y2 <- c(rnorm(50, 10, 0.3), rnorm(50, 10, 0.6))
y3 <- c(rnorm(50, 10, 2.3), rnorm(50, 11, 2.6))
y4 <- c(rnorm(50, 5, 0.5),  rnorm(50, 7, 0.5))
y5 <- c(rnorm(50, 0, 0.1),  rnorm(50, 0, 0.1))
espectro2 <- cbind(y1, y2, y3, y4, y5)
new.dados <- data.frame(bands = I(as.matrix(espectro2)))
round(predict(taumato, ncomp = 3, newdata = new.dados))
Best Answer
PLS with a "hardening" threshold to convert the output into hard class decisions is known as PLS-DA, and yes, that is frequently done.
If you go for PLS-DA, you typically want to adjust the threshold for unequal numbers of training cases in the classes.
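To make that concrete, here is a minimal sketch of the decision step, reusing the 1/2 coding and the taumato, dados and new.dados objects from the question's code. The cutoff at the mean of the coded training response is just one common heuristic for unbalanced classes, not the only option:

# Continuous PLS predictions for the new data
pred <- as.vector(predict(taumato, ncomp = 3, newdata = new.dados))

# Balanced classes: cut halfway between the two codes (equivalent to round())
cutoff <- 1.5

# One common heuristic for unequal class sizes: cut at the mean of the
# coded training response instead of at the midpoint
# cutoff <- mean(dados$avaliacao)

# Hard class decisions mapped back to the factor levels
classe <- factor(ifelse(pred < cutoff, "be", "nottobe"),
                 levels = c("be", "nottobe"))
table(classe)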
However, there are more advanced and possibly more appropriate possibilities: you can use PLS as regularization for "proper" classification models such as LDA (PLS-LDA) or logistic regression (PLS-LR; the latter is a type II nonlinear PLS model in Rosipal's terminology).
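A rough sketch of the PLS-LDA idea on the same toy data, not a definitive recipe: the PLS scores are used as regularized inputs to LDA. MASS::lda and the choice of 3 components are my assumptions for illustration; T, taumato and new.dados come from the question's code.

require(MASS)

# Scores of the training data and of the projected new data
train.scores <- scores(taumato)[, 1:3]
test.scores  <- predict(taumato, newdata = new.dados, type = "scores")[, 1:3]

# LDA on the PLS scores, then class predictions for the new data
lda.fit   <- lda(train.scores, grouping = T)
lda.class <- predict(lda.fit, test.scores)$class

# The PLS-LR variant replaces LDA with logistic regression on the same scores:
# glm.fit <- glm(T ~ train.scores, family = binomial)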