I am doing feature selection using the function 'rfe' in the caret package (http://caret.r-forge.r-project.org/featureselection.html). This function uses a performance metric to find the optimal number of variables and which variables they are. However, I would also like to see the intermediate steps of the feature selection, not just the final one. For instance, I would like to know which variables were the optimal ones if I wanted exactly 10 variables.
My code is the following:
ctrl <- rfeControl(functions = rfFuncs, method = "cv", verbose = FALSE)
subsets <- c(5, 10, 15, 20, 25)
lmProfile <- rfe(dat2_X, dat2_Y, sizes = subsets, rfeControl = ctrl)
Best Answer
See lmProfile$variables. It has the ranking metrics for each predictor at each iteration. For example, from ?rfe:
library(caret)
data(BloodBrain)
x <- scale(bbbDescr[, -nearZeroVar(bbbDescr)])
x <- x[, -findCorrelation(cor(x), .8)]
x <- as.data.frame(x)
set.seed(1)
lmProfile <- rfe(x, logBBB,
                 sizes = 10:20,
                 rfeControl = rfeControl(functions = lmFuncs, number = 15))
head(lmProfile$variables) has:
 Overall           var Variables   Resample
4.930084     vsa_other        71 Resample01
4.696723    slogp_vsa5        71 Resample01
3.877510         pnsa1        71 Resample01
3.649555      vsa_base        71 Resample01
3.586327 frac.cation7.        71 Resample01
3.301325        a_base        71 Resample01
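The Variables column gives the subset size and the var column gives the predictors kept at that size. So within each resample there are 71 rows with Variables equal to 71 (the variables selected for a subset size of 71), 20 rows with Variables equal to 20 (the ones selected at size 20), and so on.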
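To answer the "exactly 10 variables" question, one option is to filter that data frame by subset size and count how often each predictor was kept across resamples. The sketch below is only an illustration: vars_at_size is a hypothetical helper (not a caret function), and the toy data frame contains made-up values that merely mimic the column layout of lmProfile$variables shown above. Counting appearances is one reasonable summary, not necessarily the rule caret itself uses when it picks the final variables.

```r
# Hypothetical helper (not part of caret): for a given subset size, count in
# how many resamples each predictor was retained.
vars_at_size <- function(variables, size) {
  # Keep only the rows recorded for the requested subset size
  at_size <- variables[variables$Variables == size, , drop = FALSE]
  # Tally appearances per predictor, most frequently retained first
  counts <- sort(table(as.character(at_size$var)), decreasing = TRUE)
  data.frame(var = as.character(names(counts)),
             n_resamples = as.integer(counts),
             stringsAsFactors = FALSE)
}

# Toy data with the same columns as lmProfile$variables (values are made up)
toy <- data.frame(
  Overall   = c(3.1, 2.4, 2.9, 1.8),
  var       = c("a", "b", "a", "c"),
  Variables = c(2, 2, 2, 2),
  Resample  = c("Resample01", "Resample01", "Resample02", "Resample02"),
  stringsAsFactors = FALSE
)

toy_summary <- vars_at_size(toy, 2)
print(toy_summary)
```

With the real object this would be called as vars_at_size(lmProfile$variables, 10), provided 10 is one of the values passed in sizes. Predictors that appear in most or all resamples at that size are the stable candidates for a 10-variable model.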
Max