Solved – Find variables selected for each subset using caret feature selection

I am doing feature selection using the 'rfe' function in the caret package (http://caret.r-forge.r-project.org/featureselection.html). This function uses a performance metric to find the optimal number of variables and which variables those are. However, I would also like to see the intermediate steps of the feature selection, not just the final result. For instance, I would like to know which variables were selected if I wanted exactly 10 variables.

My code is the following:

    library(caret)  # provides rfe(), rfeControl() and rfFuncs

    ctrl <- rfeControl(functions = rfFuncs,
                       method = "cv",
                       verbose = FALSE)
    subsets <- c(5, 10, 15, 20, 25)
    lmProfile <- rfe(dat2_X, dat2_Y,
                     sizes = subsets,
                     rfeControl = ctrl)

See lmProfile$variables. It has the ranking metrics for each predictor at each iteration. For example, from ?rfe:

    data(BloodBrain)

    x <- scale(bbbDescr[,-nearZeroVar(bbbDescr)])
    x <- x[, -findCorrelation(cor(x), .8)]
    x <- as.data.frame(x)

    set.seed(1)
    lmProfile <- rfe(x, logBBB,
                     sizes = 10:20,
                     rfeControl = rfeControl(functions = lmFuncs,
                                             number = 15))

head(lmProfile$variables) gives:

     Overall           var Variables   Resample
    4.930084     vsa_other        71 Resample01
    4.696723    slogp_vsa5        71 Resample01
    3.877510         pnsa1        71 Resample01
    3.649555      vsa_base        71 Resample01
    3.586327 frac.cation7.        71 Resample01
    3.301325        a_base        71 Resample01

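For each resample, this data frame contains 71 rows for the variables selected at a subset size of 71, 20 rows for those selected at a subset size of 20, and so on. The Variables column gives the subset size, so filtering on it shows what was selected at any particular size.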
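To answer the original question for a specific size such as 10, the rows of lmProfile$variables can be filtered on the Variables column and tallied across resamples. What follows is a minimal sketch, not part of caret: vars_at_size is a hypothetical helper name, and the toy data frame only mimics the column layout shown above (Overall, var, Variables, Resample) so that the example runs without fitting a model.

```r
# Summarise which predictors were kept at a given subset size.
# `variables` is assumed to have the same columns as lmProfile$variables:
# Overall, var, Variables and Resample.
vars_at_size <- function(variables, size) {
  at_size <- variables[variables$Variables == size, , drop = FALSE]
  if (nrow(at_size) == 0) {
    stop("no rows for subset size ", size)
  }
  vars <- as.character(at_size$var)
  # Number of resamples in which each predictor was selected at this size
  counts <- table(vars)
  # Mean importance over the resamples in which the predictor was selected
  mean_imp <- tapply(at_size$Overall, vars, mean)
  out <- data.frame(
    var = names(counts),
    n_resamples = as.integer(counts),
    mean_overall = as.numeric(mean_imp[names(counts)]),
    stringsAsFactors = FALSE
  )
  # Most frequently selected first, ties broken by mean importance
  out <- out[order(-out$n_resamples, -out$mean_overall), ]
  rownames(out) <- NULL
  out
}

# Toy stand-in for lmProfile$variables (made-up values, two resamples, size 2)
toy <- data.frame(
  Overall   = c(3.0, 2.0, 2.5, 1.5),
  var       = c("a", "b", "a", "c"),
  Variables = c(2, 2, 2, 2),
  Resample  = c("Resample01", "Resample01", "Resample02", "Resample02"),
  stringsAsFactors = FALSE
)
print(vars_at_size(toy, 2))

# With a fitted rfe object, the equivalent call would be:
# vars_at_size(lmProfile$variables, 10)
```

Because each resample makes its own selection, the set chosen at a given size can differ from one resample to the next. The n_resamples column shows how consistently each predictor was picked.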

Max
