Caret feature selection with a customized random forest classifier

I'm following the caret package tutorial on writing custom functions for recursive feature elimination (RFE). I can reproduce the tutorial's example, which uses random forest regression. However, when I modify the code for classification, I get an odd error:

```r
library(caret)
library(mlbench)
library(Hmisc)
library(randomForest)

# Simulate Friedman #1 data and append 40 extra noise predictors
n <- 100
p <- 40
sigma <- 1
set.seed(1)
sim <- mlbench.friedman1(n, sd = sigma)
colnames(sim$x) <- c(paste("real", 1:5, sep = ""),
                     paste("bogus", 1:5, sep = ""))
bogus <- matrix(rnorm(n * p), nrow = n)
colnames(bogus) <- paste("bogus", 5 + (1:ncol(bogus)), sep = "")
x <- cbind(sim$x, bogus)
y <- sim$y

# customizing tutorial example for binary outcome
y[y <= 12] <- 0
y[y > 12] <- 1
y <- factor(y)

normalization <- preProcess(x)
x <- predict(normalization, x)
x <- as.data.frame(x)
subsets <- c(1:5, 10, 15, 20, 25)

# Custom RFE functions, adapted from the tutorial's random forest example
rfRFE <- list(summary = defaultSummary,
              fit = function(x, y, first, last, ...) {
                library(randomForest)
                randomForest(x, y, importance = first, ...)
              },
              pred = function(object, x) predict(object, x),
              rank = function(object, x, y) {
                vimp <- varImp(object)
                vimp <- vimp[order(vimp$Overall, decreasing = TRUE), , drop = FALSE]
                vimp$var <- rownames(vimp)
                vimp
              },
              selectSize = pickSizeBest,
              selectVar = pickVars)

# lmFuncs is only a placeholder; it is replaced by rfRFE below
ctrl <- rfeControl(functions = lmFuncs,
                   method = "repeatedcv",
                   repeats = 5,
                   verbose = FALSE)
ctrl$functions <- rfRFE
ctrl$returnResamp <- "all"
set.seed(10)
rfProfile <- rfe(x, y, sizes = subsets, rfeControl = ctrl)
rfProfile
```

The error is:

```
Error in { : task 1 failed - "argument 1 is not a vector"
```

My question: how should `rfRFE` be defined for random forest models with a binary response variable?

Make sure the response variable is a factor whose level names start with a letter (for example `"low"`/`"high"` rather than the `0`/`1` levels produced by `factor(y)` in the question).
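A minimal base-R sketch of this fix, reusing the question's threshold of 12. `y_raw` is a hypothetical stand-in for `sim$y`, and the labels `"low"` and `"high"` are arbitrary; any labels that are valid R names should do. As I understand it, the reason is that caret uses class levels as column names (for instance for class probabilities), and `"0"`/`"1"` are not syntactically valid R names.

```r
# Hypothetical stand-in for sim$y from the question.
y_raw <- c(8.3, 15.1, 12.0, 19.7, 4.2)

# Same split as the question: <= 12 is class 0, > 12 is class 1.
y01 <- ifelse(y_raw > 12, 1, 0)

# factor(y01) would produce levels "0" and "1"; label them with
# letter-initial names instead.
y <- factor(y01, levels = c(0, 1), labels = c("low", "high"))
print(levels(y))

# Alternative for an existing 0/1 factor: make.names() prefixes "X",
# turning "0"/"1" into "X0"/"X1".
y_alt <- factor(y01)
levels(y_alt) <- make.names(levels(y_alt))
print(levels(y_alt))

# Both sets of levels are now syntactically valid R names.
stopifnot(identical(make.names(levels(y)), levels(y)),
          identical(make.names(levels(y_alt)), levels(y_alt)))
```

In the question's script, the equivalent change is to replace `y <- factor(y)` with `y <- factor(y, levels = c(0, 1), labels = c("low", "high"))`.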

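Depending on the caret version, the custom `rank` function may also need attention for classification. The assumption here (verify it against your installed caret) is that `varImp()` on a two-class `randomForest` fit returns one column per class level instead of an `Overall` column, in which case `vimp$Overall` is `NULL` inside `rank`. The sketch below uses a hypothetical helper, `rank_importance()`, that averages the per-class columns into `Overall` when it is missing and otherwise behaves like the question's `rank` function. caret also ships a predefined `rfFuncs` list for random forests, which may be worth trying before a custom list.

```r
# Hypothetical helper (not part of caret): turn an importance table into the
# ranked data frame that an rfe() 'rank' function is expected to return.
# ASSUMPTION: for a classification forest, caret's varImp() may return one
# column per class level (e.g. "low", "high") and no "Overall" column.
rank_importance <- function(vimp, y) {
  if (is.null(vimp$Overall)) {
    # Average the per-class columns into a single score per predictor.
    class_cols <- intersect(levels(y), colnames(vimp))
    if (length(class_cols) == 0) {
      stop("no 'Overall' column and no columns named after the class levels")
    }
    vimp$Overall <- rowMeans(vimp[, class_cols, drop = FALSE])
  }
  # Most important predictor first, as in the question's rank function.
  vimp <- vimp[order(vimp$Overall, decreasing = TRUE), , drop = FALSE]
  vimp$var <- rownames(vimp)
  vimp
}

# Toy importance table shaped like the assumed classification output.
toy_vimp <- data.frame(low  = c(0.5, 2.0, 1.0),
                       high = c(0.5, 2.0, 1.0),
                       row.names = c("real1", "real2", "bogus1"))
toy_y <- factor(c("low", "high", "low"), levels = c("low", "high"))

ranked <- rank_importance(toy_vimp, toy_y)
print(ranked)

# Inside the rfRFE list from the question, the rank element could then read:
# rank = function(object, x, y) rank_importance(varImp(object), y)
```

When an `Overall` column is already present (as in the tutorial's regression case), the helper skips the averaging step and only sorts.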