# Solved – Optimal parameters with resampling in random forest

I'm building a classification model in R using random forest and the package caret. I'm interested in which parameters are optimised during resampling.

As an example, let's use the iris dataset and fit two models – one that uses no resampling, and one based on 10-fold cross-validation:

```r
library(caret)

set.seed(99)
mod1 <- train(Species ~ ., data = iris,
              method = "rf",
              ntree = 500,
              tuneGrid = data.frame(mtry = 2),
              trControl = trainControl(method = "none"))

set.seed(99)
mod2 <- train(Species ~ ., data = iris,
              method = "rf",
              ntree = 500,
              tuneGrid = data.frame(mtry = 2),
              trControl = trainControl(method = "repeatedcv",
                                       number = 10, repeats = 1))
```

As we can see, in both models the number of random predictors per split (mtry) is 2, and 500 trees are generated. Obviously the two models give different results, but what are the parameters that are optimised during cross-validation?

As a comparison, Kuhn in his presentation talks about rpart (slides 52 – 69), where he explains that during resampling we actually prune the tree.

But what about when we're using random forest? Are the generated trees pruned as well, or are there other parameters that are optimised (e.g. max depth)?


You are not optimizing any parameters in your code. The only tuning parameter considered in the caret package is the `mtry` value, which is specified to be 2 in your code. However, it is still important to get a good estimate of the accuracy of the random forest; model 2 shows the accuracy is around 95.3% using repeated K-fold cross-validation. This is similar to what we get using the out-of-bag (OOB) sample estimate from the random forest:
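To make this concrete, here is a sketch of how to read that resampling estimate from the fitted object, and how caret would actually optimise `mtry` if you gave it more than one candidate value (the `results` and `bestTune` elements are part of caret's documented `train` return value; `mod3` is just an illustrative name):

```r
library(caret)

# The resampling estimate for the single mtry value that was used:
# columns include mtry, Accuracy, Kappa, and their standard deviations
mod2$results

# To actually tune mtry, supply a grid of candidates; caret then refits
# the model for each value and keeps the one with the best resampled accuracy
set.seed(99)
mod3 <- train(Species ~ ., data = iris,
              method = "rf",
              ntree = 500,
              tuneGrid = data.frame(mtry = 1:4),
              trControl = trainControl(method = "repeatedcv",
                                       number = 10, repeats = 1))
mod3$bestTune   # the mtry value chosen by resampling
```

With a one-row `tuneGrid`, as in your code, resampling does no selection at all – it only estimates how well that fixed model generalises.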
```r
library(randomForest)

set.seed(99)
randomForest(Species ~ ., data = iris, ntree = 500, mtry = 2)
```
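For reference, a sketch of how to pull that OOB estimate out of the fit itself rather than reading it off the printed summary (the `err.rate` matrix and `ntree` element are part of randomForest's documented return value):

```r
library(randomForest)

set.seed(99)
rf <- randomForest(Species ~ ., data = iris, ntree = 500, mtry = 2)

# OOB error rate after all 500 trees; OOB accuracy is its complement
oob_err <- rf$err.rate[rf$ntree, "OOB"]
1 - oob_err   # comparable to the cross-validated accuracy from mod2
```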