Solved – randomForest same formula different results (with same seed!)

I am calling the randomForest function with the same seed each time (on the Boston dataset), but it gives different results.

    set.seed=500
    regressor = randomForest(x = training_set,
                             y = training_set$medv,
                             ntree = 100)

    Call:
     randomForest(x = training_set, y = training_set$medv, ntree = 100)
                   Type of random forest: regression
                         Number of trees: 100
    No. of variables tried at each split: 4

              Mean of squared residuals: 0.03206772
                        % Var explained: 96.78

Or:

    set.seed=500
    regressor = randomForest(medv ~ . , data = training_set, ntree = 100)

    Call:
     randomForest(formula = medv ~ ., data = training_set, ntree = 100)
                   Type of random forest: regression
                         Number of trees: 100
    No. of variables tried at each split: 4

              Mean of squared residuals: 0.1248719
                        % Var explained: 87.48

The two calls give different results.
Any help?

Thanks

set.seed=500 initializes a variable called set.seed and sets it to 500. It does not set the random number generator seed.

Use set.seed(500) instead.

You can look at the help page with ?set.seed.
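To see the difference between the assignment and the function call, here is a minimal sketch in base R. The variable names a, b, c1 and c2 are only illustrative and are not part of the original question.

```r
# Assigning to the name set.seed only creates a numeric variable;
# the random number generator is not touched.
set.seed = 500
is.numeric(set.seed)   # the workspace now holds a number called set.seed

# Remove the stray variable to avoid confusion later on.
rm(set.seed)

# Without reseeding, two consecutive draws differ.
a <- runif(3)
b <- runif(3)
identical(a, b)

# Calling the function resets the generator, so the draws repeat.
set.seed(500); c1 <- runif(3)
set.seed(500); c2 <- runif(3)
identical(c1, c2)      # TRUE
```

The same holds for randomForest: only a real set.seed(500) call before each fit makes the bootstrap samples and the variable selection at each split repeatable.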


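In addition, note that your first model (x = training_set) uses all columns of the training data set as predictors, including the dependent variable medv. In contrast, the second one (medv ~ .) tells R to use every column except medv as a predictor. Of course these will give different results, since the predictors are different. Because the first model can see the response among its predictors, it appears to fit much better (96.78% versus 87.48% of variance explained).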
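A quick way to check which columns a model will see is to inspect the data frame before fitting. The sketch below assumes training_set is simply the Boston data from MASS; the name predictors is hypothetical and used only for illustration.

```r
library(MASS)  # provides the Boston data frame

training_set <- Boston

# The response is still one of the columns of training_set.
"medv" %in% names(training_set)

# Drop the response by name, so the code keeps working if the column order changes.
predictors <- training_set[, setdiff(names(training_set), "medv")]

ncol(training_set)  # 14 columns, including medv
ncol(predictors)    # 13 columns, predictors only
```

Passing predictors as x and training_set$medv as y then describes the same model as the formula medv ~ . does. Dropping by name is a small design choice; dropping by position with [, -14] works equally well for Boston.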

Below, I give a reproducible example. The last model is an adaptation of your first model, and it indeed gives the same results as your second one.

    library(randomForest)
    library(MASS)
    training_set <- Boston

    set.seed(500)
    regressor = randomForest(x = training_set,
                             y = training_set$medv,
                             ntree = 100)
    regressor

    set.seed(500)
    regressor = randomForest(medv ~ . , data = training_set, ntree = 100)
    regressor

    set.seed(500)
    regressor = randomForest(x = training_set[,-14],
                             y = training_set$medv,
                             ntree = 100)
    regressor

Finally, note that you will typically get better help if you include a minimal reproducible example like the one I gave here.
