I am using the randomForest function with the same seed, but it gives different results (with the Boston dataset).
    set.seed=500
    regressor = randomForest(x = training_set, y = training_set$medv, ntree = 100)

    Call:
     randomForest(x = training_set, y = training_set$medv, ntree = 100)
                   Type of random forest: regression
                         Number of trees: 100
    No. of variables tried at each split: 4

              Mean of squared residuals: 0.03206772
                        % Var explained: 96.78
Or:

    set.seed=500
    regressor = randomForest(medv ~ ., data = training_set, ntree = 100)

    Call:
     randomForest(formula = medv ~ ., data = training_set, ntree = 100)
                   Type of random forest: regression
                         Number of trees: 100
    No. of variables tried at each split: 4

              Mean of squared residuals: 0.1248719
                        % Var explained: 87.48
The two calls give different results.
Any help?
Thanks
Best Answer
set.seed=500 initializes a variable called set.seed and sets it to 500. It does not set the random number generator seed. Use set.seed(500) instead. You can look at the help page by typing ?set.seed.
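The difference between the assignment and the function call can be checked directly. The following is a minimal sketch using only base R and assuming a fresh session; the variable names a, b, c1 and c2 are made up for illustration.

```r
# Assumption: a fresh R session; only base R is used.

# This creates a numeric variable named set.seed; the RNG is left untouched.
set.seed = 500
a <- runif(3)
set.seed = 500
b <- runif(3)
is.numeric(set.seed)   # the name now refers to the value 500
identical(a, b)        # the two draws differ, since no seed was set

# Calling the function seeds the RNG. R still finds base::set.seed here,
# because a name in call position is looked up as a function.
set.seed(500)
c1 <- runif(3)
set.seed(500)
c2 <- runif(3)
identical(c1, c2)      # the draws repeat exactly
```

The same applies to randomForest: the forest is only reproducible when set.seed(500) is called as a function immediately before fitting.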
In addition, note that your first model (x = training_set) includes all columns of the training data set, including the dependent variable medv. In contrast, the second one (medv ~ .) tells R to exclude the dependent variable from the predictors. Of course these will give different results, since the training data are different.
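To see which columns each call treats as predictors, one can compare the column names of the data frame with the terms that base R derives from the formula. This is only an illustrative sketch: it uses terms() from base R to expand the dot and is not taken from randomForest's internals; x_columns and formula_columns are hypothetical names.

```r
library(MASS)  # provides the Boston data frame

training_set <- Boston

# Columns passed as predictors by x = training_set: every column, medv included.
x_columns <- names(training_set)

# Predictors implied by medv ~ . : base R expands the dot to all other columns.
formula_columns <- attr(terms(medv ~ ., data = training_set), "term.labels")

"medv" %in% x_columns                 # is medv a predictor in the first call?
"medv" %in% formula_columns           # is medv a predictor under the formula?
setdiff(x_columns, formula_columns)   # columns present only in the first call
which(x_columns == "medv")            # position of medv, the column to drop from x
```

Because medv is itself a predictor in the first call, the forest can nearly reproduce the response, which is consistent with the much smaller mean of squared residuals reported for that call.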
Below, I give a reproducible example. The last model is an adaptation of your first model, and it indeed gives the same results as your second one.
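    library(randomForest)
    library(MASS)   # provides the Boston data set

    training_set <- Boston

    # Model 1: x contains every column, including the response medv
    set.seed(500)
    regressor = randomForest(x = training_set, y = training_set$medv, ntree = 100)
    regressor

    # Model 2: formula interface, medv is excluded from the predictors
    set.seed(500)
    regressor = randomForest(medv ~ ., data = training_set, ntree = 100)
    regressor

    # Model 3: like model 1, but with medv (column 14) dropped from x
    set.seed(500)
    regressor = randomForest(x = training_set[, -14], y = training_set$medv, ntree = 100)
    regressor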
Finally, note that you will typically get better help if you include a minimal reproducible example like the one I gave here.