randomForest vs randomForestSRC discrepancies

There are two popular R packages for building the random forests introduced by Breiman (2001): randomForest and randomForestSRC. I am noticing small but consistent discrepancies in accuracy between the two packages, even when I try to use the same input parameters. I understand that we would expect slightly different forests, but in the example below the randomForestSRC package consistently outperforms the randomForest package. I'm guessing there are other examples where randomForest is superior. Can someone please explain why these packages give different predictions? Is there a way to grow a random forest with both packages using the same methodology?

In the example, there is no missing data, all values are distinct, mtry=1 (there is only one predictor), and trees are grown with nodesize=5. I believe the same bootstrap approach and split rule are used too. Increasing ntree or the number of observations in the simulated dataset does not change the relative difference between the two packages.
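
One way to test the belief that both packages use the same bootstrap approach and split rule is to spell those settings out in both calls instead of relying on each package's defaults. This is a minimal sketch, not a guaranteed recipe for identical forests. It assumes a recent randomForestSRC release in which `rfsrc()` accepts the `bootstrap`, `samptype` and `nsplit` arguments (with `nsplit = 0` meaning all split points are evaluated); defaults and argument names have changed across versions, so check `?rfsrc` and `?randomForest` for the installed ones.

```r
library(randomForest)
library(randomForestSRC)

# Same simulated data recipe as in the question
set.seed(130948)
x1 <- runif(1000)
y <- rnorm(1000, mean = x1, sd = .3)
data <- data.frame(x1 = x1, y = y)
n <- nrow(data)

# randomForest: bootstrap samples of size n drawn with replacement,
# one candidate variable per split (the only predictor), nodesize = 5
modRF <- randomForest(y ~ x1, data = data, ntree = 500, mtry = 1,
                      nodesize = 5, replace = TRUE, sampsize = n)

# rfsrc: ASSUMPTION that these arguments exist in the installed version.
# samptype = "swr" requests sampling with replacement (sample size n by
# default); nsplit = 0 requests deterministic splitting over all split points.
modRFSRC <- rfsrc(y ~ x1, data = data, ntree = 500, mtry = 1,
                  nodesize = 5, bootstrap = "by.root", samptype = "swr",
                  nsplit = 0)

# Test-set MSE for each fit, computed the same way as in the question
x1new <- runif(10000)
newdata <- data.frame(x1 = x1new, y = rnorm(10000, mean = x1new, sd = .3))
mseRF <- mean((predict(modRF, newdata = newdata) - newdata$y)^2)
mseRFSRC <- mean((predict(modRFSRC, newdata = newdata)$predicted - newdata$y)^2)
c(randomForest = mseRF, randomForestSRC = mseRFSRC)
```

If the gap shrinks once the settings are pinned down, the difference was likely in the defaults rather than in the algorithms themselves; if it persists, the remaining implementation details (for example how each package interprets `nodesize`) are the next thing to check in the documentation.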

```r
library(randomForest)
library(randomForestSRC)

set.seed(130948) # Other seeds give similar comparative results
x1 <- runif(1000)
y <- rnorm(1000, mean = x1, sd = .3)
data <- data.frame(x1 = x1, y = y)

# Compare MSE using OOB samples, based on the printed output
(modRF <- randomForest(y ~ x1, data = data, ntree = 500, nodesize = 5))
(modRFSRC <- rfsrc(y ~ x1, data = data, ntree = 500, nodesize = 5))

# Compare MSE using an independent test sample
x1new <- runif(10000)
ynew <- rnorm(10000, mean = x1new, sd = .3)
newdata <- data.frame(x1 = x1new, y = ynew)

mean((predict(modRF, newdata = newdata) - newdata$y)^2)              # MSE using randomForest
mean((predict(modRFSRC, newdata = newdata)$predicted - newdata$y)^2) # MSE using randomForestSRC
```
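
The claim that other seeds give similar comparative results can be checked more systematically by repeating the whole simulation over several seeds and looking at the paired differences in test MSE. This is a sketch: the helper name `compare_once` and the seeds `1:10` are hypothetical choices, while the model calls and data-generating process are the same as in the example above.

```r
library(randomForest)
library(randomForestSRC)

# Simulate training and test data for one seed, fit both packages with the
# question's settings, and return the two test-set MSEs.
compare_once <- function(seed, n = 1000, n_test = 10000) {
  set.seed(seed)
  x1 <- runif(n)
  data <- data.frame(x1 = x1, y = rnorm(n, mean = x1, sd = .3))
  x1new <- runif(n_test)
  newdata <- data.frame(x1 = x1new, y = rnorm(n_test, mean = x1new, sd = .3))

  modRF <- randomForest(y ~ x1, data = data, ntree = 500, nodesize = 5)
  modRFSRC <- rfsrc(y ~ x1, data = data, ntree = 500, nodesize = 5)

  c(rf  = mean((predict(modRF, newdata = newdata) - newdata$y)^2),
    src = mean((predict(modRFSRC, newdata = newdata)$predicted - newdata$y)^2))
}

seeds <- 1:10                         # hypothetical choice of seeds
res <- t(sapply(seeds, compare_once)) # one row per seed, columns rf and src
res

# Paired differences: positive means randomForest had the larger test MSE
diffs <- res[, "rf"] - res[, "src"]
summary(diffs)
t.test(diffs)                         # one-sample t-test on the paired differences
```

Because both forests are evaluated on the same test sample within each seed, the paired differences remove most of the seed-to-seed noise in the MSE itself, which makes a small but systematic gap easier to see.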