My intuition is that the fitted values and predicted values of a gbm object should be identical. But in this example with just one tree, the values are different:
require(MASS); require(gbm)
b <- c(0, 0, .8, 0, 0)
x <- mvrnorm(100, mu = rep(0, 5), diag(5))
colnames(x) <- paste0("x", 1:5)
y <- x %*% b + rnorm(100)
gbm.fit.out <- gbm.fit(x = x, y = y, shrinkage = .1, n.trees = 1,
                       distribution = "gaussian", verbose = FALSE)
d <- data.frame(y = y, x = x)
gbm.out <- gbm(y ~ ., data = d, shrinkage = .1, n.trees = 1,
               distribution = "gaussian", train.fraction = 1)
p1 <- predict(gbm.fit.out, n.trees = 1)
p2 <- predict(gbm.out, n.trees = 1)
p1 - p2
Why are they different? Does it even matter?
Best Answer
This seems to be peculiar to gbm.fit. Using gbm instead, and being sure to turn off bagging (bag.fraction = 1) and the train/test split (train.fraction = 1), produces fitted values identical to the predictions.
require(MASS); require(gbm)
b <- c(0, 0, .8, 0, 0)
x <- mvrnorm(100, mu = rep(0, 5), diag(5))
colnames(x) <- paste0("x", 1:5)
y <- x %*% b + rnorm(100)
out <- gbm(y ~ x1 + x2 + x3 + x4 + x5, data = data.frame(y, x),
           shrinkage = 1, n.trees = 1, distribution = "gaussian",
           verbose = FALSE, bag.fraction = 1, train.fraction = 1)
f <- out$fit
p <- predict(out, n.trees = 1)
all(f - p == 0)
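Why bag.fraction matters: by default gbm uses bag.fraction = 0.5, so each tree is grown on a random half of the training data. Two model objects fit to the same data (as in the question, one via gbm.fit and one via gbm) then see different subsamples and produce different predictions. A hedged sketch of that effect, fitting the same model twice without fixing the seed between fits:

```r
# Sketch, assuming MASS and gbm are installed. With the default
# bag.fraction = 0.5, each tree is fit on a random subsample, so two
# otherwise identical calls generally yield different trees.
require(MASS); require(gbm)
set.seed(1)
x <- mvrnorm(100, mu = rep(0, 5), diag(5))
colnames(x) <- paste0("x", 1:5)
y <- x %*% c(0, 0, .8, 0, 0) + rnorm(100)
d <- data.frame(y = y, x)
m1 <- gbm(y ~ ., data = d, shrinkage = .1, n.trees = 1,
          distribution = "gaussian")  # bag.fraction left at 0.5
m2 <- gbm(y ~ ., data = d, shrinkage = .1, n.trees = 1,
          distribution = "gaussian")  # different random subsample
# The two sets of predictions usually disagree; with bag.fraction = 1
# (as in the answer above) the randomness disappears and they match.
all(predict(m1, n.trees = 1) == predict(m2, n.trees = 1))
```

So the difference in the question does matter only in the sense that it reflects subsampling randomness, not a bug in either function.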