I'm reading about CART classification with rpart in R, and at the end we are supposed to compute the misclassification error.
Given that
y is the column that stores the class labels,
x holds the predictor columns,
and fit=rpart(y~.,x),
how can we interpret the value W=sum(Y==predict(fit,x,type="class"))/length(Y)?
Best Answer
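Taken literally, the formula counts the cases where the predicted class equals the actual class in Y and divides by the number of cases, so W is the proportion of correctly classified observations (the accuracy on the data passed to predict), and 1 - W is the misclassification error.
It should not be confused with the proportion of fitted values assigned to a particular class. In the example below the response is a binary variable (H or L), and those proportions would be length(fit.val[fit.val=="H"])/length(df$y) or length(fit.val[fit.val=="L"])/length(df$y).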
Classification results are normally assessed with the confusion matrix. As shown in cm, the diagonal elements are the correct classifications, while the off-diagonal elements are errors, whether false positives or false negatives. The mean misclassification error is therefore one minus the proportion of correct classifications: 1 - (sum(diag(cm))/sum(cm))
library(rpart)
set.seed(1237)
# y is made a factor so that rpart fits a classification tree
df <- data.frame(y = factor(sample(c("H","L"), 100, replace = TRUE)), x = rnorm(100))
fit <- rpart(y ~ x, data = df)

# fitted values
fit.val <- predict(fit, type = "class")

# proportion of cases classified as H or L
length(fit.val[fit.val=="H"])/length(df$y)
# [1] 0.51
length(fit.val[fit.val=="L"])/length(df$y)
# [1] 0.49

# confusion table
cm <- table(actual = df$y, fitted = fit.val)
cm
#       fitted
# actual  H  L
#      H 36 11
#      L 15 38

# mean misclassification error
mmce <- 1 - (sum(diag(cm))/sum(cm))
mmce
# [1] 0.26
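To connect the question's W to the confusion matrix, here is a minimal sketch that computes both on the same kind of simulated data. The names pred, W, acc_cm and err are illustrative and not from the original post, and the exact numbers depend on the R version and seed, so no output is shown.

```r
library(rpart)

# Simulated data in the spirit of the example above (assumption: y is a
# two-level factor and x a single numeric predictor).
set.seed(1237)
df <- data.frame(
  y = factor(sample(c("H", "L"), 100, replace = TRUE)),
  x = rnorm(100)
)

# Classification tree; method = "class" is stated explicitly for clarity.
fit <- rpart(y ~ x, data = df, method = "class")

# Predicted class for every row of the data used to fit the tree.
pred <- predict(fit, df, type = "class")

# W from the question: share of rows whose predicted class equals the actual one.
W <- sum(df$y == pred) / length(df$y)

# The same quantity read off the confusion matrix (sum of the diagonal over the total).
cm <- table(actual = df$y, fitted = pred)
acc_cm <- sum(diag(cm)) / sum(cm)

# Misclassification error is the complement of W.
err <- 1 - W
```

Here W and acc_cm should agree, and err matches 1 - (sum(diag(cm))/sum(cm)). Because the predictions are made on the same data the tree was fitted to, this is a training (resubstitution) error and will usually be optimistic compared with the error on new data.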