Solved – Rpart using Caret changes names of Factors

Suppose I have a factor, e.g. sexe, with two levels, MALE and FEMELLE. Using rpart alone, I get splits such as sexe = MALE followed by a yes/no branch. However, when I use rpart through caret, the variable is renamed in the tree and the split is labelled sexeMALE >= .5 instead.

This also causes a problem with the predict function, because my variable is no longer called sexe but sexeMALE. Is there a way around this? Also, since this is a factor variable, what does >= .5 mean in this case?

Thanks

You probably used the formula method of train, which converts factors to dummy variables. Most functions in R that use the formula method do the same. rpart, randomForest, naiveBayes and a few others do not, since they can model the categories directly without needing a numeric encoding of the data.

The names you see (the factor name followed by the level, e.g. sexeMALE) are what model.matrix generates.
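To make the naming concrete, and to address the >= .5 part of the question, here is a minimal sketch in base R. The toy data frame and its values are made up for illustration; only the factor name and levels come from the question. Because a dummy variable only takes the values 0 and 1, a split at 0.5 simply separates the two levels.

```r
# Hypothetical toy data; column values are invented for illustration.
d <- data.frame(
  sexe = factor(c("MALE", "FEMELLE", "MALE", "FEMELLE")),
  age  = c(30, 25, 41, 37)
)

# model.matrix() expands each factor into 0/1 indicator columns named
# <variable><level>. The first level (alphabetically, FEMELLE) is the
# baseline and gets no column of its own.
mm <- model.matrix(~ ., data = d)
colnames(mm)

# The indicator is 1 for MALE rows and 0 otherwise, so a tree split such as
# "sexeMALE >= 0.5" selects exactly the rows where sexe == "MALE".
male_by_dummy  <- mm[, "sexeMALE"] >= 0.5
male_by_factor <- d$sexe == "MALE"
all(male_by_dummy == male_by_factor)
```

So the split sexeMALE >= .5 carries the same information as sexe = MALE; only the encoding differs.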

If you want to keep the factors as factors, use the non-formula method, e.g.

train(x, y) 
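A fuller version of that call might look like the sketch below. It assumes the caret and rpart packages are installed; the simulated data, the outcome column and the 5-fold cross-validation setting are assumptions chosen for illustration, not something from the original question.

```r
library(caret)  # assumes caret (and rpart, which method = "rpart" uses) are installed

set.seed(1)
n <- 200

# Hypothetical data: outcome follows sexe about 90% of the time.
dat <- data.frame(
  sexe = factor(sample(c("MALE", "FEMELLE"), n, replace = TRUE)),
  age  = round(runif(n, 18, 80))
)
agrees <- runif(n) < 0.9
dat$outcome <- factor(ifelse((dat$sexe == "MALE") == agrees, "yes", "no"))

# Non-formula interface: x is a data frame of predictors, y is the outcome.
# The factor column is passed to rpart as a factor, not as dummy variables.
fit <- train(
  x = dat[, c("sexe", "age")],
  y = dat$outcome,
  method = "rpart",
  trControl = trainControl(method = "cv", number = 5)
)

# Variables used for splits in the final tree keep their original names.
split_vars <- setdiff(as.character(fit$finalModel$frame$var), "<leaf>")
unique(split_vars)

# predict() then expects new data with the original column names.
new_obs <- data.frame(
  sexe = factor("MALE", levels = levels(dat$sexe)),
  age  = 40
)
pred <- predict(fit, newdata = new_obs)
```

Keeping x as a data frame (rather than a matrix) matters here, since a matrix cannot hold a factor column alongside numeric ones.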

Max
