Say I have a factor, e.g. `sexe`, with two levels, `MALE` and `FEMELLE`. Using rpart alone, I get splits such as `sexe = MALE` followed by a yes/no branch. Using rpart through caret, however, the variables get renamed in a strange way: the split is on a variable called `sexeMALE` with a `>= .5` threshold.

This also causes a problem with the predict function, since my variable is no longer called `sexe` but `sexeMALE`. Is there a way around this? Also, since it is a factor variable, what does `>= .5` mean in this case?

Thanks


#### Best Answer

You probably used the formula method with `train`, which converts the factors to dummy variables. Most functions in R that use the formula method do the same. `rpart`, `randomForest`, `naiveBayes` and a few others do not, since they are able to model the categories without needing numeric encodings of that data.

The naming that you see is what is generated by `model.matrix`.
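To see where the name comes from, here is a minimal sketch using base R only. The data frame `d` and its `age` column are made up for illustration; only `sexe` and its two levels come from the question. `model.matrix` drops the first level alphabetically (`FEMELLE`) as the reference and creates a 0/1 column named by pasting the variable name and the remaining level together. A tree split such as `sexeMALE >= 0.5` on that column simply separates the rows where the dummy is 1 (`MALE`) from those where it is 0 (`FEMELLE`).

```r
# Hypothetical toy data; only `sexe` and its levels come from the question
d <- data.frame(
  sexe = factor(c("MALE", "FEMELLE", "FEMELLE", "MALE")),
  age  = c(23, 35, 41, 52)
)

# The formula interface expands the factor into a dummy (0/1) column
mm <- model.matrix(~ sexe + age, data = d)

colnames(mm)
# "(Intercept)" "sexeMALE" "age"

# 1 where sexe is MALE, 0 where it is FEMELLE
unname(mm[, "sexeMALE"])
# 1 0 0 1
```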

If you want to keep the factors as factors, use the non-formula method, where `x` is a data frame of predictors and `y` is the outcome, e.g.

`train(x, y)`
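A rough sketch of what that can look like with `method = "rpart"`, assuming the caret and rpart packages are installed. The simulated data, the `outcome` column and the 5-fold cross-validation setting are assumptions made for the example, not part of the original question.

```r
library(caret)

set.seed(1)
n <- 200

# Simulated data for illustration only
d <- data.frame(
  sexe = factor(sample(c("MALE", "FEMELLE"), n, replace = TRUE)),
  age  = round(runif(n, 18, 80))
)
d$outcome <- factor(ifelse(d$sexe == "MALE" & d$age > 40, "yes", "no"))

# Non-formula interface: predictors as a data frame, outcome as a vector,
# so `sexe` reaches rpart as a factor rather than as a dummy column
x <- d[, c("sexe", "age")]
y <- d$outcome

fit <- train(
  x = x, y = y,
  method = "rpart",
  trControl = trainControl(method = "cv", number = 5)
)

# Any split on sex is reported on `sexe`, not `sexeMALE`
print(fit$finalModel)

# New data uses the original column names and factor levels
new_obs <- data.frame(
  sexe = factor("MALE", levels = levels(d$sexe)),
  age  = 50
)
pred <- predict(fit, newdata = new_obs)
```

Because the predictors are never passed through `model.matrix`, `predict` expects a column called `sexe` with the same factor levels as the training data.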

Max
