I am new to the whole ML scene and am trying to work through the Allstate Kaggle challenge to get a better feel for the random forest regression technique.
The challenge is evaluated on the MAE of the predictions, i.e. the absolute error per row, averaged over all rows.
I've run sklearn's RandomForestRegressor on my training data with the criterion="mae" parameter. To my understanding, this makes the forest algorithm evaluate candidate splits by MAE instead of MSE at each node.
After that I've used metrics.mean_absolute_error(Y_valid, m.predict(X_valid)) to calculate the MAE over the validation set.
What I would like to know is whether the logic I'm following is sound. Am I making a fundamental mistake or missing something here?
Should I instead have used the default MSE-based regressor and then calculated the MAE using the mean_absolute_error function?
Best Answer
This is probably too late but…
The short answer is yes, your logic is sound. If you are going to be evaluated on MAE, you want the random forest algorithm to use the same metric when it builds its trees (i.e. at each step it will look for the split that yields the largest reduction in MAE).