Solved – Mean Absolute Error in Random Forest Regression

I am new to the whole ML scene and am working on the Allstate Kaggle challenge to get a better feel for the Random Forest Regression technique.

The challenge is evaluated on the mean absolute error (MAE) over all rows.

I've run the sklearn RandomForestRegressor on my validation set, using the criterion='mae' parameter. To my understanding, this makes the forest algorithm use the MAE instead of the MSE when evaluating splits at each node.

After that I've used this: metrics.mean_absolute_error(Y_valid, m.predict(X_valid)) in order to calculate the MAE on the validation data.

What I would like to know is whether the logic I'm following is sound. Am I making a fundamental mistake or missing something here?
Should I have used the default MSE-based regressor and then calculated the MAE afterwards using the mean_absolute_error function?

This is probably too late but…

The short answer is yes – your logic is sound. If you are going to be evaluated on MAE you want the Random Forest algorithm to use the same metric when it is building its trees (i.e. at each step it will look for the split that leads to the highest reduction in MAE).
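
To make the split search concrete, here is a toy sketch of picking a threshold on a single feature by weighted MAE. It is not scikit-learn's actual implementation; the function names node_mae and best_split and the sample data are hypothetical. It relies on the fact that the constant prediction minimizing absolute error in a node is the median of its targets.

```python
from statistics import median


def node_mae(ys):
    """Mean absolute deviation of the targets from their median."""
    m = median(ys)
    return sum(abs(y - m) for y in ys) / len(ys)


def best_split(xs, ys):
    """Return (threshold, weighted_mae) of the best split on one feature.

    threshold is None if no split improves on the unsplit node.
    """
    pairs = sorted(zip(xs, ys))
    n = len(pairs)
    best = (None, node_mae(ys))  # baseline: impurity without splitting
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between identical feature values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        weighted = (len(left) * node_mae(left) + len(right) * node_mae(right)) / n
        if weighted < best[1]:
            midpoint = (pairs[i - 1][0] + pairs[i][0]) / 2
            best = (midpoint, weighted)
    return best


xs = [1, 2, 3, 4, 5, 6]
ys = [10, 11, 12, 50, 52, 100]
threshold, impurity = best_split(xs, ys)
print(threshold, impurity)  # the best split falls between x=3 and x=4
```

A real forest repeats this search over a random subset of features at every node and over bootstrap samples for every tree, which is why the MAE criterion costs more than MSE: medians are more expensive to update than means.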
