I have built three different models, each of which outputs a class probability for a binary classification problem. The models differ somewhat and assign importance to different features. All of them are built from a single data matrix, 70% of which is used as the training sample.

How can one summarize the importance of different feature values to the final class probability estimate if, besides that estimate, only the data matrix and the list of features used are known?

Individual models can of course be explained by various methods, but how can one explain the predictions of an averaged ensemble?

EDIT:

I have a data matrix containing all the features used by the different models and their values, plus the combined ensemble probability estimate. How can one summarize how the different features affect the ensemble probability globally?

EDIT 2:

Can feature importances from different models be combined somehow if the models use different features and the variable value codings might differ?


#### Best Answer

Here are three intuitive ways to solve the problem:

- First normalize each model's feature importances to the range 0 to 1, then average the normalized values across the three models.
- Do the same as above, but use a weighted average instead of a plain one. The weights can be the performance of each model on your hold-out set, so that better-performing models count for more.
- If you only care about the ordering of the features and not their relative importance, rank the features within each model and then average (or weight-average) the ranks. For instance, the most important feature has rank 1, the second most important has rank 2, and so on. You do this for each of the three models and then average the ranks. Lower average ranks indicate higher feature importance.
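
The three options above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a definitive recipe: the feature names, importance values, and hold-out scores are made up; "normalizing to 0-1" is taken to mean dividing absolute importances by the largest one in each model; and a feature a model does not use is given importance 0 (or the worst possible rank), which is one way to handle models with different feature sets.

```python
def scale_to_unit(importances):
    """Scale one model's absolute importances so the top feature gets 1.0."""
    top = max(abs(v) for v in importances.values())
    if top == 0:
        return {f: 0.0 for f in importances}
    return {f: abs(v) / top for f, v in importances.items()}


def average_importance(models, weights=None):
    """(Weighted) mean of scaled importances; unused features count as 0."""
    if weights is None:
        weights = [1.0] * len(models)
    total = sum(weights)
    features = sorted(set().union(*models))
    scaled = [scale_to_unit(m) for m in models]
    return {
        f: sum(w * s.get(f, 0.0) for w, s in zip(weights, scaled)) / total
        for f in features
    }


def average_rank(models, weights=None):
    """(Weighted) mean rank; rank 1 is most important, lower is better.

    Assumption: a feature a model does not use gets the worst possible
    rank, i.e. the number of distinct features across all models.
    """
    if weights is None:
        weights = [1.0] * len(models)
    total = sum(weights)
    features = sorted(set().union(*models))
    worst = len(features)
    ranked = []
    for m in models:
        order = sorted(m, key=lambda f: abs(m[f]), reverse=True)
        ranked.append({f: i + 1 for i, f in enumerate(order)})
    return {
        f: sum(w * r.get(f, worst) for w, r in zip(weights, ranked)) / total
        for f in features
    }


# Hypothetical importances from three models on different scales,
# each using a different subset of features.
model_a = {"age": 0.5, "income": 0.3, "tenure": 0.2}
model_b = {"age": 2.0, "income": 4.0}
model_c = {"income": 10.0, "region": 5.0, "age": 2.5}
models = [model_a, model_b, model_c]

# Hypothetical hold-out scores (e.g. AUC) used as weights.
holdout_scores = [0.80, 0.75, 0.85]

simple_avg = average_importance(models)
weighted_avg = average_importance(models, weights=holdout_scores)
rank_avg = average_rank(models)

for name, result in [("simple", simple_avg),
                     ("weighted", weighted_avg),
                     ("rank", rank_avg)]:
    print(name, {f: round(v, 3) for f, v in result.items()})
```

Treating an unused feature as having zero importance penalizes features that only one model had access to. If the models were given different candidate features to begin with, averaging only over the models that include each feature may be a fairer alternative.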
