A kernel with a separate length scale per input dimension, such as the squared exponential kernel with automatic relevance determination (ARD), is said to quantify the relative importance of the input (predictor) features. The idea is to compare the learned length scales: if a feature's length scale is large, that feature's influence on the kernel is negligible.
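For concreteness, here is what such an ARD-style squared-exponential kernel looks like. This is a minimal NumPy sketch, not code from any particular library; the function name and the `variance` parameter are my own:

```python
import numpy as np

def ard_se_kernel(X1, X2, lengthscales, variance=1.0):
    """Squared-exponential kernel with one length scale per feature (ARD).

    A large length scale in dimension d flattens the kernel along that
    dimension, so the output barely changes as feature d varies.
    """
    # Rescale each dimension by its own length scale, then compute
    # pairwise squared Euclidean distances in the rescaled space.
    Z1 = X1 / lengthscales
    Z2 = X2 / lengthscales
    sq = (np.sum(Z1**2, axis=1)[:, None]
          + np.sum(Z2**2, axis=1)[None, :]
          - 2.0 * Z1 @ Z2.T)
    return variance * np.exp(-0.5 * np.maximum(sq, 0.0))
```

With a huge length scale in one dimension, moving far along that dimension leaves the kernel value essentially unchanged, which is exactly the "negligible influence" reading above.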

My question is then:

How can I compare the importance of features measured on different numerical scales, e.g. dollars, time, length, etc.? Is there any way to standardize them?
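One common way to put features on comparable footing before fitting (a sketch of standard z-scoring, not something prescribed by the question itself) is:

```python
import numpy as np

def standardize(X):
    """Z-score each column so all features are unitless with std 1.

    After this, the learned per-feature length scales are measured on
    comparable units and can be compared directly.
    """
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    sigma[sigma == 0.0] = 1.0  # guard against constant features
    return (X - mu) / sigma, mu, sigma
```

Remember to apply the same `mu` and `sigma` (computed on the training data) to any test inputs.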


#### Best Answer

I don't think this is quite true: the length scale merely determines how close two data points are considered to be along a particular input dimension, not how useful that dimension is.

If you want to figure out the importance of each feature, take your kernel and remove one dimension of x (i.e. drop a feature), then compute the predictive variance over your prediction space. Repeat for each dimension. How 'important' a feature is can be seen as how much including it reduces the variance of the predictions.
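The drop-a-feature procedure above can be sketched in NumPy with a toy GP (single shared length scale; the function names are mine). Note that GP predictive variance depends only on the inputs, not on the observed targets:

```python
import numpy as np

def rbf(X1, X2, ls=1.0):
    # Plain squared-exponential kernel with one shared length scale.
    sq = (np.sum(X1**2, axis=1)[:, None]
          + np.sum(X2**2, axis=1)[None, :]
          - 2.0 * X1 @ X2.T)
    return np.exp(-0.5 * np.maximum(sq, 0.0) / ls**2)

def gp_pred_var(Xtr, Xte, ls=1.0, noise=1e-2):
    # Posterior predictive variance of a zero-mean GP at the test inputs.
    K = rbf(Xtr, Xtr, ls) + noise * np.eye(len(Xtr))
    Ks = rbf(Xtr, Xte, ls)
    v = np.linalg.solve(K, Ks)
    return np.diag(rbf(Xte, Xte, ls)) - np.sum(Ks * v, axis=0)

def importance_by_drop(Xtr, Xte, ls=1.0, noise=1e-2):
    # For each feature d: how much does the mean predictive variance grow
    # when d is dropped from the kernel? A larger increase means the
    # feature was doing more to reduce predictive uncertainty.
    base = gp_pred_var(Xtr, Xte, ls, noise).mean()
    n_dims = Xtr.shape[1]
    imps = []
    for d in range(n_dims):
        keep = [i for i in range(n_dims) if i != d]
        dropped = gp_pred_var(Xtr[:, keep], Xte[:, keep], ls, noise).mean()
        imps.append(dropped - base)
    return np.array(imps)
```

In practice you would refit the remaining hyperparameters after dropping each dimension rather than reusing a fixed length scale as this sketch does.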

Of course, if your hyperparameters are such that you are over- or under-fitting the data, this is less useful; in that case you can apply cross-validation or leave-one-out (LOO) error instead to gauge feature importance.
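A LOO-based variant can exploit the closed-form leave-one-out residuals of GP regression, r_i = alpha_i / [K^{-1}]_{ii} with alpha = K^{-1}y (Rasmussen & Williams, eq. 5.12), so no explicit refitting loop is needed. Again a hedged sketch with names of my own:

```python
import numpy as np

def rbf(X1, X2, ls=1.0):
    sq = (np.sum(X1**2, axis=1)[:, None]
          + np.sum(X2**2, axis=1)[None, :]
          - 2.0 * X1 @ X2.T)
    return np.exp(-0.5 * np.maximum(sq, 0.0) / ls**2)

def loo_mse(X, y, ls=1.0, noise=1e-2):
    # Closed-form GP leave-one-out residuals: r_i = alpha_i / [K^-1]_ii.
    K = rbf(X, X, ls) + noise * np.eye(len(X))
    Kinv = np.linalg.inv(K)
    resid = (Kinv @ y) / np.diag(Kinv)
    return float(np.mean(resid**2))
```

Comparing `loo_mse` with and without a given column of X then scores that feature by how much dropping it hurts held-out accuracy, which, unlike the predictive-variance approach, does use the targets.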
