I recently learned about the kernel trick, which maps data into a higher-dimensional space in an attempt to make the data linearly separable there. Are there any cases where I should avoid using this technique? Is it just a matter of finding the right kernel function?
For linearly separable data this is of course not helpful, but for non-linearly separable data it seems always useful. After all, linear classifiers are much easier to work with than non-linear ones in terms of training time and scalability.
Best Answer
> For linearly separable data this is of course not helpful, but for non-linearly separable data it seems always useful. After all, linear classifiers are much easier to work with than non-linear ones in terms of training time and scalability.
@BartoszKP already explained why the kernel trick is useful. To fully address your question, however, I would like to point out that kernelization is not the only option for dealing with data that is not linearly separable.
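To make the contrast concrete, here is a minimal sketch of a linear versus a kernelized SVM on data that no hyperplane can separate in the input space. The two-circles dataset, scikit-learn models, and all parameter values are my own illustrative choices, not something from the question:

```python
import numpy as np
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Two concentric circles: impossible to separate with a hyperplane in 2D.
X, y = make_circles(n_samples=500, noise=0.1, factor=0.4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

linear_svm = SVC(kernel="linear").fit(X_train, y_train)
rbf_svm = SVC(kernel="rbf").fit(X_train, y_train)

print("linear kernel accuracy:", linear_svm.score(X_test, y_test))  # typically near chance
print("RBF kernel accuracy:   ", rbf_svm.score(X_test, y_test))     # typically near perfect
```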
There are at least three good, common alternatives for delinearizing the model:
- Neural-network-based methods, where you add one (or more) layers of processing units that transform your data into a linearly separable representation. In the simplest case this is a sigmoid layer, which adds non-linearity to the process. The layers are randomly initialized and then updated during the gradient-based optimization of the upper layer (which actually solves the linear problem); see the first sketch after this list.
- In particular, deep learning techniques can be used here to prepare data for further linear classification. The idea is very similar to the previous one, but here you first pre-train the processing layers to find a good starting point for later fine-tuning together with some linear model.
- Random projections, where you sample (non-linear) projections from some predefined space and train a linear classifier on top of them. This idea is heavily exploited in so-called extreme learning machines, where very efficient linear solvers are used to train a simple classifier on random projections and achieve very good performance on non-linear problems, in both classification and regression; see the second sketch after this list.
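First, a toy numpy sketch of the neural-network idea from the first bullet: a randomly initialized sigmoid layer is updated jointly, by plain gradient descent, with a linear (logistic) top layer. The layer size, learning rate, iteration count, and dataset are all arbitrary illustrative choices:

```python
import numpy as np
from sklearn.datasets import make_circles

rng = np.random.default_rng(0)
X, y = make_circles(n_samples=500, noise=0.1, factor=0.4, random_state=0)

n_hidden, lr = 20, 0.5
W1 = rng.normal(size=(2, n_hidden))   # hidden layer: non-linear transform
b1 = np.zeros(n_hidden)
w2 = rng.normal(size=n_hidden)        # top layer: a plain linear model
b2 = 0.0
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

for _ in range(2000):
    H = sigmoid(X @ W1 + b1)          # map data into a new representation
    p = sigmoid(H @ w2 + b2)          # linear classifier on that representation
    grad = (p - y) / len(y)           # gradient of the logistic loss
    # Backpropagate: the hidden layer learns a representation in which
    # the top linear layer can separate the classes.
    w2 -= lr * H.T @ grad
    b2 -= lr * grad.sum()
    dH = np.outer(grad, w2) * H * (1 - H)
    W1 -= lr * X.T @ dH
    b1 -= lr * dH.sum(axis=0)

print("training accuracy:", ((p > 0.5) == y).mean())  # should end well above chance
```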
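Second, a sketch of the random-projection idea in the spirit of an extreme learning machine: the non-linear projection is sampled once and never trained, and only a cheap linear model is fitted on top of it. Again, the feature count and regularization strength are arbitrary assumptions:

```python
import numpy as np
from sklearn.datasets import make_circles
from sklearn.linear_model import RidgeClassifier

rng = np.random.default_rng(0)
X, y = make_circles(n_samples=500, noise=0.1, factor=0.4, random_state=0)

# Random projection layer: sampled once, never trained.
W = rng.normal(size=(2, 200))
b = rng.normal(size=200)
H = np.tanh(X @ W + b)                # non-linear random features

# Only the linear top layer is fitted, which is cheap and scales well.
clf = RidgeClassifier(alpha=1.0).fit(H, y)
print("training accuracy:", clf.score(H, y))
```

Because only the final linear solve is fitted, training takes a fraction of the time a kernelized SVM needs on large datasets, which is exactly the scalability point made below.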
To sum up: kernelization is a great delinearization technique, and you can use it when the problem is not linear, but this should not be a blind "if non-linear, then kernelize" approach. It is just one of at least a few interesting methods, which can lead to different results depending on the problem and requirements. In particular, ELMs tend to find solutions very similar to those given by kernelized SVMs while training orders of magnitude faster (so they scale up much better than kernelized SVMs).