Every positive definite kernel $k(x, x')$ used in machine learning and statistics has an equivalent representation as a dot product of the feature map representations $\phi(x)$ of the inputs, i.e.

\begin{align}
k(x, x') = \phi(x)^T\phi(x')
\end{align}

My question is: given a kernel expression, is it always possible to find the corresponding feature map? For example, we know that the corresponding feature map of the Gaussian kernel is an infinite-dimensional vector. (Feature map for the Gaussian kernel)

Any pointers (including research papers) are welcome.

#### Best Answer

Short answer: It depends on what you mean by "find" and on the precise kind of kernel you are looking at. In many cases you can prove the abstract existence of such a feature map, but in practice it is usually hard, and often impossible, to write it down explicitly. Furthermore, the constructions are mathematically subtle, so you need to be careful about the technical assumptions.

## Background

Let your kernel be defined as $K:\Omega\times\Omega\rightarrow\mathbb{R}$ (the domain is important!). There are many feature maps, in the sense that a feature map is any embedding of $\Omega$ into a suitable Hilbert space. Of course, there is always the canonical feature map $\Phi:\Omega\rightarrow\mathbb{R}^\Omega,\ x\mapsto K(x,\cdot)$. Judging from the right-hand side of your equation, you are looking for a different feature map, one which maps into "vectors", i.e. into $\ell^2$, the Hilbert space of square-summable sequences with the canonical scalar product $\langle x, y\rangle=\sum_i x_i y_i$, aka "$x^Ty$".
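To make the definition concrete, here is a minimal sketch (not part of the original answer) for a kernel whose $\ell^2$-style feature map is easy to write down: the homogeneous polynomial kernel $k(x,y)=(x^Ty)^2$ on $\mathbb{R}^2$, which expands to the three-dimensional feature map $\phi(x) = (x_1^2,\ \sqrt{2}\,x_1x_2,\ x_2^2)$:

```python
import numpy as np

def poly_kernel(x, y):
    """Homogeneous polynomial kernel of degree 2: k(x, y) = (x^T y)^2."""
    return np.dot(x, y) ** 2

def phi(x):
    """Explicit feature map for the degree-2 polynomial kernel on R^2.

    Expanding (x1*y1 + x2*y2)^2 = x1^2*y1^2 + 2*x1*x2*y1*y2 + x2^2*y2^2
    gives phi(x) = (x1^2, sqrt(2)*x1*x2, x2^2), so k(x, y) = phi(x)^T phi(y).
    """
    return np.array([x[0] ** 2, np.sqrt(2) * x[0] * x[1], x[1] ** 2])

x = np.array([1.0, 2.0])
y = np.array([3.0, -1.0])
assert np.isclose(poly_kernel(x, y), phi(x) @ phi(y))
```

The same expansion works for any polynomial kernel, but the feature dimension grows combinatorially with the degree.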

## Mercer's Theorem

The key fact for obtaining such a feature map is Mercer's theorem (see Theorem 4.49 in [1]). If your kernel $K$ is continuous and its domain of definition $\Omega$ is compact, then the map defined on square-integrable functions $$ M_K: L^2(\Omega) \rightarrow L^2(\Omega),\quad f\mapsto \int_\Omega f(t)K(t,\cdot)\,dt$$ is a so-called Hilbert–Schmidt operator. The theory of these operators tells us that there exists a countable family of functions $\phi_i:\Omega\rightarrow\mathbb{R}$ which spans $L^2(\Omega)$ such that one can write the kernel $K$ as $$ K(x,y) = \sum_i \phi_i(x)\phi_i(y),$$ which, of course, is exactly the feature map you are looking for.

## Further aspects

- To find the $\phi_i$ explicitly you need to find all solutions to the integral equation $M_K(\phi)=\lambda \phi$. This is very hard (or impossible) in general.
- Even this special kind of feature map is not unique: there will be other families $\psi_i$ which also allow such a representation.
- The feature map depends not only on the kernel $K$ but also on its domain $\Omega$.
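As a practical aside not in the answer above: when no exact finite-dimensional feature map exists, a randomized approximate one often suffices. Random Fourier features (Rahimi and Recht, "Random Features for Large-Scale Kernel Machines", 2007) approximate the Gaussian kernel on all of $\mathbb{R}^d$ by a finite map; the dimensions $d$, $D$ and the bandwidth $\sigma$ below are illustrative:

```python
import numpy as np

# Sketch of random Fourier features: a randomized finite-dimensional map z(x)
# with z(x)^T z(y) ~ exp(-||x - y||^2 / (2 * sigma^2)).
rng = np.random.default_rng(0)
d, D, sigma = 3, 10000, 1.0  # input dim, feature dim, kernel bandwidth (assumed)

W = rng.normal(scale=1.0 / sigma, size=(D, d))  # random frequencies ~ N(0, sigma^-2 I)
b = rng.uniform(0.0, 2.0 * np.pi, size=D)       # random phases

def z(x):
    """Approximate feature map: z(x)^T z(y) converges to k(x, y) as D grows."""
    return np.sqrt(2.0 / D) * np.cos(W @ x + b)

x = rng.normal(size=d)
y = rng.normal(size=d)
exact = np.exp(-np.sum((x - y) ** 2) / (2.0 * sigma ** 2))
err = abs(z(x) @ z(y) - exact)
print(err)  # approximation error shrinks like O(1/sqrt(D))
```

This also illustrates the non-uniqueness point: the random map above is a perfectly usable (approximate) feature representation, yet it looks nothing like the Mercer eigenfunctions.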

[1]: Ingo Steinwart and Andreas Christmann, *Support Vector Machines*, Springer, 2008.
