So… I have been trying to make a radial basis kernel for hours, but I am not sure what my final matrix should look like. I have 30 features and 200,000 data points. Should my matrix K be 200000*200000 or 30*30?

My code so far produces 30*30:

```python
def build_kernel(x, func, option):
    x = x.T
    K = np.zeros([x.shape[0], x.shape[0]])
    for i in range(x.shape[0]):
        xi = x[i, :]
        for j in range(x.shape[0]):
            xj = x[j, :]
            K[i, j] = func(xi, xj, option)
    return K

def radial_basis(xi, xj, gamma):
    r = np.exp(-gamma * (np.linalg.norm(xi - xj) ** 2))
    return r
```

My goal is to use the kernel trick in ridge regression, like it is explained here:

http://www.ics.uci.edu/~welling/classnotes/papers_class/Kernel-Ridge.pdf

But I have no idea how to implement this manually (I have to do it manually for school!)

Does somebody know how to do such a thing? 🙂

Thanks !


#### Best Answer

The kernel function compares data points, so it would be $200{,}000 \times 200{,}000$. (It seems that your data in `x` is stored as instances by features, but then you do `x = x.T` for some reason, swapping it. The matrix you've computed isn't anything meaningful as far as I know.)

That's going to be very challenging to work with on a normal personal computer; if you just removed the `x = x.T` line so that your code computed the proper thing, the matrix `K` would be 298 GB in memory! (Plus, the way you've implemented it with Python nested loops and 40 billion calls to the function `radial_basis`, it's going to take a *long* time to compute even if you do have that much memory.)
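Even on a small subsample, you'd want to vectorize the kernel computation rather than call a Python function $n^2$ times. One common sketch uses the identity $\|x_i - x_j\|^2 = \|x_i\|^2 + \|x_j\|^2 - 2\,x_i^\top x_j$ (the function name `rbf_kernel` here is my own, not from your code):

```python
import numpy as np

def rbf_kernel(X, Y, gamma):
    """RBF kernel between the rows of X (n, d) and the rows of Y (m, d).

    Computes all squared distances at once via
    ||xi - yj||^2 = ||xi||^2 + ||yj||^2 - 2 xi.yj, then exponentiates.
    """
    sq = (X ** 2).sum(axis=1)[:, None] + (Y ** 2).sum(axis=1)[None, :] - 2 * X @ Y.T
    np.maximum(sq, 0, out=sq)  # clip tiny negatives from floating-point rounding
    return np.exp(-gamma * sq)
```

This replaces the two Python loops with a few BLAS-backed matrix operations, which is typically orders of magnitude faster, though of course it does nothing about the memory cost of the full $n \times n$ matrix.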

This is an example of a situation where directly using the kernel trick is, frankly, a bad idea.

If you're dead-set on doing kernel ridge regression, there are various approximations you can make to make it computationally reasonable on that size of data, and I can point you to some of them. But it seems unlikely that a school assignment would really require you to do that.
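For completeness, if the assignment only asks for the mechanics on data that fits in memory (say, a subsample), the dual form of ridge regression from the linked notes reduces to one linear solve: $\alpha = (K + \lambda I)^{-1} y$, with predictions $f(x) = \sum_i \alpha_i \, k(x_i, x)$. A minimal sketch, assuming an `rbf_kernel(X, Y, gamma)` function that returns the kernel matrix between two sets of rows (the helper names `krr_fit` and `krr_predict` are mine):

```python
import numpy as np

def rbf_kernel(X, Y, gamma):
    # Pairwise squared distances via ||xi||^2 + ||yj||^2 - 2 xi.yj
    sq = (X ** 2).sum(axis=1)[:, None] + (Y ** 2).sum(axis=1)[None, :] - 2 * X @ Y.T
    return np.exp(-gamma * np.maximum(sq, 0))

def krr_fit(X, y, gamma, lam):
    # Dual ridge solution: solve (K + lam * I) alpha = y
    K = rbf_kernel(X, X, gamma)
    return np.linalg.solve(K + lam * np.eye(len(X)), y)

def krr_predict(X_train, alpha, X_new, gamma):
    # f(x) = sum_i alpha_i * k(x_i, x)
    return rbf_kernel(X_new, X_train, gamma) @ alpha
```

The $O(n^3)$ solve is exactly why this is impractical at $n = 200{,}000$; the approximations I mentioned (e.g. working with a random subsample of the data) exist to avoid that solve.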