Solved – How to build and use the kernel trick manually in Python

So… I have been trying to build a radial basis function (RBF) kernel for hours, but I am not sure what my final matrix should look like. I have 30 features and 200,000 data points. Should my matrix K be 200,000 × 200,000 or 30 × 30?

My code so far produces a 30 × 30 matrix:

    import numpy as np

    def build_kernel(x, func, option):
        x = x.T
        K = np.zeros([x.shape[0], x.shape[0]])
        for i in range(x.shape[0]):
            xi = x[i, :]
            for j in range(x.shape[0]):
                xj = x[j, :]
                K[i, j] = func(xi, xj, option)
        return K

    def radial_basis(xi, xj, gamma):
        r = np.exp(-gamma * (np.linalg.norm(xi - xj) ** 2))
        return r

My goal is to use the kernel trick in ridge regression, like it is explained here:

But I have no idea how to implement this manually (I have to do it manually for school!).

Does anybody know how to do this? 🙂

Thanks!

The kernel function compares data points, so K would be $200{,}000 \times 200{,}000$. (It seems that your data in x is stored as instances by features, but then you do x = x.T for some reason, which swaps the two. The 30 × 30 matrix you've computed compares features rather than data points, and it isn't anything meaningful as far as I know.)
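To make the shape concrete, here is a minimal sketch of an RBF kernel matrix on a small toy array. It assumes rows are data points and columns are features (so no transpose), and it replaces the nested loops with NumPy broadcasting. The name `rbf_kernel_matrix` and the toy sizes are illustrative, not from the question.

```python
import numpy as np

def rbf_kernel_matrix(X, gamma):
    """Return K with K[i, j] = exp(-gamma * ||X[i] - X[j]||^2).

    X is assumed to be (n_samples, n_features), so K is (n_samples, n_samples).
    """
    sq_norms = np.sum(X ** 2, axis=1)
    # ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b, computed for all pairs at once
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (X @ X.T)
    # Clip tiny negative values caused by floating-point round-off
    np.maximum(sq_dists, 0.0, out=sq_dists)
    return np.exp(-gamma * sq_dists)

rng = np.random.default_rng(0)
X_small = rng.normal(size=(500, 30))  # 500 points, 30 features (toy size)
K = rbf_kernel_matrix(X_small, gamma=0.1)
print(K.shape)  # (500, 500)
```

The result has one row and one column per data point, it is symmetric, and its diagonal is 1 because every point has zero distance to itself.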

That's going to be very challenging to work with on a normal personal computer: if you just removed the x = x.T line so that your code computed the proper thing, the matrix K would have 40 billion entries and, at 8 bytes per float64 entry, take up about 298 GiB (320 GB) of memory! (Plus, the way you've implemented it, with nested Python loops and 40 billion calls to the function radial_basis, it's going to take a long time to compute even if you do have that much memory.)
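The arithmetic behind that estimate is short enough to check directly. This sketch assumes 8-byte float64 entries, which is what `np.zeros` produces by default:

```python
n = 200_000              # number of data points
bytes_per_entry = 8      # float64, the default dtype of np.zeros

n_entries = n * n        # one kernel evaluation per (i, j) pair
total_bytes = n_entries * bytes_per_entry

print(f"{n_entries:,} entries")              # 40,000,000,000 entries
print(f"{total_bytes / 1e9:.0f} GB")         # 320 GB (decimal)
print(f"{total_bytes / 2**30:.0f} GiB")      # 298 GiB (binary)
```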

This is an example of a situation where directly using the kernel trick is, frankly, a bad idea.
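For contrast, on a few hundred points the direct approach is perfectly workable. Below is a minimal sketch of kernel ridge regression in its standard dual form: solve $(K + \lambda I)\alpha = y$, then predict with $k(x_*)^\top \alpha$. The function names, the toy sine data, and the values of `gamma` and `lam` are assumptions made for illustration; they are not from the question.

```python
import numpy as np

def rbf_kernel(A, B, gamma):
    """Kernel values between rows of A (n_a, d) and rows of B (n_b, d)."""
    a2 = np.sum(A ** 2, axis=1)[:, None]
    b2 = np.sum(B ** 2, axis=1)[None, :]
    sq_dists = np.maximum(a2 + b2 - 2.0 * (A @ B.T), 0.0)
    return np.exp(-gamma * sq_dists)

def fit_kernel_ridge(X, y, gamma, lam):
    """Solve (K + lam * I) alpha = y for the dual coefficients alpha."""
    K = rbf_kernel(X, X, gamma)
    return np.linalg.solve(K + lam * np.eye(len(X)), y)

def predict_kernel_ridge(X_train, alpha, X_new, gamma):
    """Prediction is a kernel-weighted sum over the training points."""
    return rbf_kernel(X_new, X_train, gamma) @ alpha

# Toy problem: noisy sine curve, 300 training points
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(300, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=300)

alpha = fit_kernel_ridge(X, y, gamma=1.0, lam=0.1)
X_test = np.linspace(-2, 2, 50)[:, None]
y_pred = predict_kernel_ridge(X, alpha, X_test, gamma=1.0)
print(np.max(np.abs(y_pred - np.sin(X_test[:, 0]))))  # worst-case error on the grid
```

Note that the features themselves never appear outside the kernel: fitting needs the n × n matrix K, and predicting needs the kernel between new and training points. That is the kernel trick, and it is also why the cost grows with the number of data points rather than the number of features.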

If you're dead-set on doing kernel ridge regression, there are various approximations that make it computationally feasible on data of that size, and I can point you to some of them. But it seems unlikely that a school assignment would really require you to do that.
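As one example of such an approximation (my choice for illustration, not necessarily the one intended above), random Fourier features replace the n × n kernel matrix with an explicit n × D feature matrix whose inner products approximate the RBF kernel. Ordinary ridge regression on those features then approximates kernel ridge regression. The sketch below uses toy data; the names `make_rff` and `rff_transform` and the values of `D`, `gamma`, and `lam` are assumptions.

```python
import numpy as np

def make_rff(n_features, n_components, gamma, rng):
    """Sample random frequencies and phases for exp(-gamma * ||x - y||^2)."""
    # For this kernel the frequencies are Gaussian with variance 2 * gamma
    W = rng.normal(scale=np.sqrt(2.0 * gamma), size=(n_features, n_components))
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_components)
    return W, b

def rff_transform(X, W, b):
    """Map X (n, d) to Z (n, D) so that Z @ Z.T approximates the RBF kernel."""
    return np.sqrt(2.0 / W.shape[1]) * np.cos(X @ W + b)

# Toy problem: noisy sine curve, 5000 training points
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(5000, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=5000)

gamma, lam, D = 1.0, 0.1, 200
W, b = make_rff(X.shape[1], D, gamma, rng)
Z = rff_transform(X, W, b)  # (5000, 200) instead of a (5000, 5000) kernel matrix

# Plain ridge regression on the random features: (Z^T Z + lam I) w = Z^T y
w = np.linalg.solve(Z.T @ Z + lam * np.eye(D), Z.T @ y)

X_test = np.linspace(-2, 2, 50)[:, None]
y_pred = rff_transform(X_test, W, b) @ w
print(np.max(np.abs(y_pred - np.sin(X_test[:, 0]))))  # worst-case error on the grid
```

With 200,000 points and D = 200, the feature matrix Z would hold 40 million float64 values (about 320 MB), and the linear system to solve is only 200 × 200.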
