I am trying to implement a Watson Nadaraya classifier. There is one thing I didn't understand from the equation:
$${F}(x)=frac{sum_{i=1}^n K_h(x-X_i) Y_i}{sum_{i=1}^nK_h(x-X_i)}$$
What should I use for the kernel K?
I have a 2-dimensional dataset which has 1000 samples (each sample is like this: [-0.10984628, 5.53485135]
).
What confuses me is, based on my data, the input of the kernel function will be something like this:
K([-0.62978309, 0.10464536])
And what I understand, it'll produce some number instead of an array, therefore I can go ahead and calculate F(x) which will also be a number. Then I'll check whether it is > or <= than zero. But I couldn't find any kernel that produces a number. So confused.
Edit: I tried to implement my classifier based on the comments, but I got a very low accuracy. I appreciate if someone notices what's wrong with it.
def gauss(x): return (1.0 / np.sqrt(2 * np.pi)) * np.exp(- 0.5 * x**2) def transform(X, h): A = [] for i in X: A.append(stats.norm.pdf(i[0],0,h)*stats.norm.pdf(i[1],0,h)) return A N = 100 # pre-assign some mean and variance mean1 = (0,9) mean2 = (0,5) cov = [[0.3,0.7],[0.7,0.3]] # generate a dataset dataset1 = np.random.multivariate_normal(mean1,cov,N) dataset2 = np.random.multivariate_normal(mean2,cov,N) X = np.vstack((dataset1, dataset2)) # pre-assign labels Y1 = [1]*N Y2 = [-1]*N Y = Y1 + Y2 # assing a width h = 0.5 #now, transform the data X2 = transform(X, h) j = 0 predicted = [] for i in X2: # apply the equation fx = sum((gauss(i-X2))*Y)/float(np.sum(gauss(i-X2))) # if fx>0, it belongs to class 1 if fx >0: predicted.append(1) else: predicted.append(-1) j = j+1
Best Answer
You could take $K_h$ to be the density function for a bi-variate Gaussian distribution, with mean $x$, covariance matrix $hI$, and evaluated at $X_i$…
Similar Posts:
- Solved – Kernel PCA and classification
- Solved – How does the shape of a decision boundary in relate between the original and kernel feature space
- Solved – How to transform this dataset to make classes linearly separable
- Solved – How feature stacking works
- Solved – Kernel Density estimation function and bandwidth selection