Solved – Is it possible to penalize the $k$NN classification algorithm?

I read through this paper on building an approach to binary classification in high-dimensional data, and I wondered: is there a way to penalize the regular $k$NN classification algorithm?

While digging into this question, I found the following:

I think $k$NN cannot be penalized in the usual sense: there is no training-time objective to attach a penalty term to. At test time, $k$NN simply finds the $k$ nearest neighbors of the query point in the training set and returns the majority vote of their labels.
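To make the "no training, just vote at test time" point concrete, here is a minimal sketch of $k$NN prediction with Euclidean distance. The function name `knn_predict` and the toy data are my own illustration, not from the post:

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x, k=3):
    # No fitting step: all work happens at query time.
    # 1) Compute Euclidean distances from x to every training point.
    dists = np.linalg.norm(X_train - x, axis=1)
    # 2) Take the k nearest neighbors.
    nearest = np.argsort(dists)[:k]
    # 3) Return the majority vote of their labels.
    return Counter(y_train[nearest]).most_common(1)[0][0]

# Toy data: two well-separated clusters with labels 0 and 1.
X = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
              [1.0, 1.0], [0.9, 1.1], [1.1, 0.9]])
y = np.array([0, 0, 0, 1, 1, 1])

print(knn_predict(X, y, np.array([0.05, 0.1])))   # query near cluster 0
print(knn_predict(X, y, np.array([1.05, 0.95])))  # query near cluster 1
```

Note there is nowhere to insert a $\lambda \cdot \text{penalty}$ term here; the only real knob is $k$ itself, which is why model selection for $k$NN reduces to choosing $k$.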

There are many approaches to selecting the best model:

  1. We can choose the model that optimizes the fit to the training data minus a complexity penalty:

     $$H^* = \arg\max_{H} \; \text{fit}(H \mid D) - \lambda \, \text{complexity}(H)$$

The complexity could be measured through parameter counting, for example.

  2. We can estimate each model's performance on a validation set. This gives an estimate of the generalization error: $$E[\text{err}] \approx \frac{1}{N_{\text{valid}}} \sum_{n=1}^{N_{\text{valid}}} \mathbb{I}\big(\hat{y}(x_n) \neq y_n\big)$$ For example, split the given data into an 80/20 train/validation split, plus a held-out test set.

  3. We can use $K$-fold cross-validation if the data set is small. If the validation split is so small that $N_{\text{valid}}$ gives an unreliable estimate of the generalization error, we can repeatedly train on all but $1/K$ of the data and test on the held-out $1/K$-th. Typically $K = 10$. If $K = N$, this is called leave-one-out CV.
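The two validation-based approaches above are how one selects $k$ for $k$NN in practice. Below is a self-contained sketch of $K$-fold cross-validation used to pick $k$; the helper names (`knn_predict`, `kfold_error`) and the toy data are my own assumptions, not from the lecture notes:

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x, k):
    # Majority vote among the k nearest training points (Euclidean distance).
    nearest = np.argsort(np.linalg.norm(X_train - x, axis=1))[:k]
    return Counter(y_train[nearest]).most_common(1)[0][0]

def kfold_error(X, y, k_neighbors, K=5, seed=0):
    # Estimate E[err] by K-fold CV: train on all-but-1/K, test on the 1/K'th,
    # and average the misclassification rates over the K folds.
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(X)), K)
    errs = []
    for i in range(K):
        valid = folds[i]
        train = np.concatenate([folds[j] for j in range(K) if j != i])
        wrong = sum(knn_predict(X[train], y[train], X[n], k_neighbors) != y[n]
                    for n in valid)
        errs.append(wrong / len(valid))
    return float(np.mean(errs))

# Toy data: two Gaussian blobs; pick the k with the lowest estimated error.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.3, (20, 2)), rng.normal(1, 0.3, (20, 2))])
y = np.array([0] * 20 + [1] * 20)
best_k = min([1, 3, 5, 7], key=lambda k: kfold_error(X, y, k))
print(best_k)
```

Setting `K = len(X)` in `kfold_error` would give leave-one-out CV, at the cost of $N$ separate evaluations.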

Reference: UBC CS340 (Fall 2007), Lecture 4: $k$-Nearest Neighbors.
