Solved – Is it possible to penalize the \$k\$NN classification algorithm

I read through this paper on building an approach to Binary Classification in high-dimensional data. I wondered if there is a way to penalize the regular KNN classification algorithm?

Contents

I think KNN cannot be penalized. During testing, knn is supposed to find the closest example in the training set and return the corresponding label; it finds the k-nearest neighbors and returns the majority vote of their labels.

There are many approaches to selecting the best model:

1. We can choose the model which optimizes the fit to the training data minus a complexity penalty

$$H^* = arg max fit_H( H|D ) − λ complexity( H )$$

The complexity could be measured through parameter counting, for example.

1. We can estimate each model's performance based on the validation set. This would estimate of the generalization error. $$E[err] ≈ frac 1{Nvalid} sum_{n=1}^{Nvalid} {I(hat{y} space (x_n) neq y_n)}$$ Ex. split the given data into train/valid 80-20 split ratio + test set.

2. Using the k-fold cross validation if the data is small. If the split is small such that the $$N_{valid}$$ would be unreliable to estimate the generalized error, then we can repeatedly train on all-but-1/K and test on 1/K’th. Typically K=10. If $$K=N-1$$, this is called leave-one-out-CV.

Reference: CS340-Fall07_L4_knn_ubc_ca

Rate this post