I read through this paper on building an approach to binary classification in high-dimensional data. Is there a way to penalize the regular KNN classification algorithm?
Best Answer
I was digging into this question, and here is what I found:
I don't think KNN can be penalized in the usual sense, because there is no explicit loss function or parameter vector to attach a penalty term to. At test time, KNN simply finds the k nearest neighbors of the query point in the training set and returns the majority vote of their labels.
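To make that concrete, here is a minimal sketch of plain (unpenalized) KNN prediction; the helper `knn_predict` and the toy data are my own illustration, not something from the paper:

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_query, k=3):
    """Classify x_query by majority vote among its k nearest training points."""
    # Euclidean distance from the query point to every training example
    dists = np.linalg.norm(X_train - x_query, axis=1)
    # Indices of the k closest training examples
    nearest = np.argsort(dists)[:k]
    # Majority vote over their labels
    return Counter(y_train[nearest]).most_common(1)[0][0]

# Toy usage: two clusters labeled 0 and 1
X_train = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1]])
y_train = np.array([0, 0, 1, 1])
print(knn_predict(X_train, y_train, np.array([0.05, 0.1])))  # -> 0
```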
There are many approaches to selecting the best model:
We can choose the model that maximizes the fit to the training data minus a complexity penalty:
$$H^* = \underset{H}{\operatorname{arg\,max}} \; \text{fit}(H \mid D) - \lambda \cdot \text{complexity}(H)$$
The complexity could be measured through parameter counting, for example.
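As a sketch of what such a penalty could look like for KNN, one could score each candidate $k$ by training fit minus $\lambda$ times a complexity proxy. Using $N/k$ (the usual "effective number of parameters" heuristic for KNN) and the scikit-learn setup below are my assumptions, not part of the referenced notes:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical setup: synthetic high-dimensional binary data
X, y = make_classification(n_samples=200, n_features=50, random_state=0)
lam = 0.002  # penalty strength lambda, chosen arbitrarily for illustration

best_k, best_score = None, -np.inf
for k in range(1, 26):
    model = KNeighborsClassifier(n_neighbors=k).fit(X, y)
    fit = model.score(X, y)   # training accuracy, playing the role of fit(H | D)
    complexity = len(X) / k   # N/k, an assumed complexity proxy for KNN
    score = fit - lam * complexity
    if score > best_score:
        best_k, best_score = k, score

print(best_k, best_score)
```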
We can estimate each model's performance on a validation set, which gives an estimate of the generalization error: $$\mathbb{E}[\text{err}] \approx \frac{1}{N_{\text{valid}}} \sum_{n=1}^{N_{\text{valid}}} \mathbb{I}\left(\hat{y}(x_n) \neq y_n\right)$$ For example, split the given data into training and validation sets with an 80/20 ratio, keeping a separate test set.
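A minimal sketch of this validation-set procedure, assuming scikit-learn and synthetic data in place of the real problem:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# Hold out a test set first, then split the remainder 80/20 into train/validation
X_rest, X_test, y_rest, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X_rest, y_rest, test_size=0.2, random_state=0)

val_err = {}
for k in (1, 3, 5, 7, 9):
    y_hat = KNeighborsClassifier(n_neighbors=k).fit(X_tr, y_tr).predict(X_val)
    # (1 / N_valid) * sum of I(y_hat(x_n) != y_n)
    val_err[k] = np.mean(y_hat != y_val)

best_k = min(val_err, key=val_err.get)
print(best_k, val_err[best_k])
```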
We can use k-fold cross-validation if the data set is small. When a single split would leave $N_{\text{valid}}$ too small to estimate the generalization error reliably, we can repeatedly train on all but $1/K$ of the data and test on the held-out $1/K$'th, averaging the errors across folds. Typically $K = 10$. If $K = N$, this is called leave-one-out cross-validation (LOOCV).
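And a corresponding sketch of 10-fold cross-validation using scikit-learn's `cross_val_score`; swapping `cv=10` for `cv=len(X)` would give LOOCV:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# A small data set, where a single validation split would be unreliable
X, y = make_classification(n_samples=100, n_features=20, random_state=0)

for k in (1, 3, 5, 7):
    # 10-fold CV accuracy for this candidate k
    acc = cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=10)
    print(k, 1 - acc.mean())  # estimated generalization error
```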
Reference: UBC CS340 (Fall 2007), Lecture 4: K-nearest neighbors (CS340-Fall07_L4_knn_ubc_ca).
Similar Posts:
- Solved – Can we say that k fold cross validation reduces bias and variance as compared to train-validation-test split procedure
- Solved – How to make train/test split with given class weights
- Solved – Checking the regression model’s performance
- Solved – Classification accuracy heavily depends on random seed