From *Artificial Intelligence: A modern approach*:

> Most nonparametric models have the advantage that it is easy to do leave-one-out cross-validation without having to recompute everything. With a k-nearest-neighbors model, for instance, when given a test example (x, y) we retrieve the k nearest neighbors once, compute the per-example loss L(y, h(x)) from them, and record that as the leave-one-out result for every example that is not one of the neighbors. Then we retrieve the k + 1 nearest neighbors and record distinct results for leaving out each of the k neighbors. With N examples the whole process is O(k), not O(kN).

Why would we *record that as the leave-one-out result for every example that is not one of the neighbors*? Since we only retrieved one test example (x, y), shouldn't we only record the result for that example?

Why would we *retrieve the k + 1 nearest neighbors* afterwards? Isn't our goal to verify a k-nearest-neighbors model?

Why is the whole process *O(k), not O(kN)*? Doesn't LOOCV mean that we validate *every* single one of the N examples, so our complexity can't really drop below O(N)?


#### Best Answer

I think the key point is:

> Doesn't LOOCV mean that we validate every single one of the N examples

Yes.

> so our complexity can't really drop below O(N)?

No, this doesn't necessarily follow. Completely recalculating N times is the *worst* case, which can always be achieved.

For some classifiers (and depending on the actual classifier under consideration) it is possible to regroup the calculations done in those N training procedures with N − 1 cases each, so that certain intermediate results can be reused. The idea is to show that the *same result* you get with the elaborate N-fold recalculation is reached by a shorter algorithm.

For kNN, the savings are particularly large, so that you can get the LOO predictions almost for free. The key idea here is to look at which predictions are influenced by leaving out one particular case. If you do this, you'll realize that *most* cases aren't influenced.
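To make the "almost for free" part concrete, here is a minimal sketch (my own illustration, not code from the book or the answer): since the only thing leaving out a case changes for kNN is that the case may no longer be its own neighbor, all N leave-one-out predictions fall out of a single distance matrix with its diagonal masked, rather than N retrainings. The function name and the majority-vote tie-breaking are my assumptions.

```python
import numpy as np

def loo_knn_predictions(X, y, k):
    """Leave-one-out k-NN class predictions without refitting N models.

    Excluding each point from its own neighbor list is equivalent to
    training on the other N - 1 points, so one pass over a shared
    distance matrix replaces N separate recomputations.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    # Pairwise squared Euclidean distances, N x N.
    diff = X[:, None, :] - X[None, :, :]
    dist = (diff ** 2).sum(axis=-1)
    # A point must not count as its own neighbor: mask the diagonal.
    np.fill_diagonal(dist, np.inf)
    # Indices of the k nearest remaining neighbors for every row at once.
    nbrs = np.argsort(dist, axis=1)[:, :k]
    # Majority vote among the neighbors' labels (ties go to the lower label).
    return np.array([np.bincount(y[row]).argmax() for row in nbrs])
```

Running it on two well-separated clusters, e.g. `loo_knn_predictions([[0.0], [0.1], [0.2], [5.0], [5.1], [5.2]], [0, 0, 0, 1, 1, 1], k=2)`, returns the correct label for every left-out point, which is exactly the LOOCV result the book's shortcut computes.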

I suggest drawing an example case and visualizing step by step what your text says. This will answer your 1st and 2nd questions.

BTW: without having thought about this in detail, I don't see that non-parametric is a good rule of thumb here. Yes, kNN is non-parametric, but it is also a local method (predictions depend only on cases close by in sample space) – and this is what the suggested regrouping strategy uses. A model that uses the median instead of the mean, OTOH, is far less localized and *therefore* the presented strategy doesn't help. And while you can easily update the median for the left-out case, updating the mean isn't that much more complicated, either.
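The "update the mean for the left-out case" remark can be sketched in one line (my own illustration, with an assumed function name): each leave-one-out mean is recovered from the full sum in O(1) per case, instead of recomputing N separate means.

```python
import numpy as np

def loo_means(x):
    """All N leave-one-out means in O(N) total.

    Reuses the full sum as the shared intermediate result:
    mean without x_i = (sum(x) - x_i) / (N - 1).
    """
    x = np.asarray(x, dtype=float)
    return (x.sum() - x) / (x.size - 1)
```

For example, `loo_means([1.0, 2.0, 3.0])` gives `[2.5, 2.0, 1.5]` — the means of {2, 3}, {1, 3}, and {1, 2} respectively.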
