Using KNN for prediction, how should I normalize the data?

Is it better to constrain the data to a range, say [0,1], or to force a mean of 0 and sd of 1? Why? Does the type of input data matter (I'll be using both continuous and categorical variables)?

I think that depends on the data. If you know your feature is bounded, you could scale it to $[0,1]$. If it's binary, I guess $\{0,1\}$ is a good choice, perhaps $\{-1,1\}$. Now, if it's unbounded, standardizing to $Z$-scores ($\overline{x} = 0$, $\sigma = 1$) is a reasonable choice.
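As a rough illustration of the two options, here is a minimal sketch in plain Python. The helper names `min_max_scale` and `z_score` and the `ages` and `incomes` values are made up for this example, and the population standard deviation is an assumption (the sample standard deviation would work just as well for KNN).

```python
from statistics import mean, pstdev


def min_max_scale(values):
    """Linearly rescale a bounded feature onto [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant feature carries no distance information; map it to 0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def z_score(values):
    """Standardize a feature to mean 0 and standard deviation 1."""
    mu = mean(values)
    sd = pstdev(values)  # population standard deviation (an assumption)
    if sd == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sd for v in values]


# Hypothetical data: a bounded feature and a heavy-tailed, unbounded one.
ages = [18, 30, 42, 66]
incomes = [20_000, 35_000, 50_000, 250_000]

scaled_ages = min_max_scale(ages)
standardized_incomes = z_score(incomes)
```

Either way, the point is that no single feature dominates the distance KNN computes just because of its units. In practice the minimum, maximum, mean and standard deviation would be computed on the training data and then reused to transform any new points.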
