Is it better to constrain the data to a range, say [0,1], or to force a mean of 0 and sd of 1? Why? Does the type of input data matter (I'll be using both continuous and categorical variables)?
Best Answer
I think that depends on the data. If you know your feature is bounded, you could scale it to $[0,1]$. If it's binary, I guess $\{0,1\}$ is a good choice, or perhaps $\{-1,1\}$. Now, if it's unbounded, standardizing to $z$-scores ($\bar{x} = 0$, $\sigma = 1$) is a reasonable choice.
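To make the three cases concrete, here is a minimal sketch in plain Python. The function names (`min_max_scale`, `standardize`, `encode_binary`) and the sample values are made up for illustration, and the standardization assumes the population standard deviation; a library scaler may differ in such details.

```python
from statistics import mean, pstdev


def min_max_scale(values, lo=None, hi=None):
    """Map values linearly onto [0, 1].

    Pass lo/hi when the true bounds of the feature are known;
    otherwise the observed min and max are used (an assumption).
    """
    lo = min(values) if lo is None else lo
    hi = max(values) if hi is None else hi
    if hi == lo:
        raise ValueError("bounds must differ to rescale")
    return [(v - lo) / (hi - lo) for v in values]


def standardize(values):
    """Convert values to z-scores: mean 0, population sd 1."""
    m = mean(values)
    s = pstdev(values)
    if s == 0:
        raise ValueError("cannot standardize a constant feature")
    return [(v - m) / s for v in values]


def encode_binary(values, positive, symmetric=False):
    """Encode a two-level feature as {0, 1}, or {-1, 1} if symmetric."""
    negative = -1 if symmetric else 0
    return [1 if v == positive else negative for v in values]


# Hypothetical features: a bounded percentage, an unbounded amount, a flag.
bounded = min_max_scale([0, 25, 100], lo=0, hi=100)
unbounded = standardize([12.0, 15.5, 9.0, 30.0])
flag = encode_binary(["yes", "no", "yes"], positive="yes", symmetric=True)
```

Passing known bounds to `min_max_scale` rather than relying on the observed extremes keeps new data inside $[0,1]$, which is the reason to prefer it only when the feature really is bounded.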