I have almost always used scikit-learn's StandardScaler to normalize my data for machine learning. I noticed, however, that simply taking the log of the variables I wanted to normalize often resulted in better accuracy than using StandardScaler.

To give some more context: I have built several binary classifiers for different purposes, both with ANNs and XGBoost, and I noticed that log-normalizing the data always leads to better accuracy.
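For concreteness, here is a minimal sketch of the two preprocessing paths being compared; the feature matrix is hypothetical, not from the actual classifiers:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical strictly positive feature matrix (e.g. prices) -- a stand-in
# for the real data, just to show the two preprocessing paths side by side.
rng = np.random.default_rng(0)
X = rng.lognormal(mean=0.0, sigma=2.0, size=(1000, 3))

# Path 1: z-scoring with scikit-learn's StandardScaler.
X_scaled = StandardScaler().fit_transform(X)

# Path 2: a plain log transform ("log-normalization" in the question).
X_logged = np.log(X)
```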

I'm a little puzzled by this, as nobody ever mentions log-normalization as a valid normalization technique. Everyone talks about min-max normalization and Z-scoring (scikit-learn's StandardScaler), but no one even mentions log-normalization.

How is that possible? Am I doing something wrong?


#### Best Answer

It is quite common to use a log transformation on your data if the values are always positive (e.g. the price of something) and their scale varies drastically.
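As a small illustration (the prices here are made up), a log transform collapses six orders of magnitude into a compact range:

```python
import numpy as np

# Hypothetical prices spanning six orders of magnitude become a
# compact, comparable range after a log transform.
prices = np.array([1.0, 50.0, 1_200.0, 75_000.0, 1_000_000.0])
print(np.log10(prices))  # [0.0, ~1.70, ~3.08, ~4.88, 6.0]
```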

A simple criterion for deciding whether to use a log transformation is whether you would choose a linear or a log scale for the x-axis when plotting a histogram of your data.
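A minimal sketch of that check, assuming heavy-tailed positive data (the generated sample is hypothetical):

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical heavy-tailed positive data, just for illustration.
rng = np.random.default_rng(0)
x = rng.lognormal(mean=0.0, sigma=2.0, size=10_000)

fig, (ax_lin, ax_log) = plt.subplots(1, 2, figsize=(10, 4))

# Linear x-axis: most of the mass piles up near zero.
ax_lin.hist(x, bins=50)
ax_lin.set_title("linear x-axis")

# Log x-axis: the same data spread out readably, which suggests
# a log transformation is appropriate.
ax_log.hist(x, bins=np.logspace(np.log10(x.min()), np.log10(x.max()), 50))
ax_log.set_xscale("log")
ax_log.set_title("log x-axis")

plt.show()
```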

If your data do look that way, this is likely to make your ANN work better, for one reason: recall the motivation behind batch normalization, namely that ANNs prefer inputs with a standard normal distribution. Z-scoring makes your distribution zero-centered with unit variance, but it does not turn it into a normal distribution; a log transformation, on the other hand, can make a skewed distribution look much more like a normal one. You can check whether this is the case for your data from the histogram or from the kurtosis of the distribution.
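A minimal sketch of that check on a hypothetical log-normal sample, using scipy's skew and kurtosis (excess kurtosis, which is 0 for a normal distribution):

```python
import numpy as np
from scipy.stats import kurtosis, skew

rng = np.random.default_rng(0)
x = rng.lognormal(mean=0.0, sigma=1.0, size=10_000)  # hypothetical data

# Z-scoring changes location and scale only, not the shape of the distribution.
z = (x - x.mean()) / x.std()
print(skew(z), kurtosis(z))            # still heavily skewed, large excess kurtosis

# The log transform changes the shape itself; here it recovers a normal.
logged = np.log(x)
print(skew(logged), kurtosis(logged))  # both near 0 -> close to normal
```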
