In scikit-learn, the Ridge regression estimator has a `normalize` parameter that normalizes the regressors. I found it necessary to set this to `True` to get a reasonable fit to my data when using higher-degree polynomial features (it provided consistent regularization no matter how many samples I trained or predicted on).
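For context, the setup that worked looked roughly like this (a sketch on synthetic data; note that `normalize` was deprecated in scikit-learn 1.0 and removed in 1.2, so this only runs on older versions):

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import Ridge

# Synthetic 1-D data standing in for my actual dataset.
rng = np.random.RandomState(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=200)

model = make_pipeline(
    PolynomialFeatures(degree=9),        # illustrative degree
    Ridge(alpha=1.0, normalize=True),    # `normalize` removed in scikit-learn 1.2
)
model.fit(X, y)
```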

I would like to use a more robust estimator such as Huber regression, but it does not have this `normalize` parameter, so the fit is quite poor.

Sklearn has a `preprocessing.Normalizer()` transformer, which I tried adding to my pipeline, but it didn't help. When I instead created a `preprocessing.FunctionTransformer()` that calls `preprocessing.normalize()`, I found that setting `axis=0` (i.e. normalizing over the features rather than the samples) gave me a good fit, much like setting `normalize=True` for the Ridge estimator.
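Roughly, the pipeline looked like this (a sketch; the polynomial degree and `HuberRegressor` are placeholders for my actual setup):

```python
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, FunctionTransformer, normalize
from sklearn.linear_model import HuberRegressor

# normalize(X, axis=0) rescales each feature column to unit norm.
column_normalizer = FunctionTransformer(lambda X: normalize(X, axis=0))

model = make_pipeline(
    PolynomialFeatures(degree=9),  # illustrative degree
    column_normalizer,
    HuberRegressor(),
)
# model.fit(X_train, y_train)
```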

However, this only worked when I predicted on a sample of similar size to my training set: depending on the number of inputs, the predicted values would change. (This behavior does not occur with Ridge's `normalize=True`.)

I've been reading through the Ridge estimator's source code trying to find exactly how it implements its `normalize` parameter, but it seems like a very convoluted solution.

Is there a relatively straightforward way to properly normalize the regressors of a Huber estimator in the same way that the `normalize` parameter does for the Ridge estimator?


#### Best Answer

IIUC, it seems like you've confused two different forms of normalization.

`sklearn.preprocessing.Normalizer` normalizes vectors to unit norm. Note that it is naturally used to scale rows (instances), not columns (features). Unit normalization depends, in general, on the vector length: concatenate a vector to itself, and you will need to shrink the elements further to retain unit length. For rows (instances), the length is constant.

`sklearn.preprocessing.StandardScaler`, conversely, removes the mean and scales to *unit variance*. It is naturally used to scale columns (features), and its per-feature statistics are essentially independent of the number of samples.
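A quick illustration of the difference on a toy matrix:

```python
import numpy as np
from sklearn.preprocessing import Normalizer, StandardScaler

X = np.array([[1.0, 10.0],
              [2.0, 20.0],
              [3.0, 30.0]])

# Normalizer rescales each ROW to unit L2 norm; it ignores column statistics.
print(Normalizer().fit_transform(X))
# e.g. the first row becomes approximately [0.0995, 0.995]

# StandardScaler centers and scales each COLUMN to mean 0 and unit variance.
print(StandardScaler().fit_transform(X))
```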

In your case, it seems like you should use `StandardScaler` together with something like `sklearn.linear_model.SGDRegressor` with Huber loss in a pipeline. You will need to tune the l1 and l2 regularization parameters somehow, preferably using some form of cross-validation.
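A minimal sketch of such a pipeline (the polynomial degree and grid values are illustrative placeholders):

```python
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler
from sklearn.linear_model import SGDRegressor
from sklearn.model_selection import GridSearchCV

pipe = make_pipeline(
    PolynomialFeatures(degree=9),    # illustrative degree
    StandardScaler(),                # per-feature scaling, learned on the training set
    SGDRegressor(loss='huber', penalty='elasticnet'),
)

# Tune the overall regularization strength (alpha) and the l1/l2 mix
# (l1_ratio) with cross-validation.
param_grid = {
    'sgdregressor__alpha': [1e-4, 1e-3, 1e-2, 1e-1],
    'sgdregressor__l1_ratio': [0.0, 0.15, 0.5, 1.0],
}
search = GridSearchCV(pipe, param_grid, cv=5)
# search.fit(X_train, y_train); search.predict(X_test)
```

Because `StandardScaler` learns its per-feature mean and scale from the training set and reapplies them unchanged at prediction time, the output does not depend on how many samples you predict on, unlike `normalize(X, axis=0)` applied to each batch.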
