In scikit-learn, the Ridge regression estimator has a `normalize` parameter that normalizes the regressors. I found it necessary to set this to `True` to get a reasonable fit to my data when using higher-degree polynomial features (it provided consistent regularization no matter how many samples I trained or predicted on).
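For context, the setup that worked looked roughly like this (a sketch on synthetic data; note that `normalize` was deprecated in scikit-learn 1.0 and removed in 1.2, so this only runs on older versions):

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import Ridge

# Synthetic 1-D data standing in for my actual dataset.
rng = np.random.RandomState(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=200)

model = make_pipeline(
    PolynomialFeatures(degree=9),        # illustrative degree
    Ridge(alpha=1.0, normalize=True),    # `normalize` removed in scikit-learn 1.2
)
model.fit(X, y)
```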

I would like to use a more robust estimator such as Huber regression, but it does not have this `normalize` parameter, so the fit is quite poor.

Sklearn has a `preprocessing.Normalizer()` transformer, which I tried adding to my pipeline, but it didn't help. When I instead created a `preprocessing.FunctionTransformer()` that calls `preprocessing.normalize()`, I found that setting `axis=0` (i.e. normalizing over the features rather than the samples) gave me a good fit, much like setting `normalize=True` for the Ridge estimator.
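Roughly, the pipeline looked like this (a sketch; the polynomial degree and `HuberRegressor` are placeholders for my actual setup):

```python
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, FunctionTransformer, normalize
from sklearn.linear_model import HuberRegressor

# normalize(X, axis=0) rescales each feature column to unit norm.
column_normalizer = FunctionTransformer(lambda X: normalize(X, axis=0))

model = make_pipeline(
    PolynomialFeatures(degree=9),  # illustrative degree
    column_normalizer,
    HuberRegressor(),
)
# model.fit(X_train, y_train)
```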

However, this only worked when I predicted on a sample of similar size to my training set: depending on the number of inputs, the predicted values would change. (This behavior does not occur with Ridge's `normalize=True`.)

I've been reading through the Ridge estimator's source code trying to find exactly how it implements its `normalize` parameter, but it seems like a very convoluted solution.

Is there a relatively straightforward way to properly normalize the regressors of a Huber estimator in the same way that the `normalize` parameter does for the Ridge estimator?


#### Best Answer

IIUC, it seems like you've confused two different forms of normalization.

`sklearn.preprocessing.Normalizer` normalizes vectors to unit norm. Note that it is naturally used to scale rows (instances), not columns (features). Unit normalization depends, in general, on the vector length: concatenate a vector to itself, and you will need to shrink the elements further to retain unit length. For rows (instances), the length is constant.

`sklearn.preprocessing.StandardScaler`, conversely, removes the mean and scales to *unit variance*. It is naturally used to scale columns (features), and its per-feature statistics are essentially independent of the number of samples.
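A quick illustration of the difference on a toy matrix:

```python
import numpy as np
from sklearn.preprocessing import Normalizer, StandardScaler

X = np.array([[1.0, 10.0],
              [2.0, 20.0],
              [3.0, 30.0]])

# Normalizer rescales each ROW to unit L2 norm; it ignores column statistics.
print(Normalizer().fit_transform(X))
# e.g. the first row becomes approximately [0.0995, 0.995]

# StandardScaler centers and scales each COLUMN to mean 0 and unit variance.
print(StandardScaler().fit_transform(X))
```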

In your case, it seems like you should use `StandardScaler` together with something like `sklearn.linear_model.SGDRegressor` with Huber loss in a pipeline. You will need to tune the l1 and l2 regularization parameters somehow, preferably using some form of cross-validation.
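A minimal sketch of such a pipeline (the polynomial degree and grid values are illustrative placeholders):

```python
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler
from sklearn.linear_model import SGDRegressor
from sklearn.model_selection import GridSearchCV

pipe = make_pipeline(
    PolynomialFeatures(degree=9),    # illustrative degree
    StandardScaler(),                # per-feature scaling, learned on the training set
    SGDRegressor(loss='huber', penalty='elasticnet'),
)

# Tune the overall regularization strength (alpha) and the l1/l2 mix
# (l1_ratio) with cross-validation.
param_grid = {
    'sgdregressor__alpha': [1e-4, 1e-3, 1e-2, 1e-1],
    'sgdregressor__l1_ratio': [0.0, 0.15, 0.5, 1.0],
}
search = GridSearchCV(pipe, param_grid, cv=5)
# search.fit(X_train, y_train); search.predict(X_test)
```

Because `StandardScaler` learns its per-feature mean and scale from the training set and reapplies them unchanged at prediction time, the output does not depend on how many samples you predict on, unlike `normalize(X, axis=0)` applied to each batch.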
