# Solved – Nonlinear regression with a linear method from Python’s scikit-learn (sklearn) using a polynomial

I am trying to do a regression analysis for some data, say 20 variables
\$\left( x_1, x_2, x_3, \ldots \right)\$, where the underlying probability distribution of each is known (e.g. \$x_1 \in \mathrm{N}(\mu_1, \sigma_1)\$, \$x_2 \in U(a_2, b_2)\$, and so on).

The variables are assumed to be uncorrelated. The overall behavior of \$y = f(x_1, x_2, x_3, \ldots)\$ is nonlinear.

Now, I would like to use a method of the scikit-learn module (e.g. Lasso, Lars, Ridge, Bayesian Regression etc.) for a metamodel-fit.

To take into account the cross-influences and the nonlinear behaviour of some variables in \$y\$, I want to use a polynomial, i.e. I don't just give the vector \$\overrightarrow{x} = \left( x_1, x_2, x_3, \ldots \right)\$ to the regression method; I rather feed it \$\left( x_1, x_2, x_3, \ldots, x_1 \cdot x_1, x_1 \cdot x_2, \ldots \right)\$, which is a polynomial of degree two or more.
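The expanded feature vector described above (the raw inputs plus all degree-two products) can be built by hand with NumPy; the sample values below are made up purely for illustration:

```python
# Sketch: build (x1, x2, x3, x1*x1, x1*x2, x1*x3, x2*x2, x2*x3, x3*x3)
# from a single hypothetical sample.
import numpy as np
from itertools import combinations_with_replacement

x = np.array([1.0, 2.0, 3.0])  # hypothetical sample (x1, x2, x3)
quadratic = [x[i] * x[j]
             for i, j in combinations_with_replacement(range(len(x)), 2)]
features = np.concatenate([x, quadratic])
print(features)  # [1. 2. 3. 1. 2. 3. 4. 6. 9.]
```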

My question is: which method (Lasso etc.) is best for this problem? And how can I tell the regression method that the underlying distributions (mean and standard deviation) and the correlations of the higher-order terms are known? For example, \$x_1\$ and \$x_1 \cdot x_1\$ are highly correlated, and both of their distributions are known through \$\mu\$ and \$\sigma\$. How can I add this information to the regression analysis? I guess that without it, the fit will be poor.

Any ideas?


You wrote that you want to use sklearn anyway; have you taken a look at the `sklearn.preprocessing.PolynomialFeatures` class? It should solve the first part of your problem.
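A minimal sketch of how that fits together: `PolynomialFeatures` generates the interaction and power terms, and a regularized linear model (Ridge here; Lasso works the same way) is fitted on top. The toy data and the choice of `alpha` are assumptions for illustration only:

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))  # e.g. three inputs, each ~ N(0, 1)
y = X[:, 0]**2 + X[:, 0] * X[:, 1] + rng.normal(scale=0.1, size=200)

model = make_pipeline(
    PolynomialFeatures(degree=2, include_bias=False),  # adds x1*x1, x1*x2, ...
    StandardScaler(),  # rescales the correlated higher-order terms
    Ridge(alpha=1.0),
)
model.fit(X, y)
print(model.score(X, y))  # R^2 on the training data
```

Note that `StandardScaler` estimates means and standard deviations from the data; sklearn has no built-in way to pass your known distribution parameters directly, but you could standardize the expanded features yourself with the known moments before fitting.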