tl;dr: Why would sklearn `LinearRegression` give a different result than gradient descent?

My understanding is that `LinearRegression` computes the closed-form solution for linear regression (described well in "Why use gradient descent for linear regression, when a closed-form math solution is available?"). The closed form is said to be a poor choice when the data set is large, in which case stochastic gradient descent should be used instead. I have a small data set, and wanted to implement batch gradient descent myself as an intermediate step, for my own edification.
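For concreteness, here is a minimal sketch of the closed-form route on made-up toy data (not the attached data set), using `np.linalg.lstsq` to solve the least-squares problem directly:

```python
import numpy as np

# Toy data: y = 1 + 2*x1 + 3*x2 plus a little noise (invented for illustration)
rng = np.random.default_rng(0)
X = rng.random((100, 2))
y = 1.0 + X @ np.array([2.0, 3.0]) + 0.01 * rng.standard_normal(100)

# Closed-form OLS: augment with a column of ones for the intercept,
# then solve the least-squares problem in one shot
Xb = np.hstack([np.ones((len(X), 1)), X])
w_closed, *_ = np.linalg.lstsq(Xb, y, rcond=None)
print(w_closed)  # close to [1, 2, 3]
```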

**I get different regression weights using LinearRegression and Batch Gradient Descent. Shouldn't the solution be unique?**

**Code:**

```python
import numpy as np
import pandas as pd
from sklearn import linear_model
import matplotlib.pyplot as plt

data = pd.read_csv(r'')  # Data set attached
X = data[['Size', 'Floor', 'Broadband Rate']]
y = data['Rental Price']

# Sklearn Linear Regression
ols = linear_model.LinearRegression(fit_intercept=True, normalize=False)
LR = ols.fit(X, y)
Res_LR = y.values - LR.predict(X)  # Residuals
print('Intercept', LR.intercept_, 'Weights', LR.coef_)

# Batch Gradient Descent
def error_delta(x, y, p, wn):
    # Sum over all rows of residual * feature wn (or * 1 for the intercept)
    total = 0
    row, column = np.shape(x)
    for i in range(row):
        residual = y[i] - (p[0] + np.dot(p[1:], x[i, :]))
        if wn != 0:
            total += residual * x[i, wn - 1]
        else:
            total += residual
    return total

def weight_update(x, y, p, alpha):
    old = p
    new = np.zeros(len(p))
    for i in range(len(p)):
        new[i] = old[i] + alpha * error_delta(x, y, old, i)
    return new

weight = [-.146, .185, -.044, .119]  # random starting conditions
alpha = .00000002  # learning rate
for i in range(500):  # Seems to have converged by 100
    weight = weight_update(X.values, y.values, weight, alpha)

Res_BGD = np.zeros(len(X.values))
for i in range(len(X.values)):
    Res_BGD[i] = y.values[i] - (weight[0] + np.dot(weight[1:], X.values[i, :]))
print('Intercept', weight[0], 'Weights', weight[1:])

plt.plot(np.arange(len(X.values)), Res_LR, color='b')
plt.plot(np.arange(len(X.values)), Res_BGD, color='g')
plt.legend(['Res LR', 'Res BGD'])
plt.show()
```

The data set is below (10 points):

```
Size,Floor,Broadband Rate,Energy Rating,Rental Price
" 500 "," 4 "," 8 "," C "," 320 "
" 550 "," 7 "," 50 "," A "," 380 "
" 620 "," 9 "," 7 "," A "," 400 "
" 630 "," 5 "," 24 "," B "," 390 "
" 665 "," 8 "," 100 "," C "," 385 "
" 700 "," 4 "," 8 "," B "," 410 "
" 770 "," 10 "," 7 "," B "," 480 "
" 880 "," 12 "," 50 "," A "," 600 "
" 920 "," 14 "," 8 "," C "," 570 "
" 1000 "," 9 "," 24 "," B "," 620 "
```

When you plot the residuals, the two fits perform comparably despite very different weights:

```
SklearnLR Intercept 19.5615588974 Weights [ 0.54873985  4.96354677 -0.06209515]
BGD Intercept -0.145402077197 Weights [ 0.62549182 -0.0344091 0.11473203]
```

Thoughts? Also, if there's any programming feedback, I'm open to more efficient ways to code this as well.


#### Best Answer

There are a couple of problems to untangle in your question.

- Can you make sure the iterative solver has converged?

Note that we can solve linear regression (minimize the squared loss) in several different ways. In my experience with Python's scikit-learn, an iterative solver's default settings often stop before full convergence: the number of iterations is capped, so the solver may terminate early. If it stops early, the job is only half done, and the result will not match the optimal solution you get from other algorithms.
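One way to check this (a sketch on synthetic data, not the asker's code): run batch gradient descent until the gradient norm is small, rather than for a fixed iteration count, and compare against the closed-form solution. At convergence the two agree.

```python
import numpy as np

# Synthetic data (made up for illustration, not the question's data set)
rng = np.random.default_rng(1)
X = rng.random((50, 2))
y = 0.5 + X @ np.array([1.5, -2.0]) + 0.01 * rng.standard_normal(50)

# Closed-form reference solution (intercept via a column of ones)
Xb = np.hstack([np.ones((len(X), 1)), X])
w_closed, *_ = np.linalg.lstsq(Xb, y, rcond=None)

# Batch gradient descent on the mean squared loss, stopping on the
# gradient norm instead of a fixed iteration count
w = np.zeros(3)
alpha = 0.1
for it in range(100_000):
    grad = -(2 / len(y)) * Xb.T @ (y - Xb @ w)
    if np.linalg.norm(grad) < 1e-8:
        break
    w -= alpha * grad

print('iterations:', it)
print('matches closed form:', np.allclose(w, w_closed))
```

If you instead cap the loop at a few hundred iterations with a tiny learning rate, the weights freeze far from the optimum, which is the situation in the question.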

- I would not agree with this statement:

LinearRegression is not good if the data set is large, in which case stochastic gradient descent needs to be used.

If we use a QR decomposition, then even when the data is on the level of millions of rows (hopefully this counts as large) and the number of features is small, we can solve it in about a second. Check this R code: you may be surprised that we can fit a linear regression on a million data points in less than 1 sec.

```r
x = matrix(runif(2e6), ncol = 2)
y = runif(1e6)
stime = proc.time()
lm(y ~ x)
print(proc.time() - stime)
```
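A rough Python analogue of the R timing check above (an illustrative sketch using `np.linalg.lstsq` in place of R's `lm`; timings vary by machine):

```python
import time
import numpy as np

# A million rows, two random features, mirroring the R snippet
rng = np.random.default_rng(2)
X = rng.random((1_000_000, 2))
y = rng.random(1_000_000)

start = time.perf_counter()
Xb = np.hstack([np.ones((len(X), 1)), X])  # column of ones for the intercept
coef, *_ = np.linalg.lstsq(Xb, y, rcond=None)
elapsed = time.perf_counter() - start

print(f'fit {len(X):,} rows in {elapsed:.3f} s')  # typically a fraction of a second
```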
