I'm just starting to follow along with the Coursera machine learning course, and I just did univariate linear regression. My regression line/output looks good and the cost function decreased, but it was still extremely high at the end of iterating (J(theta) = 2058715091.21221 at the final iteration). Is this an issue if everything looks right and the cost seems to asymptote around there, or should it really be going to zero? Here are the plots:

If there's no general answer and it depends on the specifics, I'll make an edit and post all the code. For a general overview, the data is (house sqft, house price) with ranges (852-4478), (179900-699900). I normalized the house sqft input to [0,1], set the learning rate to 1 and the number of iterations to 100. I tried a smaller learning rate with more iterations and it doesn't seem to help.

Any comments or suggestions are greatly appreciated, thanks.

#### Best Answer

Directly examining the cost function can be useful, but be aware of some basic issues:

- Units: e.g. if you measured house price in a less valuable currency (e.g. yen), all the numbers would be higher. What you regard as "high" must be relative to the units used.
- Number of observations: you want to normalize by the number of observations so that more data doesn't mechanically give you a higher cost!
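To put the number from the question on a more interpretable scale, here is a rough sketch in Python (rather than the Matlab/Octave used in the course). It assumes the cost is defined as in the Coursera course, $J(\theta) = \frac{1}{2m}\sum_i (h_\theta(x_i) - y_i)^2$; the question doesn't show the code, so that convention is an assumption.

```python
import math

# Final cost reported in the question.
J_final = 2058715091.21221

# Assumption: J = (1 / (2m)) * sum of squared errors (the Coursera convention).
# Under that convention the mean squared error is simply 2 * J.
mse = 2 * J_final
rmse = math.sqrt(mse)  # back in the units of y, i.e. dollars

# Price range quoted in the question, for a sense of scale.
price_min, price_max = 179900, 699900

print(f"RMSE is about {rmse:,.0f} dollars")
print(f"RMSE / price range = {rmse / (price_max - price_min):.2f}")
```

Under that assumption the "huge" cost corresponds to a typical error of roughly 64,000 dollars on houses priced between 179,900 and 699,900 dollars, which is a very different picture from the raw ten-digit number.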

## Some basic measures of overall error:

Root mean square error is a monotonic transformation of the sum of squares (for a fixed number of observations), so minimizing the sum of squares is the same thing as minimizing root mean square error. Likewise, minimizing mean absolute deviation is the same as minimizing the sum of absolute errors.

$R^2 = 1 - \frac{SS_{err}}{SS_{tot}}$ is 1 minus the sum of squared error divided by the total sum of squares. For a linear regression with a constant, this essentially gives you the proportion of the variance explained by the model.
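As a minimal sketch of these three measures in plain Python (the function names and the four data points are made up for illustration):

```python
import math

def rmse(y, y_hat):
    """Root mean square error, in the same units as y."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y))

def mean_abs_dev(y, y_hat):
    """Mean absolute deviation of the predictions from the observations."""
    return sum(abs(a - b) for a, b in zip(y, y_hat)) / len(y)

def r_squared(y, y_hat):
    """1 - SS_err / SS_tot."""
    y_bar = sum(y) / len(y)
    ss_err = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - y_bar) ** 2 for a in y)
    return 1 - ss_err / ss_tot

# Hypothetical observed prices and model predictions.
y = [200000.0, 300000.0, 400000.0, 500000.0]
y_hat = [220000.0, 290000.0, 390000.0, 520000.0]

print(rmse(y, y_hat), mean_abs_dev(y, y_hat), r_squared(y, y_hat))
```

Note that RMSE and mean absolute deviation are in dollars here, while $R^2$ is unit-free, which is why it is often easier to judge at a glance.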

## What does a high root mean square error, high mean absolute deviation, or low $R^2$ imply?

In some sense it means that you have a lot of forecast error. What's reasonable to expect in terms of forecast error is *entirely problem dependent*. In physics, with good data and precisely modeled problems, you may have almost no error. In economics (e.g. forecasting home prices) you tend to have a *LOT* of error. In fact, if you have implausibly good forecasts, with too *little* error, it probably means you've overfit the data!

## Beware of overfitting…

In general, a huge problem in empirical research, machine learning, etc. is overfitting. If you give yourself enough parameters to estimate, you can end up with a model that fits the training data *too* well: it fits your sample, but if you try it on new data, the model may perform horribly. If there's overfitting, your algorithm is picking up random, meaningless peculiarities of *your particular data set*.

Note there's a big conceptual difference between error on: (1) the data used to estimate your model and (2) new data.
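A small Python sketch of that difference, using simulated data. Everything here is hypothetical: the data-generating process is invented, and a 1-nearest-neighbour "memorizer" is used only as a deliberately overfit stand-in, not as something from the course.

```python
import random

random.seed(0)

def make_data(n):
    # Hypothetical process: a linear trend plus Gaussian noise.
    xs = [random.uniform(0, 1) for _ in range(n)]
    ys = [2.0 + 3.0 * x + random.gauss(0, 0.5) for x in xs]
    return xs, ys

def fit_line(xs, ys):
    # Ordinary least squares for one feature plus an intercept.
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    intercept = y_bar - slope * x_bar
    return lambda x: intercept + slope * x

def fit_memorizer(xs, ys):
    # 1-nearest-neighbour: predict the y of the closest training x.
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]

def mse(model, xs, ys):
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

x_train, y_train = make_data(30)    # (1) data used to estimate the model
x_test, y_test = make_data(1000)    # (2) new data

results = {}
for name, fit in [("line", fit_line), ("memorizer", fit_memorizer)]:
    model = fit(x_train, y_train)
    results[name] = (mse(model, x_train, y_train), mse(model, x_test, y_test))
    print(f"{name:10s} train MSE = {results[name][0]:.3f}, "
          f"test MSE = {results[name][1]:.3f}")
```

The memorizer reproduces every training point, so its training error is zero by construction, yet that says nothing about how it does on new data. A cost that goes all the way to zero on noisy training data is usually a warning sign rather than a goal.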

## General note on solving least squares

The solution to minimizing a sum of squares can be expressed as a solution to a linear system of equations. (See derivation here: Understanding linear algebra in Ordinary Least Squares derivation.) Systems of linear equations can be efficiently solved and you can check the accuracy of your gradient descent algorithm by simply comparing it to the solution you get by solving the linear system.

E.g.

`b_gradient_descent = my_gradient_descent(y, X); b_linear_system = linsolve(X'*X, X'*y);`

or, in Matlab, simply `b_linear_system = X \ y;`
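The same check can be sketched in plain Python for the univariate case. The data below is simulated to resemble the question's ranges (it is not the course data set), and the learning rate and iteration count are illustrative choices, not the asker's settings.

```python
import random

random.seed(1)

# Hypothetical data: sqft in [852, 4478], price roughly linear in sqft.
sqft = [random.uniform(852, 4478) for _ in range(47)]
price = [90000 + 135 * s + random.gauss(0, 60000) for s in sqft]

# Min-max normalise the feature to [0, 1], as described in the question.
lo, hi = min(sqft), max(sqft)
x = [(s - lo) / (hi - lo) for s in sqft]
y = price

def gradient_descent(x, y, alpha=0.5, iters=20000):
    # Batch gradient descent on J = (1 / (2m)) * sum((t0 + t1*x - y)^2).
    t0 = t1 = 0.0
    m = len(x)
    for _ in range(iters):
        err = [t0 + t1 * xi - yi for xi, yi in zip(x, y)]
        g0 = sum(err) / m
        g1 = sum(e * xi for e, xi in zip(err, x)) / m
        t0, t1 = t0 - alpha * g0, t1 - alpha * g1
    return t0, t1

def normal_equations(x, y):
    # Solve (X'X) b = X'y for X = [1, x]; a 2x2 system, so Cramer's rule.
    m = len(x)
    sx, sxx = sum(x), sum(xi * xi for xi in x)
    sy, sxy = sum(y), sum(xi * yi for xi, yi in zip(x, y))
    det = m * sxx - sx * sx
    return (sy * sxx - sx * sxy) / det, (m * sxy - sx * sy) / det

b_gd = gradient_descent(x, y)
b_ls = normal_equations(x, y)
print("gradient descent:", b_gd)
print("linear system:   ", b_ls)
```

If the two sets of coefficients agree, gradient descent has converged to the least squares solution, and whatever cost remains is the irreducible misfit of a straight line to the data rather than a bug in the iteration.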
