# What are the uses and pitfalls of regression through the origin?

Spuriously high R-squared is one of the pitfalls of regression through the origin (i.e. zero-intercept models). If the predictors do not contain zeroes, then is it an extrapolation? What are the uses and other pitfalls of regression through the origin? Are there any peer-reviewed articles?


To me, the main issue boils down to imposing a strong constraint on an unknown process.

Consider the specification $y_t = f(x_t) + \varepsilon_t$. If you don't know the exact form of the function $f(\cdot)$, you could try a linear approximation: $$f(x_t) \approx a + b x_t$$

Notice how this linear approximation is just the first-order Maclaurin (Taylor) series of $f(\cdot)$ around $x_t = 0$, with $$a = f(0), \qquad b = \left.\frac{\partial f(x)}{\partial x}\right|_{x=0}$$
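To make the Maclaurin view concrete, here is a minimal numeric sketch. The choice $f(x) = e^x$ is purely hypothetical; for it, $a = f(0) = 1$ and $b = f'(0) = 1$, so the linear approximation is $1 + x$:

```python
import math

# First-order Maclaurin (Taylor at 0) approximation: f(x) ≈ f(0) + f'(0) * x.
# Hypothetical choice f(x) = exp(x): then a = f(0) = 1 and b = f'(0) = 1.
def f(x):
    return math.exp(x)

a = f(0.0)   # intercept of the linear approximation
b = 1.0      # derivative of exp at 0

for x in (0.05, 0.1, 0.2):
    print(f"x={x}: f(x)={f(x):.4f}, linear approx={a + b * x:.4f}")
```

Near $x = 0$ the two agree closely; forcing $a = 0$ instead would make the approximation wrong everywhere unless $f(0)$ really is zero.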

Hence, when you regress through the origin, from the Maclaurin-series point of view you are asserting that $f(0) = 0$. This is a very strong constraint to place on a model.

There are situations where imposing such a constraint makes sense, and these are driven by theory or outside knowledge. I would argue that unless you have a reason to believe that $f(0) = 0$, it is not a good idea to regress through the origin. As with any constraint that does not actually hold, it will lead to biased parameter estimates.
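A small simulation illustrates both pitfalls at once: the slope bias when the true intercept is nonzero, and the spuriously high $R^2$ that the through-origin convention (uncentered total sum of squares) produces. All numbers below (true intercept 2, slope 3, seed, sample size) are arbitrary choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x = rng.uniform(1.0, 5.0, n)                 # predictors away from zero
y = 2.0 + 3.0 * x + rng.normal(0, 1.0, n)    # true intercept is 2, not 0

# With intercept: design matrix [1, x]
X1 = np.column_stack([np.ones(n), x])
beta1, *_ = np.linalg.lstsq(X1, y, rcond=None)

# Through the origin: design matrix [x] only
beta0, *_ = np.linalg.lstsq(x[:, None], y, rcond=None)

# R^2 with an intercept uses the centered total sum of squares;
# the through-origin convention uses the uncentered sum, which inflates R^2.
resid1 = y - X1 @ beta1
r2_centered = 1 - (resid1 @ resid1) / ((y - y.mean()) @ (y - y.mean()))
resid0 = y - x * beta0[0]
r2_uncentered = 1 - (resid0 @ resid0) / (y @ y)

print("slope with intercept:", beta1[1])   # near the true slope of 3
print("slope through origin:", beta0[0])   # biased upward
print("R^2 (centered, with intercept):", r2_centered)
print("R^2 (uncentered, through origin):", r2_uncentered)
```

Even though the through-origin fit is worse, its conventionally reported $R^2$ comes out higher, which is exactly the spurious behavior the question mentions.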

EXAMPLE: the CAPM in finance. Here we state that the excess return $r - r_f$ on a stock is determined by its beta on the excess market return $r_m - r_f$: $$r - r_f = \beta\,(r_m - r_f)$$

The theory tells us that the regression should go through the origin. Now, some practitioners believe they can earn an additional return, alpha, on top of the CAPM relationship: $$r - r_f = \alpha + \beta\,(r_m - r_f)$$

Both regressions are used in academic research and in practice, for different reasons. This example shows when imposing a strong constraint such as regression through the origin can make sense.
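The two CAPM regressions can be sketched as follows. The return series are simulated, and the true alpha and beta (0.0002 and 1.2), the noise levels, and the sample size are all hypothetical values chosen for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 250                                       # roughly one year of daily returns
mkt_excess = rng.normal(0.0004, 0.01, n)      # simulated r_m - r_f
true_alpha, true_beta = 0.0002, 1.2           # assumed values for illustration
stock_excess = true_alpha + true_beta * mkt_excess + rng.normal(0, 0.008, n)

# Strict CAPM: regression through the origin, beta = Σxy / Σx²
beta_capm = (mkt_excess @ stock_excess) / (mkt_excess @ mkt_excess)

# Practitioner version: allow an alpha (intercept)
X = np.column_stack([np.ones(n), mkt_excess])
alpha_hat, beta_hat = np.linalg.lstsq(X, stock_excess, rcond=None)[0]

print(f"beta (through origin): {beta_capm:.3f}")
print(f"alpha, beta (with intercept): {alpha_hat:.5f}, {beta_hat:.3f}")
```

Which fit is appropriate depends on the purpose: testing the strict theory argues for the constrained regression, while hunting for alpha requires the intercept.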
