What happens if I use OLS in a multiple regression but the sample is not random

I know that a few assumptions must be satisfied to use OLS estimators in linear regression. However, it is not clear to me what would happen if I used OLS in a multiple regression without a random sample, so that the $(X_i, Y_i)$ would not be iid. What sort of problems might I face?

First, OLS is nothing more than an algorithm for fitting a linear model of the form

$$ y = \mathbf{X}\boldsymbol{\beta} + \epsilon $$

In other words, you are positing that the phenomenon $y$ is a linear function of the variables $\mathbf{X}$, plus some additively separable disturbance term.
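Concretely, "fitting" here means choosing the coefficient vector that minimizes the sum of squared residuals. Assuming $\mathbf{X}$ has full column rank, that minimizer has the familiar closed form:

$$ \hat{\boldsymbol{\beta}} = \arg\min_{\mathbf{b}} \, (y - \mathbf{X}\mathbf{b})^\top (y - \mathbf{X}\mathbf{b}) = (\mathbf{X}^\top\mathbf{X})^{-1}\mathbf{X}^\top y $$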

If this is a good assumption, then there is some true, constant $\boldsymbol{\beta}$, and you apply an estimator, such as OLS, to estimate it.
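A minimal sketch of this setup, assuming NumPy and made-up "true" coefficients: simulate data from a known $\boldsymbol{\beta}$ with a disturbance drawn independently of $\mathbf{X}$ (a random sample), then recover it with OLS via `np.linalg.lstsq`. The helper name `ols` is hypothetical, chosen for illustration.

```python
import numpy as np

def ols(X, y):
    """Hypothetical helper: return the b minimizing ||y - X b||^2 (OLS)."""
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta_hat

rng = np.random.default_rng(0)
n = 10_000
beta_true = np.array([1.0, 2.0, -0.5])  # made-up "true" coefficients

# Design matrix: an intercept column plus two randomly drawn regressors
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
eps = rng.normal(size=n)  # disturbance drawn independently of X
y = X @ beta_true + eps   # data generated by the linear model

beta_hat = ols(X, y)
print(beta_hat)  # should be close to beta_true
```

With a random sample like this one, `beta_hat` lands near `beta_true`, and it matches the closed-form normal-equations solution up to floating-point error.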

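If your sample is non-random in a way that makes your $\mathbf{X}$'s correlated with the error term (for example, because observations were selected on the basis of $y$), then the OLS estimates $\hat{\boldsymbol{\beta}}$ will not be equal in expectation to the true $\boldsymbol{\beta}$. That is, they are biased. Selection that depends only on $\mathbf{X}$, by contrast, does not by itself bias OLS, provided the linear model holds.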
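To see why correlation between $\mathbf{X}$ and $\epsilon$ is what matters, substitute the model into the OLS solution $\hat{\boldsymbol{\beta}} = (\mathbf{X}^\top\mathbf{X})^{-1}\mathbf{X}^\top y$ (assuming $\mathbf{X}$ has full column rank) and take expectations conditional on $\mathbf{X}$:

$$ \hat{\boldsymbol{\beta}} = (\mathbf{X}^\top\mathbf{X})^{-1}\mathbf{X}^\top(\mathbf{X}\boldsymbol{\beta} + \epsilon) = \boldsymbol{\beta} + (\mathbf{X}^\top\mathbf{X})^{-1}\mathbf{X}^\top\epsilon $$

$$ \mathbb{E}[\hat{\boldsymbol{\beta}} \mid \mathbf{X}] = \boldsymbol{\beta} + (\mathbf{X}^\top\mathbf{X})^{-1}\mathbf{X}^\top\,\mathbb{E}[\epsilon \mid \mathbf{X}] $$

So $\hat{\boldsymbol{\beta}}$ is unbiased when $\mathbb{E}[\epsilon \mid \mathbf{X}] = 0$ holds in the sample you actually observe; a selection rule that makes this conditional mean nonzero leaves the second term in place as bias.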

In other words, if you were to draw many samples by the same non-random process and compute $\hat{\boldsymbol{\beta}}$ on each, the average of those estimates would not equal $\boldsymbol{\beta}$.
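That repeated-sampling thought experiment can be run directly. The sketch below is a small Monte Carlo under assumed settings (a one-regressor model $y = 1 + 2x + \epsilon$ with standard normal $x$ and $\epsilon$, NumPy, and hypothetical helper names `ols_slope` and `mean_slope`). It averages the OLS slope over many samples drawn under three selection rules: keep everything, keep only $x > 0$ (selection on $X$ alone), and keep only $y > 1$ (selection on $y$, which therefore depends on $\epsilon$).

```python
import numpy as np

def ols_slope(x, y):
    """Hypothetical helper: OLS fit of y on [1, x]; returns the slope."""
    X = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[1]

def mean_slope(select, n=2_000, reps=500, beta=2.0, seed=0):
    """Average OLS slope over `reps` samples drawn with a selection rule.

    `select` is a hypothetical function (x, y) -> boolean mask deciding
    which population draws end up in the observed sample.
    """
    rng = np.random.default_rng(seed)
    slopes = []
    for _ in range(reps):
        x = rng.normal(size=n)
        eps = rng.normal(size=n)      # independent of x in the population
        y = 1.0 + beta * x + eps      # true model: intercept 1, slope beta
        keep = select(x, y)
        slopes.append(ols_slope(x[keep], y[keep]))
    return np.mean(slopes)

random_sample = mean_slope(lambda x, y: np.ones_like(x, dtype=bool))
select_on_x = mean_slope(lambda x, y: x > 0)  # selection depends only on x
select_on_y = mean_slope(lambda x, y: y > 1)  # selection depends on eps via y
print(random_sample, select_on_x, select_on_y)
```

With these settings the first two averages should sit near the true slope of 2, while the third comes out noticeably smaller: keeping only large $y$ over-represents positive errors at low $x$, so $x$ and $\epsilon$ become negatively correlated within the selected sample.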
