I know that, to use OLS estimators in linear regression, a few assumptions must be satisfied. However, it is not clear to me what would happen if I used OLS in a multiple regression without having a random sample, so that $(X_i, Y_i)$ would not be iid. What sort of problems might I face?


#### Best Answer

First, OLS is nothing more than an algorithm for fitting a linear model of the form $$ y = \mathbf{X}\beta + \epsilon $$ In other words, you are positing that the phenomenon $y$ is a linear function of the variables $\mathbf{X}$, plus some additively separable disturbance term.

If this is a good assumption, then there is some true, constant $\beta$, and you apply some estimator — such as OLS — to estimate what it is.
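As a concrete sketch (with made-up data and a true $\beta$ chosen for illustration), the OLS estimator is just the closed-form solution $\hat\beta = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'y$:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate the posited model y = X @ beta + eps with a known true beta.
n = 1000
true_beta = np.array([2.0, -1.0])  # intercept and slope (illustrative values)
X = np.column_stack([np.ones(n), rng.normal(size=n)])
eps = rng.normal(scale=0.5, size=n)  # additively separable disturbance
y = X @ true_beta + eps

# OLS closed form: solve (X'X) beta_hat = X'y
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
```

With a random sample like this one, where the regressors are independent of the disturbance, `beta_hat` lands close to `true_beta`.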

If your sample is non-random — that is, there is some correlation between your $\mathbf{X}$'s and your error term — then OLS estimates $\hat\beta$ will not be equal *in expectation* to the true $\beta$. This is to say that they are biased.

In other words, if you were to take many, many samples from the population of $\mathbf{X}$ and $y$, your average $\hat\beta$ would not equal $\beta$.
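This can be checked with a small Monte Carlo sketch (parameter values are illustrative): draw many samples in which $x$ is correlated with the error term, fit OLS in each, and look at the average estimate, which settles away from the true slope.

```python
import numpy as np

rng = np.random.default_rng(1)
true_beta = 1.0
n, reps = 500, 2000
estimates = []
for _ in range(reps):
    eps = rng.normal(size=n)
    # Non-random sampling modeled as endogeneity: x is correlated with eps.
    x = rng.normal(size=n) + 0.8 * eps
    y = true_beta * x + eps
    # Simple-regression OLS slope: cov(x, y) / var(x)
    estimates.append(np.cov(x, y, bias=True)[0, 1] / np.var(x))

bias = np.mean(estimates) - true_beta  # stays well above zero
```

Here the probability limit of $\hat\beta$ is $\beta + \operatorname{cov}(x, \epsilon)/\operatorname{var}(x)$, so averaging over more samples does not make the bias go away.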

### Similar Posts:

- Solved – Linear Regression for iid sample: The value of $E(\epsilon_i^2|x_i)$ is not the same across $i$. Why
- Solved – Relationship between noise term ($\epsilon$) and MLE solution for Linear Regression Models
- Solved – Lasso penalty only applied to subset of regressors
- Solved – Lasso for time series – Independence assumptions violated
- Solved – Sufficient Statistic for $\beta$ in OLS