Assume the following linear relationship:

$Y_i = \beta_0 + \beta_1 X_i + u_i$, where $Y_i$ is the dependent variable, $X_i$ a single independent variable and $u_i$ the error term.

According to Stock & Watson (Introduction to Econometrics; Chapter 4), the *third least squares assumption* is that the fourth moments of $X_i$ and $u_i$ are non-zero and finite $(0<E(X_i^4)<\infty \text{ and } 0<E(u_i^4)<\infty)$.

I have three questions:

1. I do not fully understand the role of this assumption. Is OLS biased and inconsistent if this assumption does not hold, or do we need this assumption for inference?

2. Stock and Watson write "this assumption limits the probability of drawing an observation with extremely large values of $X_i$ or $u_i$." However, my intuition is that this assumption is extreme. Are we in trouble if we have large outliers (such that the fourth moments are large) but these values are still finite? By the way: what is the underlying definition of an outlier?

3. Can we reformulate this as follows: "The kurtosis of $X_i$ and $u_i$ are nonzero and finite"?

#### Best Answer

You do *not* need assumptions on the 4th moments for consistency of the OLS estimator, but you do need assumptions on higher moments of $x$ and $\epsilon$ for asymptotic normality and to consistently estimate the asymptotic covariance matrix.

In some sense though, that is a mathematical, technical point, not a practical point. For OLS to work well in finite samples, you need more than the minimal assumptions required for consistency or asymptotic normality as $n \rightarrow \infty$.

### Sufficient conditions for consistency:

If you have the regression equation: $$ y_i = \mathbf{x}_i' \boldsymbol{\beta} + \epsilon_i $$

The OLS estimator $\hat{\mathbf{b}}$ can be written as: $$ \hat{\mathbf{b}} = \boldsymbol{\beta} + \left( \frac{X'X}{n}\right)^{-1}\left(\frac{X'\boldsymbol{\epsilon}}{n} \right)$$
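As a sketch of where this expression comes from, assume the usual stacked notation (not spelled out above): $X$ is the $n \times K$ matrix whose $i$-th row is $\mathbf{x}_i'$, and $\mathbf{y}$ and $\boldsymbol{\epsilon}$ are the stacked $n \times 1$ vectors. Substituting the regression equation into the OLS formula gives:

$$ \hat{\mathbf{b}} = (X'X)^{-1}X'\mathbf{y} = (X'X)^{-1}X'\left(X\boldsymbol{\beta} + \boldsymbol{\epsilon}\right) = \boldsymbol{\beta} + (X'X)^{-1}X'\boldsymbol{\epsilon} = \boldsymbol{\beta} + \left( \frac{X'X}{n}\right)^{-1}\left(\frac{X'\boldsymbol{\epsilon}}{n} \right) $$

The factors of $n$ cancel, but writing the two pieces as sample averages is what lets a law of large numbers be applied to each.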

For consistency, you need to be able to apply Kolmogorov's Law of Large Numbers or, in the case of time-series with serial dependence, something like the Ergodic Theorem of Karlin and Taylor so that:

$$ \frac{1}{n} X'X \xrightarrow{p} \mathrm{E}[\mathbf{x}_i\mathbf{x}_i'] \quad \quad \quad \frac{1}{n} X'\boldsymbol{\epsilon} \xrightarrow{p} \mathrm{E}\left[\mathbf{x}_i \epsilon_i\right] $$

Other assumptions needed are:

- $\mathrm{E}[\mathbf{x}_i\mathbf{x}_i']$ is full rank and hence the matrix is invertible.
- Regressors are predetermined or strictly exogenous so that $\mathrm{E}\left[\mathbf{x}_i \epsilon_i\right] = \mathbf{0}$.

Then $\left( \frac{X'X}{n}\right)^{-1}\left(\frac{X'\boldsymbol{\epsilon}}{n} \right) \xrightarrow{p} \mathbf{0}$ and you get $\hat{\mathbf{b}} \xrightarrow{p} \boldsymbol{\beta}$.
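The step from the two limits to this conclusion can be sketched as follows. The symbol $\Sigma_{xx} = \mathrm{E}[\mathbf{x}_i\mathbf{x}_i']$ is shorthand introduced here for illustration. Matrix inversion is continuous at any nonsingular matrix, so by the continuous mapping theorem:

$$ \left( \frac{X'X}{n}\right)^{-1} \xrightarrow{p} \Sigma_{xx}^{-1} \quad \text{and hence} \quad \left( \frac{X'X}{n}\right)^{-1}\left(\frac{X'\boldsymbol{\epsilon}}{n} \right) \xrightarrow{p} \Sigma_{xx}^{-1}\, \mathrm{E}\left[\mathbf{x}_i \epsilon_i\right] = \Sigma_{xx}^{-1}\, \mathbf{0} = \mathbf{0} $$

Only first moments of $\mathbf{x}_i\mathbf{x}_i'$ and $\mathbf{x}_i \epsilon_i$ enter here, which is why no 4th-moment condition appears in the consistency argument.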

If you want the central limit theorem to apply *then* you need assumptions on higher moments, for example, on $\mathrm{E}[\mathbf{g}_i\mathbf{g}_i']$ where $\mathbf{g}_i = \mathbf{x}_i \epsilon_i$. The central limit theorem is what gives you asymptotic normality of $\hat{\mathbf{b}}$ and allows you to talk about standard errors. For the second moment $\mathrm{E}[\mathbf{g}_i\mathbf{g}_i']$ to exist, it is sufficient that the 4th moments of $x$ and $\epsilon$ exist. You want to argue that $\sqrt{n}\left(\frac{1}{n} \sum_i \mathbf{x}_i \epsilon_i \right) \xrightarrow{d} \mathcal{N}\left( 0, \Sigma \right)$ where $\Sigma = \mathrm{E}\left[\mathbf{x}_i\mathbf{x}_i'\epsilon_i^2 \right]$. For this to work, $\Sigma$ has to be finite.
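A sketch of why finite 4th moments are enough, and of what the limit implies for $\hat{\mathbf{b}}$. The symbols $x_{ij}$ (the $j$-th element of $\mathbf{x}_i$) and $\Sigma_{xx} = \mathrm{E}[\mathbf{x}_i\mathbf{x}_i']$ are notation introduced here for illustration. By the Cauchy-Schwarz inequality, each diagonal element of $\Sigma$ satisfies:

$$ \mathrm{E}\left[x_{ij}^2 \epsilon_i^2\right] \le \sqrt{\mathrm{E}\left[x_{ij}^4\right] \, \mathrm{E}\left[\epsilon_i^4\right]} < \infty $$

and the off-diagonal elements are bounded by the diagonal ones in the same way. Combining the central limit theorem for $\mathbf{x}_i \epsilon_i$ with the probability limit of $\frac{1}{n}X'X$ (Slutsky's theorem) then gives the familiar sandwich form:

$$ \sqrt{n}\left(\hat{\mathbf{b}} - \boldsymbol{\beta}\right) = \left( \frac{X'X}{n}\right)^{-1} \sqrt{n}\left(\frac{1}{n} \sum_i \mathbf{x}_i \epsilon_i \right) \xrightarrow{d} \mathcal{N}\left( \mathbf{0}, \; \Sigma_{xx}^{-1} \, \Sigma \, \Sigma_{xx}^{-1} \right) $$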

A nice discussion (which motivated this post) is given in Hayashi's *Econometrics*. (See also p. 149 for 4th moments and estimating the covariance matrix.)

### Discussion:

These requirements on 4th moments are probably a technical point rather than a practical point. You're probably not going to encounter pathological distributions where this is a problem in everyday data. It's far more common for other assumptions of OLS to go awry.

A different question, undoubtedly answered elsewhere on Stack Exchange, is how large a sample you need for finite-sample results to get close to the asymptotic ones. There's some sense in which fantastic outliers lead to slow convergence. For example, try estimating the mean of a lognormal distribution with really high variance. The sample mean is a consistent, unbiased estimator of the population mean, but in that lognormal case, with crazy excess kurtosis etc., finite-sample results are really quite off.
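The lognormal example can be tried with a short simulation. This is only a sketch: the values of `sigma`, `n`, and `n_sim` are assumptions chosen for illustration, and the variable names are hypothetical. It draws many samples from a lognormal distribution with a large log-scale standard deviation and compares the sample means with the population mean $e^{\sigma^2/2}$.

```matlab
% Sketch: sample mean of a heavy-tailed lognormal (all parameter values are illustrative assumptions).
rng(1);                          % fix the seed so the run is reproducible
sigma     = 3;                   % log-scale standard deviation (assumed, deliberately large)
n         = 1000;                % observations per sample (assumed)
n_sim     = 2000;                % number of simulated samples (assumed)
true_mean = exp(sigma^2 / 2);    % population mean of exp(sigma * Z) with Z ~ N(0,1)

draws        = exp(sigma * randn(n, n_sim));  % each column is one sample
sample_means = mean(draws, 1);                % one sample mean per column

% Compare the typical sample mean with the population mean.
fprintf('population mean:        %.2f\n', true_mean);
fprintf('median of sample means: %.2f\n', median(sample_means));
fprintf('share of samples below the population mean: %.2f\n', mean(sample_means < true_mean));
```

Because the lognormal is strongly right-skewed, most samples should give a mean below the population mean, with an occasional sample far above it. Every moment is finite here, so this is a "small vs. big" problem rather than a "finite vs. infinite" one.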

Finite vs. infinite is a hugely important distinction in mathematics. That's not the problem you encounter in everyday statistics. Practical problems are more in the small vs. big category. Is the variance, kurtosis etc… small enough so that I can achieve reasonable estimates given my sample size?

### Pathological example where OLS estimator is consistent but not asymptotically normal

Consider:

$$ y_i = b x_i + \epsilon_i $$

where $x_i \sim \mathcal{N}(0,1)$ but $\epsilon_i$ is drawn from a t-distribution with 2 degrees of freedom, so that $\mathrm{Var}(\epsilon_i) = \infty$. The OLS estimate converges in probability to $b$, but the sampling distribution of the OLS estimate $\hat{b}$ is not normal. The empirical distribution of $\hat{b}$ was obtained from 10000 simulations of a regression with 10000 observations.

The distribution of $\hat{b}$ isn't normal; the tails are too heavy. But if you increase the degrees of freedom to 3, so that the second moment of $\epsilon_i$ exists, then the central limit theorem applies and $\hat{b}$ is asymptotically normal.

Code to generate it:

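```matlab
beta  = [-4; 3.7];   % true intercept and slope
n     = 1e5;         % observations per regression
n_sim = 10000;       % number of simulated regressions
b     = zeros(2, n_sim);            % preallocate the estimates
for s = 1:n_sim
    X = [ones(n, 1), randn(n, 1)];  % constant plus a standard normal regressor
    u = trnd(2, n, 1) / 100;        % t-distributed errors with 2 degrees of freedom
    y = X * beta + u;
    b(:, s) = X \ y;                % OLS estimate via least squares
end
b = b';
qqplot(b(:, 2));     % normal Q-Q plot of the slope estimates
```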
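As a check on the moment conditions in this example, here is a sketch for the scalar model in the text, assuming $x_i$ and $\epsilon_i$ are independent (as they are in the simulation). For a t-distribution with $\nu$ degrees of freedom, $\mathrm{E}|\epsilon_i|^k$ is finite only when $k < \nu$, and $\mathrm{Var}(\epsilon_i) = \nu/(\nu - 2)$ for $\nu > 2$. So:

$$ \nu = 2: \quad \Sigma = \mathrm{E}\left[x_i^2 \epsilon_i^2\right] = \mathrm{E}\left[x_i^2\right] \mathrm{E}\left[\epsilon_i^2\right] = 1 \cdot \infty = \infty $$

$$ \nu = 3: \quad \Sigma = \mathrm{E}\left[x_i^2\right] \mathrm{E}\left[\epsilon_i^2\right] = 1 \cdot \frac{3}{3-2} = 3 < \infty, \quad \text{although } \mathrm{E}\left[\epsilon_i^4\right] = \infty $$

With 3 degrees of freedom the 4th moment of $\epsilon_i$ is still infinite, yet $\Sigma$ is finite, which illustrates that finite 4th moments are a convenient sufficient condition rather than a necessary one. The division by 100 in the code rescales $\Sigma$ by $10^{-4}$ but does not change whether it is finite.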