In Wooldridge's Econometric Analysis of Cross Section and Panel Data, the linear projection of $y$ on $1, \mathbf{x}$ is defined in the following way:

Assume that $Var(\mathbf{x})$ is positive definite. Then the linear projection of $y$ on $1, \mathbf{x}$ exists and is unique, and it is given by

$L(y \mid 1, \mathbf{x}) = \beta_0 + \mathbf{x}\beta$, where by definition

$\beta = (Var(\mathbf{x}))^{-1} Cov(\mathbf{x}, y)$ and $\beta_0 = E(y) - E(\mathbf{x})\beta$.

First, I do not see how the author reaches that definition, or how it relates to the one in Wikipedia's article on projection. We usually have $P = X(X'X)^{-1}X'$, where $X$ is the matrix with columns $1 \text{ and } \mathbf{x}$, for the sample (I am not sure how to write this projection w.r.t. the population, though). Second, instead of defining the betas as above, couldn't we have derived those equations? How?

Any help would be appreciated.

Edit: According to the book, the notation is the following. $\mathbf{x}$ is a row vector of dimension $K$, so the dimension of the design matrix is $N \times K$.


#### Best Answer

Another formulation of the $\beta$ regression parameter estimator is

$\hat{\beta} = \left(\mathbf{X}^T\mathbf{X}\right)^{-1} \mathbf{X}^T Y$

Here $\hat{\beta}$ is a two-element vector consisting of $\hat{\beta}_0$, the intercept, and $\hat{\beta}_1$, the slope. I like to use $\mathbf{X}$ notationally as a design matrix whose first column is a vector of 1s.

It's easy to see that the cross-products share a factor of $n$ that cancels out in these operations. WLOG we may assume that the random component(s) of $\mathbf{X}$ are centered; for those columns you have:

$\mbox{Cov} (\mathbf{X}) = \frac{1}{n}\left( \mathbf{X}^T \mathbf{X} \right)$

and

$\mbox{Cov} (\mathbf{X}, Y) = \frac{1}{n}\left( \mathbf{X}^T Y \right)$

Therefore, for univariate models, the least squares slope $\hat{\beta}_1$ is the sample analogue of $\beta_1 = \mbox{Cov}(X, Y) / \mbox{Var} (X)$, which is the book's definition with a single regressor.
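The equivalence between the matrix formula and the covariance form can be checked numerically. The following is a minimal sketch on simulated data; the sample size, coefficients, and variable names are assumptions made for illustration and do not come from the book.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 500, 3

# Simulated regressors and outcome (illustrative values only).
x = rng.normal(size=(n, k))
y = 1.5 + x @ np.array([2.0, -1.0, 0.5]) + rng.normal(size=n)

# OLS with an explicit column of ones: (X'X)^{-1} X'y.
X = np.column_stack([np.ones(n), x])
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Moment form: sample Var(x)^{-1} Cov(x, y), then the intercept.
xc = x - x.mean(axis=0)              # center the regressors
var_x = xc.T @ xc / n                # sample Var(x), k x k
cov_xy = xc.T @ (y - y.mean()) / n   # sample Cov(x, y), length k
slope = np.linalg.solve(var_x, cov_xy)
intercept = y.mean() - x.mean(axis=0) @ slope

moment_form = np.concatenate([[intercept], slope])
print(np.allclose(beta_hat, moment_form))
```

The factor $1/n$ appears in both `var_x` and `cov_xy`, so it cancels in `slope`, which is why dividing by $n$ or $n - 1$ makes no difference here.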

The projection matrix is formulated as $P = \mathbf{X}\left(\mathbf{X}^T\mathbf{X}\right)^{-1} \mathbf{X}^T$ and the predicted values of $Y$ (i.e. $\hat{Y}$) are given by:

$\hat{Y} = PY = \mathbf{X}\left(\mathbf{X}^T\mathbf{X}\right)^{-1} \mathbf{X}^TY = \mathbf{X} \hat{\beta}$

and you'll see it's a projection. All predicted values of $Y$ are formed as linear combinations of the columns of $\mathbf{X}$, i.e. they lie in the column space of $\mathbf{X}$. The projection specifically "projects" the values of $Y$ onto the fitted values $\hat{Y}$.
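As a quick check that $P$ behaves like an orthogonal projection, the sketch below builds it for a small simulated design matrix and tests symmetry, idempotence, and $PY = \mathbf{X}\hat{\beta}$. The data and variable names are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50

# Design matrix with a leading column of ones and two simulated regressors.
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
Y = rng.normal(size=n)

# P = X (X'X)^{-1} X', computed with a linear solve instead of an explicit inverse.
P = X @ np.linalg.solve(X.T @ X, X.T)

beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
Y_hat = P @ Y

is_symmetric = np.allclose(P, P.T)            # P' = P
is_idempotent = np.allclose(P @ P, P)         # projecting twice changes nothing
matches_fit = np.allclose(Y_hat, X @ beta_hat)  # PY = X beta_hat

print(is_symmetric, is_idempotent, matches_fit)
```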

The author has formulated the fitted value function using unusual notation: $L(y \mid 1, \mathbf{x})$ is basically the population analogue of $\hat{Y}$.
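The book's formulas can also be derived rather than taken as a definition. Here is a sketch, assuming the linear projection is characterized as the best linear predictor in the mean squared error sense; the symbols $b_0$, $\mathbf{b}$ and $u$ are notation introduced only for this sketch. Start from

$$\min_{b_0, \mathbf{b}} \; E\left[(y - b_0 - \mathbf{x}\mathbf{b})^2\right].$$

Writing $u = y - \beta_0 - \mathbf{x}\beta$ for the error at the minimizer, the first-order conditions are

$$E(u) = 0, \qquad E(\mathbf{x}' u) = \mathbf{0}.$$

The first condition gives $\beta_0 = E(y) - E(\mathbf{x})\beta$, so that $u = (y - E(y)) - (\mathbf{x} - E(\mathbf{x}))\beta$. Because $E(u) = 0$, the second condition can be written with centered regressors:

$$E\left[(\mathbf{x} - E(\mathbf{x}))' u\right] = Cov(\mathbf{x}, y) - Var(\mathbf{x})\beta = \mathbf{0},$$

and therefore

$$\beta = (Var(\mathbf{x}))^{-1} Cov(\mathbf{x}, y).$$

Positive definiteness of $Var(\mathbf{x})$ is what makes the inverse exist and the minimizer unique. Replacing the expectations with sample averages turns the same conditions into the normal equations $\mathbf{X}^T(Y - \mathbf{X}\hat{\beta}) = \mathbf{0}$.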

Alternately, the residual-maker (annihilator) matrix is $M = \mathcal{I} - P$ and the residuals are given by $r = MY$. The hat matrix, or influence matrix, is $P$ itself.
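The matrix $\mathcal{I} - P$ and the residuals it produces can be checked in the same way. In this sketch on simulated data, the name `M` for $\mathcal{I} - P$ and all other variable names are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 40

# Design matrix with a leading column of ones and two simulated regressors.
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
Y = rng.normal(size=n)

P = X @ np.linalg.solve(X.T @ X, X.T)  # projection onto the column space of X
M = np.eye(n) - P                      # I - P, projects onto the orthogonal complement
r = M @ Y                              # residuals

beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)

same_residuals = np.allclose(r, Y - X @ beta_hat)            # r = Y - X beta_hat
orthogonal_to_X = np.allclose(X.T @ r, 0.0, atol=1e-8)       # X'r = 0
annihilates_P = np.allclose(M @ P, 0.0, atol=1e-8)           # (I - P) P = 0
trace_is_rank = np.isclose(np.trace(P), X.shape[1])          # trace(P) = number of columns

print(same_residuals, orthogonal_to_X, annihilates_P, trace_is_rank)
```

The condition $\mathbf{X}^T r = \mathbf{0}$ is the sample counterpart of the population conditions $E(u) = 0$ and $E(\mathbf{x}'u) = \mathbf{0}$ that characterize the linear projection error.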

Reference: Seber, Lee 2nd edition 1990.
