I am taking Andrew Ng's *Machine Learning* class on Coursera and in the below slide he distinguishes principal component analysis (PCA) from Linear Regression. He says that in Linear Regression, we draw vertical lines from the data points to the line of best fit, whereas in PCA, we draw lines that are perpendicular to achieve the shortest distance.

I thought with linear regression we always use some Euclidean distance metric to calculate the error from what our hypothesis function predicts vs. what the actual data point was. Why doesn't it use the shortest distance *a la* PCA?

**Contents**hide

#### Best Answer

With linear regression, we are modeling the conditional mean of the outcome, $E[Y|X] = a + bX$. Therefore, the $X$s are thought of as being "conditioned upon"; part of the experimental design, or representative of the population of interest.

That means any distance between the observed $Y$ and it's predicted (conditional mean) value, $hat{Y}$ is thought of as an error and is given the value $r = Y – hat{Y}$ as the "residual error". The conditional error of $Y$ is estimated from these values (again, no variability is considered on the behalf of $X$ values). Geometrically, that is a "straight up and down" kind of measurement.

In cases where there is measurement variability in $X$ as well, some considerations and assumptions must be discussed briefly to motivate usage of linear regression in this fashion. In particular, regression models are prone to nondifferential misclassification which may attenuate the slope of the regression model, $b$.

### Similar Posts:

- Solved – Why is linear regression different from PCA
- Solved – Why is linear regression different from PCA
- Solved – Why is the Manhattan distance (or block distance) appropriate when I have a discrete data set
- Solved – Why is the Manhattan distance (or block distance) appropriate when I have a discrete data set
- Solved – Minimizing the median absolute deviation or median absolute error