Why does this interpretation make sense only when we keep the other variables constant?
After all, those coefficients weren't learnt that way (with the others held constant), so unless there is strong multicollinearity, why would we need to fix the other variables' values for this interpretation to make sense? Have we not already taken the correlation into account in the learning step and embedded it into the coefficient for each variable?
Best Answer
I find that it helps to think about this geometrically. Basically, what linear regression with 2 or more predictors is doing is finding the (hyper-)plane that best fits the data points. So with two predictors we can plot the model predictions in 3 dimensions as a plane tilted over the $(x_1, x_2)$ plane, hovering among the data points.
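A minimal sketch of such a plot, assuming made-up data and using NumPy, scikit-learn, and Matplotlib (none of these choices come from the original answer):

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression

# Made-up data: a noisy response driven by two predictors
rng = np.random.default_rng(0)
x1 = rng.uniform(-2, 2, size=100)
x2 = rng.uniform(-2, 2, size=100)
y = 2.0 + 1.5 * x1 - 0.8 * x2 + rng.normal(scale=0.5, size=100)

# Fit the regression plane: y_hat = b0 + b1*x1 + b2*x2
model = LinearRegression().fit(np.column_stack([x1, x2]), y)

# Evaluate the fitted plane on a grid for plotting
g1, g2 = np.meshgrid(np.linspace(-2, 2, 20), np.linspace(-2, 2, 20))
yhat = model.intercept_ + model.coef_[0] * g1 + model.coef_[1] * g2

fig = plt.figure()
ax = fig.add_subplot(projection="3d")
ax.scatter(x1, x2, y, color="k", s=10)      # the data points
ax.plot_surface(g1, g2, yhat, alpha=0.4)    # the fitted regression plane
ax.set_xlabel("$x_1$")
ax.set_ylabel("$x_2$")
ax.set_zlabel("$y$")
plt.show()
```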
The regression coefficient (i.e., slope) for a particular predictor, let's say $x_1$, is interpreted as the change in the predicted $y$-value as we move along this regression plane in a direction parallel to the $x_1$ axis, and hence perpendicular to the $x_2$ axis. Notice that this interpretation has nothing whatsoever to do with the correlation between $x_1$ and $x_2$ in the dataset. It simply states a geometric fact about what happens in the $y$ dimension as we traverse along the regression plane in a particular direction. Because the direction of movement described by any predictor slope is necessarily perpendicular to all the other predictor axes, we can intuitively imagine that each one is "holding constant" the values on the other predictors.
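To make the "moving parallel to the $x_1$ axis" reading concrete, here is a small illustration with made-up data (the variable names and numbers are purely for the example): shift $x_1$ by one unit while leaving $x_2$ where it is, and the fitted prediction changes by exactly the $x_1$ coefficient.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up data in which the two predictors are correlated
rng = np.random.default_rng(1)
x1 = rng.normal(size=200)
x2 = 0.6 * x1 + rng.normal(size=200)     # x2 is correlated with x1
y = 2.0 + 1.5 * x1 - 0.8 * x2 + rng.normal(scale=0.5, size=200)

model = LinearRegression().fit(np.column_stack([x1, x2]), y)

# Walk along the fitted plane parallel to the x1 axis:
# increase x1 by 1 and keep x2 fixed at an arbitrary value.
point = np.array([[0.3, -0.2]])
shifted = point + np.array([[1.0, 0.0]])
delta = model.predict(shifted) - model.predict(point)

print(delta[0])        # change in the predicted y
print(model.coef_[0])  # the x1 slope -- identical to delta
```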
Now, multicollinearity will of course have an impact on what these slopes are estimated to be and what the standard errors of those estimates are. But the key point is that once we have the slope estimates in hand, their interpretation does not reference the statistical associations among the predictors in any way. They only refer to the geometry of the regression plane.
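As a rough illustration of that point (a sketch with simulated data, using statsmodels, which the original answer does not mention): making the predictors more strongly correlated inflates the standard errors of the slope estimates, yet each fitted slope is still read the same way, as the tilt of the plane along its own axis.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 500

for rho in (0.0, 0.95):  # weak vs. strong collinearity between x1 and x2
    x1 = rng.normal(size=n)
    x2 = rho * x1 + np.sqrt(1 - rho**2) * rng.normal(size=n)
    y = 2.0 + 1.5 * x1 - 0.8 * x2 + rng.normal(scale=0.5, size=n)

    X = sm.add_constant(np.column_stack([x1, x2]))
    fit = sm.OLS(y, X).fit()

    # Slope estimates stay near the true values; their standard errors grow with rho
    print(f"rho={rho}: slopes={fit.params[1:]}, std errors={fit.bse[1:]}")
```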