Solved – How to decompose a model using linear regression

I'm using "decompose" in a figurative sense here.

I have a simple regression model: $y = b + b_1x_1 + b_2x_2 + e$

The $e$ term is the residual and $b$ the intercept.

I want to show the contribution of $x_1$ (and also $x_2$) toward $y$.

For the contribution of $x_1$ I remove $x_2$ by setting it to zero. That yields: $y|x_1 = b + b_1x_1$

For $x_2$ then: $y|x_2 = b + b_2x_2$

The problem is that these two contributions do not sum to the modeled $y$ values, as the intercept is double-counted: $y|x_1 + y|x_2 = 2b + b_1x_1 + b_2x_2 \neq y$

When I force the regression through the origin this works fine, and there are solid physical reasons to do so, but is there a more elegant way to do this?

Comments on the answers here (I can't add comments below for some reason):

To Peter: yes, that is it. Imagine a stacked bar chart. I want my bar chart to have three components: the amount of $y$ determined by $x_1$, the amount determined by $x_2$, and the residual. These should add up to the raw data.
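One way to make the pieces add up, without forcing the fit through the origin, is to keep the fitted intercept as its own baseline segment, so each observation splits into intercept + $x_1$ contribution + $x_2$ contribution + residual and the constant is counted only once. A minimal sketch of that bookkeeping (the data and coefficients are simulated, purely illustrative, not from the original post):

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data, purely illustrative
n = 100
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 2.0 + 1.5 * x1 - 0.8 * x2 + rng.normal(scale=0.5, size=n)

# Ordinary least squares with an intercept column
X = np.column_stack([np.ones(n), x1, x2])
b0, b1, b2 = np.linalg.lstsq(X, y, rcond=None)[0]

# Per-observation components; the intercept appears once, as its own segment
comp_intercept = np.full(n, b0)
comp_x1 = b1 * x1
comp_x2 = b2 * x2
residual = y - (comp_intercept + comp_x1 + comp_x2)

# The stacked segments reproduce the raw data exactly
print(np.allclose(comp_intercept + comp_x1 + comp_x2 + residual, y))  # True
```

Stacking comp_intercept, comp_x1, comp_x2 and residual per observation gives the bar segments; the intercept shows up as one constant baseline segment instead of being split between $x_1$ and $x_2$.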

In the regression specification

$$y = b_0 + b_1x_1 + b_2x_2 + u$$

1) The constant term, even if it does not emerge from the theoretical model behind the regression specification, captures the possibly non-zero mean of the error term: if that mean is $\mu \neq 0$, the fitted constant simply absorbs it, leaving a zero-mean disturbance. In other words, we know that in all likelihood there are other factors that affect $y$; we just hope that they do not co-vary with $x_1$ and/or $x_2$. (In the last 20 years I have seen perhaps one regression where the constant term appeared not to be highly statistically significant, which reinforces the "wisdom" of including it in the regression "no matter what".)

2) If these two regressors are correlated, then by eliminating one in order to capture the "pure" effect of the other, you accomplish the exact opposite: the coefficient estimate of the remaining regressor will come from a biased estimator, and so it will have a higher probability of being misleading (this is the textbook case of "omitted variable" bias). If, on the contrary, both are included, you come closer to estimating the marginal effect of each regressor (the coefficient) and hence its total contribution (see the simulation sketch after this list).

3) Finally, note that in the "addition" you attempt, you add two conditional values that are conditional on different sets, and you add them unweighted. Most certainly they do not end up being equal to the unconditional quantity (analogously, if you have a Bernoulli r.v. $c \in \{0,1\}$, then $P(Z \mid c=1) + P(Z \mid c=0) \neq P(Z)$).
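The simulation sketch mentioned in point 2) (an illustration with made-up coefficients and simulated data, not from the original post): when $x_1$ and $x_2$ co-vary, dropping $x_2$ does not isolate the effect of $x_1$; the short regression's slope absorbs part of $x_2$'s effect.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

# Correlated regressors (illustrative values only)
x1 = rng.normal(size=n)
x2 = 0.7 * x1 + rng.normal(size=n)              # x2 co-varies with x1
y = 1.0 + 2.0 * x1 + 3.0 * x2 + rng.normal(size=n)

def ols(X, y):
    """Least-squares coefficient vector for design matrix X."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

# Full model: both regressors included
b_full = ols(np.column_stack([np.ones(n), x1, x2]), y)

# Short model: x2 omitted
b_short = ols(np.column_stack([np.ones(n), x1]), y)

print(b_full[1])   # close to the true 2.0
print(b_short[1])  # close to 2.0 + 3.0*0.7 = 4.1 (omitted-variable bias)
```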
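And to make the analogy in point 3) explicit: the unconditional probability is recovered as a weighted combination of the conditional ones (law of total probability),

$$P(Z) = P(Z \mid c=1)\,P(c=1) + P(Z \mid c=0)\,P(c=0),$$

which in general differs from the unweighted sum $P(Z \mid c=1) + P(Z \mid c=0)$.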
