I have a regression equation of this kind:
$$\log y = a + bx + cx^2 + \epsilon$$
where $a$ is the intercept, $b$ and $c$ are the coefficients of $x$ and $x^2,$ and $\epsilon$ is the error. How do I interpret the impact of the variable $x$? I am fairly sure that I should not interpret $x$ and $x^2$ separately, but I can't figure out what their combined impact on $y$ is!
Best Answer
By "impact" of $x$ I understand you want to estimate the change in the predicted value when $x$ changes by some (small) amount $\delta x.$ This is a simple calculation beginning with the fitted model
$$\log(\hat y(x)) = \hat a + \hat b x + \hat c x^2$$
where the "hats" on the terms designate estimated values. Plugging in $x+\delta x$ for the changed value of $x$ and subtracting the original value of $\log\hat y$ gives
$$\log(\hat y(x+\delta x)) - \log(\hat y(x)) = \hat b\, \delta x + \hat c \left(2x\, \delta x + (\delta x)^2\right).$$
Provided $\hat c(\delta x)^2$ is of negligible size compared to the remaining terms on the right hand side, that is, when
$$\left|\hat c\, \delta x\right| \ll \left|\hat b + 2 \hat c\, x\right|,$$
we may neglect it for these interpretive purposes and write
$$\log\left(\frac{\hat y(x+\delta x)}{\hat y(x)}\right) = \log(\hat y(x+\delta x)) - \log(\hat y(x)) \approx \left(\hat b + 2 \hat c x\right) \delta x .$$
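To see how little is lost by dropping the $\hat c(\delta x)^2$ term, here is a minimal sketch that evaluates both sides of the approximation. The coefficient values and the function names are made up for illustration; they do not come from any fitted model.

```python
def log_change_exact(b, c, x, dx):
    """Exact change in the fitted log response when x moves to x + dx."""
    return b * dx + c * (2 * x * dx + dx ** 2)


def log_change_approx(b, c, x, dx):
    """First-order approximation, which drops the c * dx**2 term."""
    return (b + 2 * c * x) * dx


# Hypothetical estimates and starting point, chosen only for illustration.
b_hat, c_hat = 0.5, -0.02
x, dx = 3.0, 0.1

exact = log_change_exact(b_hat, c_hat, x, dx)
approx = log_change_approx(b_hat, c_hat, x, dx)

print(f"exact log change:        {exact:.4f}")
print(f"approximate log change:  {approx:.4f}")
print(f"neglected term c*dx^2:   {c_hat * dx ** 2:.4f}")
```

With these assumed values the neglected term is $\hat c(\delta x)^2 = -0.0002,$ which is small next to the approximate change of $0.038.$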
On the left is the logarithm of the ratio of the new to the old predicted response, which measures the relative change in $\hat y(x).$ For small relative changes the (natural) logarithm will be very close to 1/100th of the percentage difference. For instance, when the log is 0.15, the relative change is roughly a +15% increase (more precisely, $e^{0.15} - 1 \approx 16.2\%$). (For many purposes this rule of thumb holds for percentage changes within about $\pm 20\%.$)
On the right is a multiple of the change $\delta x$ induced in the regressor. That multiple is $\hat b + 2\hat c x.$ Of note is that it depends on the value of $x$ you started with. In other words, the change in the response depends on what the regressor value is: it is not constant.
Another way to restate this interpretation is to exponentiate both sides, which expresses the response on its original (rather than log) scale, yielding
$$\hat y(x+\delta x) \approx \hat y(x)\exp\left(\left(\hat b + 2 \hat c x\right) \delta x\right) \approx \hat y(x)\left(1 + \left(\hat b + 2 \hat c x\right) \delta x\right).$$
The new value, on the left hand side, is expressed as a change of the old value by approximately $100\% \times \left(\hat b + 2 \hat c x\right) \delta x.$
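As a rough illustration of this percentage reading, the sketch below compares the rule-of-thumb change with the exact one on the original scale at a few starting values of $x.$ The estimates `b_hat` and `c_hat` are hypothetical, as are the function names.

```python
import math


def percent_change_exact(b, c, x, dx):
    """Exact percentage change in the fitted response on the original scale."""
    log_change = b * dx + c * (2 * x * dx + dx ** 2)
    return 100.0 * (math.exp(log_change) - 1.0)


def percent_change_approx(b, c, x, dx):
    """Rule-of-thumb percentage change: 100% * (b + 2*c*x) * dx."""
    return 100.0 * (b + 2 * c * x) * dx


# Hypothetical estimates, chosen only for illustration.
b_hat, c_hat = 0.5, -0.02
dx = 0.1

# The same increment dx gives a different percentage change at each x.
for x in (0.0, 3.0, 20.0):
    approx = percent_change_approx(b_hat, c_hat, x, dx)
    exact = percent_change_exact(b_hat, c_hat, x, dx)
    print(f"x = {x:5.1f}: approx {approx:+.3f}%, exact {exact:+.3f}%")
```

Under these assumed coefficients the sign of the change flips between small and large $x,$ which is the sense in which $x$ and $x^2$ cannot be interpreted separately.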
Although this might seem a little complicated and not easy to remember, please note that all the calculations involved are simple: they are just some multiplications and additions. To those familiar with the differential Calculus, they can be read directly off the original model equation with only the simplest mental arithmetic, because (taking differentials) it is immediate that
$$\frac{y^\prime (x)}{y(x)}\, dx = \frac{d}{dx} \log(y(x))\, dx = (b + 2cx)\, dx$$
and all you have to do is "put hats on" all the estimates and, as usual, interpret $dx$ as a (sufficiently) small increment in $x.$
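If you would rather check the derivative numerically than take it on trust, a small sketch like the following compares $b + 2cx$ with a central finite difference of the fitted log response. All coefficient values here are assumptions made for the example.

```python
def log_y(a, b, c, x):
    """Fitted log response: a + b*x + c*x**2."""
    return a + b * x + c * x ** 2


def slope_formula(b, c, x):
    """Derivative of the log response read off the model: b + 2*c*x."""
    return b + 2 * c * x


def slope_numeric(a, b, c, x, h=1e-5):
    """Central finite-difference estimate of d/dx log y(x)."""
    return (log_y(a, b, c, x + h) - log_y(a, b, c, x - h)) / (2 * h)


# Hypothetical estimates, chosen only for illustration.
a_hat, b_hat, c_hat = 1.0, 0.5, -0.02
x = 3.0

print(f"formula: {slope_formula(b_hat, c_hat, x):.6f}")
print(f"numeric: {slope_numeric(a_hat, b_hat, c_hat, x):.6f}")
```

The intercept drops out of the derivative, so the value of `a_hat` has no effect on either number.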