I want to regress $y$ on $x_1, x_2, x_3$ and $x_4$.

Question: When fitting a regression model in the regular fashion, without any transformations, the coefficient for $x_4$ is positive. However, when I log-transform the data (all the variables), the sign of the coefficient for $x_4$ becomes negative.

Any idea why a log transform would change the sign?


#### Best Answer

At first I thought this must have to do with multicollinearity. But then I tried it with even a single predictor, and the same thing can be observed there as well.

The reason is quite simple: it's the noise in the data and the fact that we can't estimate the function (and its derivative) perfectly from a finite sample. Moreover, if you believe there is an underlying true (log-)linear relationship, then either the log-log model or the original-variable model is *not* linear: at most one of the two specifications can be correct.

```r
set.seed(1)
XX <- matrix(exp(rnorm(20)), ncol = 1)
yy <- exp(rnorm(nrow(XX)) + 0.1 * XX)

mod.orig     <- lm(yy ~ XX)
mod.log.log  <- lm(log(yy) ~ log(XX))
mod.log.orig <- lm(log(yy) ~ XX)

layout(matrix(1:3, ncol = 3))
plot(XX, yy)
abline(mod.orig, col = 2)
abline(h = 1, lty = 2, col = 4)
plot(log(XX), log(yy))
abline(mod.log.log, col = 2)
abline(h = 0, lty = 2, col = 4)
plot(XX, log(yy))
abline(mod.log.orig, col = 2)
abline(h = 0, lty = 2, col = 4)
```

Now in your 4-variable case, I am sure multicollinearity plays a role as well (and correlation of X vs log(X) variables is also very much affected by noise).
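To see how this can play out with correlated predictors, here is a small sketch in the same spirit as the code above (the variable names, coefficient values, and seed are made up for illustration). With two correlated, strictly positive predictors and a weak true effect, the sign of that predictor's coefficient in the raw-scale fit need not match its sign in the log-log fit on a given sample:

```r
set.seed(2)
n  <- 200
# two correlated, strictly positive predictors (hypothetical example)
x1 <- exp(rnorm(n))
x2 <- exp(0.8 * log(x1) + rnorm(n))   # x2 is correlated with x1
# true relationship is log-log linear, with a weak negative x2 effect
y  <- exp(0.5 * log(x1) - 0.1 * log(x2) + rnorm(n))

coef(lm(y ~ x1 + x2))                  # fit on the raw scale
coef(lm(log(y) ~ log(x1) + log(x2)))  # fit on the log-log scale
```

Comparing the two coefficient vectors across a few seeds shows that the raw-scale coefficient of `x2` does not have to share the sign of its log-log counterpart, especially when its true effect is small relative to the noise.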

**Update:** I forgot to include the third option, which is the *true* model in these simulations: linear in $x$ with a logged response, $\log y = \alpha + \beta \cdot x$. I have added it as the third panel above.