I would like to estimate a log-linear regression and examine the results with Stata's marginsplot command. I took the natural logarithm of my dependent variable (to make a highly skewed distribution less skewed); the predictors are not transformed. The resulting graph is hard to interpret because the outcome is on the log scale. How can I plot my results on the original scale of Y instead of the logarithm of Y?
The Stata code I used is shown below:
    sysuse auto, clear
    generate ln_price=ln(price)
    reg ln_price i.foreign mpg
    margins i.foreign, atmeans
    marginsplot
Best Answer
The problem is that your predicted prices will be too small, since
$$E[y \vert x]=\exp(x'\beta) \cdot E[\exp(u)],$$
and you are leaving off the last factor. This is a consequence of Jensen's Inequality. If you take a look at the graphical proof at that link, it looks a lot like your case and should give you some intuition.
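To make the missing factor explicit, here is a short derivation. It assumes the log-linear model with errors $u$ that have mean zero and are independent of $x$; the symbols $\hat\beta$, $\hat u_i$ and $n$ (OLS estimates, residuals, sample size) are notation introduced here for illustration:

$$
\begin{aligned}
\ln y &= x'\beta + u \quad\Longrightarrow\quad y = \exp(x'\beta)\,\exp(u),\\
E[y \vert x] &= \exp(x'\beta)\cdot E[\exp(u)],\\
E[\exp(u)] &\ge \exp(E[u]) = \exp(0) = 1 \quad\text{(Jensen's Inequality, since } \exp \text{ is convex)}.
\end{aligned}
$$

So exponentiating the linear prediction alone understates $E[y \vert x]$ whenever $u$ has any spread. Replacing the unknown factor with its sample analogue gives

$$\widehat{E}[y \vert x] = \exp(x'\hat\beta)\cdot \frac{1}{n}\sum_{i=1}^{n}\exp(\hat u_i).$$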
If you can assume that the errors are iid, you can estimate the second factor with the sample average of the exponentiated residuals. This is called the Duan smearing transformation. Unfortunately, there is no easy way to do this with the margins command that takes the sampling variability of that factor into account. The point estimates will be correct, but the SEs will be too small. I would recommend using a Poisson model with robust SEs, which makes this whole re-transformation business a lot easier.
I am also not a fan of this atmeans business since it evaluates the predictions at nonsensical values, but that's another story.
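As a minimal sketch of how the two points above bear on the original plotting question, one could fit the Poisson model and pass the margins results straight to marginsplot, which then draws on the price scale. The mpg grid in at() is an illustrative choice, not something from the question, and the first margins call drops atmeans so that predictions are averaged over the cars in the sample:

```stata
* Sketch only: predictions on the price scale from a robust Poisson model
sysuse auto, clear
poisson price i.foreign mpg, vce(robust) nolog

* Average adjusted predictions: foreign is set to each level in turn,
* while every car keeps its own observed mpg (no atmeans)
margins foreign
marginsplot

* Predicted price along an illustrative grid of mpg values, by origin
margins foreign, at(mpg=(15(5)35))
marginsplot
```

Because the Poisson model predicts price itself, no exp() expression or smearing factor is needed before plotting.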
Here's Stata code and output showing this with the recommended solution:
    . sysuse auto, clear
    (1978 Automobile Data)

    . generate ln_price=ln(price)

    . reg ln_price i.foreign mpg

          Source |       SS           df       MS      Number of obs   =        74
    -------------+----------------------------------   F(2, 71)        =     17.80
           Model |  3.74819416         2  1.87409708   Prob > F        =    0.0000
        Residual |  7.47533892        71  .105286464   R-squared       =    0.3340
    -------------+----------------------------------   Adj R-squared   =    0.3152
           Total |  11.2235331        73  .153747029   Root MSE        =    .32448

    ------------------------------------------------------------------------------
        ln_price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
         foreign |
        Foreign  |   .2824445   .0897634     3.15   0.002     .1034612    .4614277
             mpg |  -.0421151   .0071399    -5.90   0.000    -.0563517   -.0278785
           _cons |     9.4536   .1485422    63.64   0.000     9.157415    9.749785
    ------------------------------------------------------------------------------

    . margins foreign, atmeans expression(exp(predict(xb)))

    Adjusted predictions                            Number of obs     =         74
    Model VCE    : OLS

    Expression   : exp(predict(xb))
    at           : 0.foreign       =    .7027027 (mean)
                   1.foreign       =    .2972973 (mean)
                   mpg             =     21.2973 (mean)

    ------------------------------------------------------------------------------
                 |            Delta-method
                 |     Margin   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
         foreign |
       Domestic  |   5201.293   240.3288    21.64   0.000     4730.258    5672.329
        Foreign  |    6898.83   507.0288    13.61   0.000     5905.071    7892.588
    ------------------------------------------------------------------------------

    .
    . /* Wrong expression */
    . predict lnyhat, xb

    . gen yhat = exp(lnyhat)

    .
    . /* Duan's corrected expression (assumes iid errors) */
    . predict uhat, residual

    . gen expuhat = exp(uhat)

    . sum expuhat

        Variable |        Obs        Mean    Std. Dev.       Min        Max
    -------------+---------------------------------------------------------
         expuhat |         74    1.057817    .4070624    .599085   3.020036

    . gen yhat_duan = r(mean)*exp(lnyhat)

    .
    . /* Note how the mean yhat is ~6% too low */
    . sum price yhat*

        Variable |        Obs        Mean    Std. Dev.       Min        Max
    -------------+---------------------------------------------------------
           price |         74    6165.257    2949.496       3291      15906
            yhat |         74    5796.027    1250.718   3008.888   9380.918
       yhat_duan |         74    6131.136    1323.031   3182.853   9923.294

    .
    . // not quite right, since it treats E[exp(uhat)] as a constant rather than a random
    . sum expuhat

        Variable |        Obs        Mean    Std. Dev.       Min        Max
    -------------+---------------------------------------------------------
         expuhat |         74    1.057817    .4070624    .599085   3.020036

    . margins foreign, atmeans expression(exp(predict(xb))*`=r(mean)')

    Adjusted predictions                            Number of obs     =         74
    Model VCE    : OLS

    Expression   : exp(predict(xb))*1.057816970992733
    at           : 0.foreign       =    .7027027 (mean)
                   1.foreign       =    .2972973 (mean)
                   mpg             =     21.2973 (mean)

    ------------------------------------------------------------------------------
                 |            Delta-method
                 |     Margin   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
         foreign |
       Domestic  |   5502.016   254.2238    21.64   0.000     5003.747    6000.286
        Foreign  |   7297.699   536.3436    13.61   0.000     6246.485    8348.913
    ------------------------------------------------------------------------------

    . /* might make sense to boostrap this */
    .
    . /* Easiest Solution: fit a robust Poisson Model */
    . poisson price i.foreign mpg, robust nolog

    Poisson regression                              Number of obs     =         74
                                                    Wald chi2(2)      =      33.91
                                                    Prob > chi2       =     0.0000
    Log pseudolikelihood = -28478.503               Pseudo R2         =     0.3526

    ------------------------------------------------------------------------------
                 |               Robust
           price |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
         foreign |
        Foreign  |   .2849739   .0876098     3.25   0.001     .1132618     .456686
             mpg |  -.0524904   .0094258    -5.57   0.000    -.0709647   -.0340162
           _cons |   9.723688   .1967522    49.42   0.000     9.338061    10.10932
    ------------------------------------------------------------------------------

    . predict yhat_pois
    (option n assumed; predicted number of events)

    . sum price yhat*

        Variable |        Obs        Mean    Std. Dev.       Min        Max
    -------------+---------------------------------------------------------
           price |         74    6165.257    2949.496       3291      15906
            yhat |         74    5796.027    1250.718   3008.888   9380.918
       yhat_duan |         74    6131.136    1323.031   3182.853   9923.294
       yhat_pois |         74    6165.257     1599.04   2582.605   10655.12

    . margins foreign, atmeans

    Adjusted predictions                            Number of obs     =         74
    Model VCE    : Robust

    Expression   : Predicted number of events, predict()
    at           : 0.foreign       =    .7027027 (mean)
                   1.foreign       =    .2972973 (mean)
                   mpg             =     21.2973 (mean)

    ------------------------------------------------------------------------------
                 |            Delta-method
                 |     Margin   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
         foreign |
       Domestic  |   5463.165   323.6294    16.88   0.000     4828.863    6097.467
        Foreign  |    7264.52   452.0574    16.07   0.000     6378.504    8150.536
    ------------------------------------------------------------------------------
Stata Code:
    cls
    sysuse auto, clear
    generate ln_price=ln(price)
    reg ln_price i.foreign mpg
    margins foreign, atmeans expression(exp(predict(xb)))

    /* Wrong expression */
    predict lnyhat, xb
    gen yhat = exp(lnyhat)

    /* Duan's corrected expression (assumes iid errors) */
    predict uhat, residual
    gen expuhat = exp(uhat)
    sum expuhat
    gen yhat_duan = r(mean)*exp(lnyhat)

    /* Note how the mean yhat is ~6% too low */
    sum price yhat*

    // not quite right, since it treats E[exp(uhat)] as a constant rather than a random variable
    sum expuhat
    margins foreign, atmeans expression(exp(predict(xb))*`=r(mean)')
    /* might make sense to bootstrap this */

    /* Easiest Solution: fit a robust Poisson Model */
    poisson price i.foreign mpg, robust nolog
    predict yhat_pois
    sum price yhat*
    margins foreign, atmeans
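The bootstrap suggested in the comment above could look roughly like the following. This is a sketch, not a tested recipe: the program name duan_margin, the returned scalar names m_domestic and m_foreign, and the replication count and seed are all hypothetical choices made here. The program re-estimates both the regression and the smearing factor in every resample, so the variability of the factor feeds into the standard errors.

```stata
* Hypothetical helper: smearing-corrected predictions at the mean of mpg
capture program drop duan_margin
program define duan_margin, rclass
    regress ln_price i.foreign mpg
    tempvar uhat expuhat
    predict double `uhat', residuals
    generate double `expuhat' = exp(`uhat')
    * Duan smearing factor: sample mean of the exponentiated residuals
    summarize `expuhat', meanonly
    local smear = r(mean)
    summarize mpg, meanonly
    local mpgbar = r(mean)
    return scalar m_domestic = `smear' * exp(_b[_cons] + _b[mpg]*`mpgbar')
    return scalar m_foreign  = `smear' * exp(_b[_cons] + _b[1.foreign] + _b[mpg]*`mpgbar')
end

sysuse auto, clear
generate ln_price = ln(price)
* reps() and seed() values are arbitrary illustrative settings
bootstrap m_domestic=r(m_domestic) m_foreign=r(m_foreign), reps(500) seed(12345): duan_margin
```

The point estimates should line up with the smearing-corrected margins shown earlier, while the bootstrap standard errors no longer treat the factor as a known constant.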