# Solved – Predicting probabilities after log-linear regression

I would like to estimate a log-linear regression and examine the results with Stata's `marginsplot` command. I have taken the natural logarithm of my dependent variable (to make a highly skewed distribution less skewed); the predictors are not transformed. The graph is difficult to interpret on the log scale of the outcome. How can I plot my results on the original scale of Y (instead of the logarithm of Y)?

The Stata code I used is shown below:

```
clear
sysuse auto
generate ln_price=ln(price)
reg ln_price i.foreign mpg
margins i.foreign, atmeans
marginsplot
```

The problem is that your predicted prices will be too small, since

$$E[y \vert x]=\exp(x'\beta) \cdot E[\exp(u)],$$

and you are leaving off the last factor. This is a consequence of Jensen's Inequality. If you take a look at the graphical proof at that link, it looks a lot like your case and should give you some intuition.
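To spell out the step, here is a short derivation. It assumes the model is $\ln y = x'\beta + u$ with $E[u]=0$ and $u$ independent of $x$; the symbol $u$ for the error comes from the formula above, while the independence assumption is added here for the sketch.

$$
\begin{aligned}
y &= \exp(x'\beta)\cdot\exp(u) \\
E[y \vert x] &= \exp(x'\beta)\cdot E[\exp(u) \vert x] = \exp(x'\beta)\cdot E[\exp(u)] \\
E[\exp(u)] &\ge \exp(E[u]) = \exp(0) = 1
\end{aligned}
$$

The last line is Jensen's Inequality applied to the convex function $\exp(\cdot)$, and it is strict whenever $u$ has positive variance. So $\exp(x'\hat\beta)$ on its own understates $E[y \vert x]$.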

If you can assume that the errors are iid, you can estimate the second factor, $E[\exp(u)]$, with the sample average of the exponentiated residuals. This is called the Duan smearing transformation. Unfortunately, there is no easy way to do this with `margins` that also accounts for the sampling variability of the estimated smearing factor. The point estimates will be correct, but the SEs will be too small. I would recommend using a Poisson model with robust SEs, which makes this whole re-transformation business a lot easier.
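If you prefer to stay with the log-linear model, one possible workaround is to bootstrap the whole two-step procedure, so that the smearing factor is re-estimated in every resample. The following is only a minimal sketch under the iid assumption: the program name `duan_pred` and the result names `yhat_dom` and `yhat_for` are hypothetical, and the number of replications and the seed are arbitrary choices.

```stata
sysuse auto, clear
generate ln_price = ln(price)

capture program drop duan_pred
program define duan_pred, rclass
    // Refit the log-linear model on the current (re)sample
    regress ln_price i.foreign mpg

    // Smearing factor: sample mean of the exponentiated residuals
    tempvar uhat expuhat
    predict double `uhat', residuals
    generate double `expuhat' = exp(`uhat')
    summarize `expuhat', meanonly
    local smear = r(mean)

    // Linear index for a domestic car at the sample mean of mpg
    summarize mpg, meanonly
    local xb0 = _b[_cons] + _b[mpg]*r(mean)

    // Smearing-corrected predictions on the original price scale
    return scalar yhat_dom = `smear'*exp(`xb0')
    return scalar yhat_for = `smear'*exp(`xb0' + _b[1.foreign])
end

// reps() and seed() are illustrative values, not recommendations
bootstrap yhat_dom=r(yhat_dom) yhat_for=r(yhat_for), reps(500) seed(12345): duan_pred
```

The observed statistics should coincide with the smearing-corrected predictions at the mean of `mpg`, while the bootstrap SEs also reflect the uncertainty in the smearing factor.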

I am also not a fan of this `atmeans` business since it evaluates the predictions at nonsensical values, but that's another story.
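As a rough illustration of the alternative, dropping `atmeans` makes `margins` average the predictions over the observed covariate values instead of plugging in means. This sketch assumes the robust Poisson model recommended above; the `mpg` grid in the last step is an arbitrary choice made here for plotting.

```stata
sysuse auto, clear
poisson price i.foreign mpg, vce(robust) nolog

* Average adjusted predictions: every car keeps its own mpg,
* and foreign is set to each level in turn
margins foreign
marginsplot

* Predicted price on the original scale over a grid of mpg values
margins foreign, at(mpg=(15(5)35))
marginsplot
```

Because the Poisson model predicts on the original scale of `price`, the resulting plots need no re-transformation.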

Here's Stata code and output showing this with the recommended solution:

```
. sysuse auto, clear
(1978 Automobile Data)

. generate ln_price=ln(price)

. reg ln_price i.foreign mpg

      Source |       SS           df       MS      Number of obs   =        74
-------------+----------------------------------   F(2, 71)        =     17.80
       Model |  3.74819416         2  1.87409708   Prob > F        =    0.0000
    Residual |  7.47533892        71  .105286464   R-squared       =    0.3340
-------------+----------------------------------   Adj R-squared   =    0.3152
       Total |  11.2235331        73  .153747029   Root MSE        =    .32448

------------------------------------------------------------------------------
    ln_price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     foreign |
    Foreign  |   .2824445   .0897634     3.15   0.002     .1034612    .4614277
         mpg |  -.0421151   .0071399    -5.90   0.000    -.0563517   -.0278785
       _cons |     9.4536   .1485422    63.64   0.000     9.157415    9.749785
------------------------------------------------------------------------------

. margins foreign, atmeans expression(exp(predict(xb)))

Adjusted predictions                            Number of obs     =         74
Model VCE    : OLS

Expression   : exp(predict(xb))
at           : 0.foreign       =    .7027027 (mean)
               1.foreign       =    .2972973 (mean)
               mpg             =     21.2973 (mean)

------------------------------------------------------------------------------
             |            Delta-method
             |     Margin   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     foreign |
   Domestic  |   5201.293   240.3288    21.64   0.000     4730.258    5672.329
    Foreign  |    6898.83   507.0288    13.61   0.000     5905.071    7892.588
------------------------------------------------------------------------------

.
. /* Wrong expression */
. predict lnyhat, xb

. gen yhat = exp(lnyhat)

.
. /* Duan's corrected expression (assumes iid errors) */
. predict uhat, residual

. gen expuhat = exp(uhat)

. sum expuhat

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
     expuhat |         74    1.057817    .4070624    .599085   3.020036

. gen yhat_duan = r(mean)*exp(lnyhat)

.
. /* Note how the mean yhat is ~6% too low */
. sum price yhat*

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
       price |         74    6165.257    2949.496       3291      15906
        yhat |         74    5796.027    1250.718   3008.888   9380.918
   yhat_duan |         74    6131.136    1323.031   3182.853   9923.294

.
. // not quite right, since it treats E[exp(uhat)] as a constant rather than a random variable
. sum expuhat

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
     expuhat |         74    1.057817    .4070624    .599085   3.020036

. margins foreign, atmeans expression(exp(predict(xb))*`=r(mean)')

Adjusted predictions                            Number of obs     =         74
Model VCE    : OLS

Expression   : exp(predict(xb))*1.057816970992733
at           : 0.foreign       =    .7027027 (mean)
               1.foreign       =    .2972973 (mean)
               mpg             =     21.2973 (mean)

------------------------------------------------------------------------------
             |            Delta-method
             |     Margin   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     foreign |
   Domestic  |   5502.016   254.2238    21.64   0.000     5003.747    6000.286
    Foreign  |   7297.699   536.3436    13.61   0.000     6246.485    8348.913
------------------------------------------------------------------------------

. /* might make sense to bootstrap this */
.
. /* Easiest Solution: fit a robust Poisson Model */
. poisson price i.foreign mpg, robust nolog

Poisson regression                              Number of obs     =         74
                                                Wald chi2(2)      =      33.91
                                                Prob > chi2       =     0.0000
Log pseudolikelihood = -28478.503               Pseudo R2         =     0.3526

------------------------------------------------------------------------------
             |               Robust
       price |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     foreign |
    Foreign  |   .2849739   .0876098     3.25   0.001     .1132618     .456686
         mpg |  -.0524904   .0094258    -5.57   0.000    -.0709647   -.0340162
       _cons |   9.723688   .1967522    49.42   0.000     9.338061    10.10932
------------------------------------------------------------------------------

. predict yhat_pois
(option n assumed; predicted number of events)

. sum price yhat*

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
       price |         74    6165.257    2949.496       3291      15906
        yhat |         74    5796.027    1250.718   3008.888   9380.918
   yhat_duan |         74    6131.136    1323.031   3182.853   9923.294
   yhat_pois |         74    6165.257     1599.04   2582.605   10655.12

. margins foreign, atmeans

Adjusted predictions                            Number of obs     =         74
Model VCE    : Robust

Expression   : Predicted number of events, predict()
at           : 0.foreign       =    .7027027 (mean)
               1.foreign       =    .2972973 (mean)
               mpg             =     21.2973 (mean)

------------------------------------------------------------------------------
             |            Delta-method
             |     Margin   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     foreign |
   Domestic  |   5463.165   323.6294    16.88   0.000     4828.863    6097.467
    Foreign  |    7264.52   452.0574    16.07   0.000     6378.504    8150.536
------------------------------------------------------------------------------
```

Stata Code:

```
cls
sysuse auto, clear
generate ln_price=ln(price)
reg ln_price i.foreign mpg
margins foreign, atmeans expression(exp(predict(xb)))

/* Wrong expression */
predict lnyhat, xb
gen yhat = exp(lnyhat)

/* Duan's corrected expression (assumes iid errors) */
predict uhat, residual
gen expuhat = exp(uhat)
sum expuhat
gen yhat_duan = r(mean)*exp(lnyhat)

/* Note how the mean yhat is ~6% too low */
sum price yhat*

// not quite right, since it treats E[exp(uhat)] as a constant rather than a random variable
sum expuhat
margins foreign, atmeans expression(exp(predict(xb))*`=r(mean)')

/* might make sense to bootstrap this */

/* Easiest Solution: fit a robust Poisson Model */
poisson price i.foreign mpg, robust nolog
predict yhat_pois
sum price yhat*
margins foreign, atmeans
```
