Solved – Including several endogenous interaction terms

I am writing because of the following issue. I'm estimating an IV model with the following structure: $Y = constant + b1*X1 + b2*X2 + b3*Xend + b..*Xcontrols$. I have also found a promising instrumental variable for $Xend$, namely $Xinstr$. To check overall robustness, I ran plain OLS, OLS with robust standard errors, and several IV estimators (2SLS, GMM, LIML). Apart from some minor changes in coefficients and significance levels (probably due to the IV estimation), the theoretically hypothesized effects stay in place.

But as soon as I modify my model to an interaction model:
$Y = constant + b1*X1 + b2*X2 + b3*Xend + b4*(X1*Xend) + b5*(X2*Xend) + b..*Xcontrols$

some really odd things happen: the coefficients and the related significance statistics change markedly between the OLS estimators and the 2SLS estimators. In detail, every relationship that was significant in OLS (e.g. $b1$, $b2$, $b3$ and $b4$) loses significance, and the coefficients even change signs.

As the literature suggests, in my first-stage equation I used the variables ($Xinstr * X1$) and ($Xinstr * X2$) as instruments for the newly added endogenous interaction terms (in Stata notation, e.g. `ivregress Y (Xend (Xend*X1) (Xend*X2) = Xinstr. (Xinstr. * X1) (Xinstr. * X2)) X1 X2 Xcontrols`).

What is going on here? Why is this change happening?

Here are some quick and dirty examples from my actual work on car sales and marketing strategies (please forgive the formatting issues; I also shortened the output and the set of estimators in the interest of time).

As you can see, in the original (non-interaction) regressions there is no big difference. In the interaction model, however, the effects obtained via OLS cancel out (especially for the two strategy-related variables of main interest).

```
quietly regress lnsales car_quality marketing_strategy1 marketing_strategy2 sourcing car_type1 car_type2 (+"List of additional control variables")
estimates store OLS

quietly regress lnsales car_quality marketing_strategy1 marketing_strategy2 sourcing car_type1 car_type2 (+"List of additional control variables"), robust
estimates store OLS_robust

global ivmodel lnsales (car_quality = peer_quality) marketing_strategy1 marketing_strategy2 sourcing car_type1 car_type2 (+"List of additional control variables")
quietly ivregress 2sls $ivmodel
estimates store TwoSLS_def
quietly ivregress 2sls $ivmodel , vce(robust)
estimates store TwoSLS__2
quietly ivregress gmm $ivmodel , wmatrix(robust)
estimates store GMM_het
quietly ivregress gmm $ivmodel , wmatrix(robust) igmm
estimates store IGMM
quietly ivregress liml $ivmodel , vce(robust)
estimates store LIML

estimates table OLS OLS_robust TwoSLS_def TwoSLS__2 GMM_het IGMM LIML, b se p stats(N r2)

------------------------------------------------------------------------------
    Variable |        OLS   OLS_robust   TwoSLS_def    TwoSLS__2      GMM_het
-------------+----------------------------------------------------------------
      car_~y |  .44455351    .44455351    .44888526    .44888526    .44888526
             |  .05834619    .07762703    .12372644    .10091798    .10091798
             |     0.0000       0.0000       0.0003       0.0000       0.0000
marketing_~1 | -.02134571   -.02134571   -.02261369   -.02261369   -.02261369
             |  .14387381    .13990431    .13956152    .13548022    .13548022
             |     0.8822       0.8789       0.8713       0.8674       0.8674
marketing_~2 | -.34940482   -.34940482    -.3491414    -.3491414    -.3491414
             |  .15259582    .13431119    .14412673     .1269109     .1269109
             |     0.0229       0.0099       0.0154       0.0059       0.0059
    sourcing |  .00599138    .00599138    .00603506    .00603506    .00603506
             |  .15266332    .14239443    .14403715    .13414465    .13414465
             |     0.9687       0.9665       0.9666       0.9641       0.9641
      car_~1 | -.30344565   -.30344565   -.30478088   -.30478088   -.30478088
             |  .27143962    .26951864    .25836192    .26001529    .26001529
             |     0.2647       0.2613       0.2381       0.2411       0.2411
      car_~2 | -.02749295   -.02749295   -.03170655   -.03170655   -.03170655
             |  .34545754    .39088556    .34328748    .36963657    .36963657
  ..........  .......... ..........
```

Now the model with interactions. Please note the shifts from OLS to 2SLS in the quality and strategy variables.

```
quietly regress lnsales car_quality marketing_strategy1 marketing_strategy2 sourcing car_type1 car_type2 (+"List of additional control variables")
estimates store OLS

quietly regress lnsales product_quality marketing_strategy1 marketing_strategy2 sourcing car_type1 car_type2 (+"List of additional control variables"), robust
estimates store OLS_robust

global ivmodel lnsales (c.car_quality c.car_quality#i.marketing_strategy1 c.car_quality#i.marketing_strategy2 = c.peer_quality i.marketing_strategy1#c.peer_quality i.marketing_strategy2#c.peer_quality) marketing_strategy1 marketing_strategy2 sourcing car_type1 car_type2 (+"List of additional control variables")
quietly ivregress 2sls $ivmodel
estimates store TwoSLS_def
quietly ivregress 2sls $ivmodel , vce(robust)
estimates store TwoSLS__2
quietly ivregress gmm $ivmodel , wmatrix(robust)
estimates store GMM_het

estimates table OLS OLS_robust TwoSLS_def TwoSLS__2 GMM_het IGMM LIML, b se p stats(N r2)

------------------------------------------------------------------------------
    Variable |        OLS   OLS_robust   TwoSLS_def    TwoSLS__2      GMM_het
-------------+----------------------------------------------------------------
      car_~y |  .30626371    .30626371    .40466472    .40466472    .40466472
             |  .06639855    .08737882    .17734552    .14822445    .14822445
             |     0.0000       0.0005       0.0225       0.0063       0.0063
             |
marketing_~1 | -2.7663962   -2.7663962    -1.022544    -1.022544    -1.022544
             |  .87427115    .87740022     3.468728     3.021177     3.021177
             |     0.0018       0.0018       0.7682       0.7350       0.7350
             |
marketing_~1#|
     c.car~y |
           1 |  .40964628    .40964628    .14894708    .14894708    .14894708
             |  .12788375    .12954421    .51333938    .44914179    .44914179
             |     0.0015       0.0018       0.7717       0.7402       0.7402
marketing_~2 | -1.6974189   -1.6974189   -.81075049   -.81075049   -.81075047
             |  1.2256574    1.0156041    4.4093988    3.5747531    3.5747531
             |     0.1674       0.0960       0.8541       0.8206       0.8206
             |
marketing_~2#|
     c.car~y |
           1 |  .20617457    .20617457    .07077817    .07077817    .07077817
             |  .18004716    .14488011    .65063831    .53051219    .53051219
             |     0.2533       0.1560       0.9134       0.8939       0.8939
             |
    sourcing |  .02814061    .02814061    .01454754    .01454754    .01454754
             |  .15052717    .13857819    .17351094    .14787563    .14787563
             |     0.8519       0.8393       0.9332       0.9216       0.9216
      car_~1 | -.23592028   -.23592028   -.28205832   -.28205832   -.28205832
             |  .26452637    .23238727    .26379489    .24610577    .24610577
             |     0.3734       0.3110       0.2850       0.2518       0.2518
      car_~2 | -.02415081   -.02415081   -.03596115   -.03596115   -.03596115
             |  .33585648    .37759328    .33613488    .36136989    .36136989
             |     0.9427       0.9491       0.9148       0.9207       0.9207
 ............. ............. .............
```

There could be all sorts of things going on, but without knowing more about the details of your model and your actual commands and results, it is hard to say more. Don't show us pseudo-code with generic y and x. No one but you can decipher what `Xinstr. (Xinstr. * X1)` means. At the very least, show us the actual Stata commands you typed. Also, from the arrangement of the parentheses in your question, it seems you share the common misunderstanding that instruments map onto the endogenous variables one to one. That's not how IV works.
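To spell out that last point, here is a sketch of the first stage in the question's notation. The coefficient names $\pi$ and $\gamma$, the errors $v$, and the shorthand $W = (X_1, X_2, X_{controls})$ are introduced here for illustration only. In 2SLS, every endogenous regressor is regressed on all excluded instruments plus all exogenous regressors:

$$
\begin{aligned}
X_{end} &= \pi_{10} + \pi_{11} X_{instr} + \pi_{12}\,(X_{instr} X_1) + \pi_{13}\,(X_{instr} X_2) + \gamma_1' W + v_1 \\
X_1 X_{end} &= \pi_{20} + \pi_{21} X_{instr} + \pi_{22}\,(X_{instr} X_1) + \pi_{23}\,(X_{instr} X_2) + \gamma_2' W + v_2 \\
X_2 X_{end} &= \pi_{30} + \pi_{31} X_{instr} + \pi_{32}\,(X_{instr} X_1) + \pi_{33}\,(X_{instr} X_2) + \gamma_3' W + v_3
\end{aligned}
$$

The right-hand side is identical in all three equations. What matters for identification is that there are at least as many excluded instruments as endogenous regressors (three and three here), not which instrument is written next to which endogenous variable.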

Having said that, the first thing I would try is to make sure that you're comparing apples to apples. In the simple model, the IV and OLS coefficients on $X_{end}$ are the marginal effects. In the interaction model, the marginal effect of $X_{end}$ is no longer a single coefficient: it also involves the interaction coefficients and the values of the interacted variables, so you need to take that into account when comparing. You can't just look at the coefficients.
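As a sketch, assuming the interaction model exactly as written in the question, differentiating with respect to $X_{end}$ gives:

$$
\frac{\partial Y}{\partial X_{end}} = b_3 + b_4 X_1 + b_5 X_2
$$

So in the interaction model $b_3$ alone is the effect of $X_{end}$ only when $X_1 = X_2 = 0$, whereas in the simple model $b_3$ is the effect for everyone. Likewise, the coefficient on $X_1$ becomes the effect of $X_1$ at $X_{end} = 0$, which may lie far outside the data. Comparing those coefficients across the two specifications is therefore not comparing like with like.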

Here's an example:

```
. webuse hsng2, clear
(1980 Census housing data)

. ivregress 2sls rent c.pcturban (c.hsngval = faminc i.region)

Instrumental variables (2SLS) regression          Number of obs   =         50
                                                  Wald chi2(2)    =      90.76
                                                  Prob > chi2     =     0.0000
                                                  R-squared       =     0.5989
                                                  Root MSE        =     22.166

------------------------------------------------------------------------------
        rent |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     hsngval |   .0022398   .0003284     6.82   0.000     .0015961    .0028836
    pcturban |    .081516   .2987652     0.27   0.785     -.504053     .667085
       _cons |   120.7065   15.22839     7.93   0.000     90.85942    150.5536
------------------------------------------------------------------------------
Instrumented:  hsngval
Instruments:   pcturban faminc 2.region 3.region 4.region

. ivregress 2sls rent c.pcturban (c.hsngval c.hsngval#c.pcturban = faminc i.region)

Instrumental variables (2SLS) regression          Number of obs   =         50
                                                  Wald chi2(3)    =      95.82
                                                  Prob > chi2     =     0.0000
                                                  R-squared       =     0.5886
                                                  Root MSE        =     22.448

--------------------------------------------------------------------------------------
                rent |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
---------------------+----------------------------------------------------------------
             hsngval |    .012628   .0038516     3.28   0.001     .0050791    .0201769
                     |
c.hsngval#c.pcturban |  -.0001453   .0000537    -2.71   0.007    -.0002505   -.0000401
                     |
            pcturban |   7.037653   2.587203     2.72   0.007     1.966828    12.10848
               _cons |  -358.7519    177.772    -2.02   0.044    -707.1785   -10.32518
--------------------------------------------------------------------------------------
Instrumented:  hsngval c.hsngval#c.pcturban
Instruments:   pcturban faminc 2.region 3.region 4.region

. margins, dydx(hsngval)

Average marginal effects                        Number of obs     =         50
Model VCE    : Unadjusted

Expression   : Linear prediction, predict()
dy/dx w.r.t. : hsngval

------------------------------------------------------------------------------
             |            Delta-method
             |      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     hsngval |   .0028993   .0004123     7.03   0.000     .0020912    .0037074
------------------------------------------------------------------------------

. regress rent c.pcturban c.hsngval

      Source |       SS           df       MS      Number of obs   =        50
-------------+----------------------------------   F(2, 47)        =     47.54
       Model |  40983.5269         2  20491.7635   Prob > F        =    0.0000
    Residual |  20259.5931        47  431.055172   R-squared       =    0.6692
-------------+----------------------------------   Adj R-squared   =    0.6551
       Total |    61243.12        49  1249.85959   Root MSE        =    20.762

------------------------------------------------------------------------------
        rent |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    pcturban |   .5248216   .2490782     2.11   0.040     .0237408    1.025902
     hsngval |   .0015205   .0002276     6.68   0.000     .0010627    .0019784
       _cons |   125.9033   14.18537     8.88   0.000     97.36603    154.4406
------------------------------------------------------------------------------

. regress rent c.pcturban##c.hsngval

      Source |       SS           df       MS      Number of obs   =        50
-------------+----------------------------------   F(3, 46)        =     53.26
       Model |  47553.1926         3  15851.0642   Prob > F        =    0.0000
    Residual |  13689.9274        46  297.607117   R-squared       =    0.7765
-------------+----------------------------------   Adj R-squared   =    0.7619
       Total |    61243.12        49  1249.85959   Root MSE        =    17.251

--------------------------------------------------------------------------------------
                rent |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
---------------------+----------------------------------------------------------------
            pcturban |   3.359486   .6378362     5.27   0.000     2.075588    4.643383
             hsngval |   .0068502     .00115     5.96   0.000     .0045353     .009165
                     |
c.pcturban#c.hsngval |  -.0000666   .0000142    -4.70   0.000    -.0000951    -.000038
                     |
               _cons |  -97.85703    49.0617    -1.99   0.052    -196.6131    .8990436
--------------------------------------------------------------------------------------

. margins, dydx(hsngval)

Average marginal effects                        Number of obs     =         50
Model VCE    : OLS

Expression   : Linear prediction, predict()
dy/dx w.r.t. : hsngval

------------------------------------------------------------------------------
             |            Delta-method
             |      dy/dx   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     hsngval |   .0023936   .0002651     9.03   0.000     .0018599    .0029272
------------------------------------------------------------------------------
```

Note how in the IV spec with interaction, the coefficient on housing value is over 5.5 times larger than in the simple IV spec. The marginal effect (averaging over percent urban), however, is pretty similar.
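Because the marginal effect here is linear in `pcturban`, the average marginal effect should equal the `hsngval` coefficient plus the interaction coefficient times the sample mean of `pcturban`. From the printed IV output, .012628 − .0001453 × m = .0028993 implies a mean of roughly 67. A minimal sketch for checking this by hand and for looking at the effect at specific values follows; the `at()` values 40, 60 and 80 are arbitrary illustrations, not taken from the original example.

```stata
webuse hsng2, clear
ivregress 2sls rent c.pcturban (c.hsngval c.hsngval#c.pcturban = faminc i.region)

* Average marginal effect by hand:
* b[hsngval] + b[hsngval x pcturban] * mean(pcturban)
summarize pcturban, meanonly
display _b[hsngval] + r(mean) * _b[c.hsngval#c.pcturban]

* Same quantity from margins, with delta-method standard errors
margins, dydx(hsngval)

* Marginal effect of hsngval at chosen values of pcturban
* (40, 60, 80 are illustrative values only)
margins, dydx(hsngval) at(pcturban=(40 60 80))
```

The `display` line and the first `margins` call should agree up to rounding. The `at()` version shows how the effect of housing value shrinks as percent urban rises, which is the information the single `hsngval` coefficient hides.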

Finally, if you only have one instrument you probably want something like this:

```
ivregress 2sls rent c.pcturban (c.hsngval c.hsngval#c.pcturban = c.faminc c.faminc#c.pcturban)
margins, dydx(hsngval)
```

A quadratic endogenous variable would be:

```
ivregress 2sls rent c.pcturban (c.hsngval##c.hsngval = c.faminc##c.faminc)
margins, dydx(hsngval)
```

The example above did not work out as nicely with these, so I used two instruments.
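One possible reason a just-identified specification with an interacted instrument behaves badly is a weak first stage for one of the endogenous terms. That would also be consistent with the much larger standard errors in the question's interaction IV results, though this is a guess rather than a diagnosis. A minimal sketch of how one might check, using Stata's `estat firststage` postestimation command on the single-instrument specification above:

```stata
webuse hsng2, clear

* One instrument (faminc) plus its interaction with pcturban
ivregress 2sls rent c.pcturban (c.hsngval c.hsngval#c.pcturban = c.faminc c.faminc#c.pcturban)

* First-stage diagnostics for each endogenous regressor
estat firststage
```

With more than one endogenous regressor, the partial R-squared values and the minimum eigenvalue statistic in this output are the things to look at. Low values would suggest that the interacted instrument adds little independent variation.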
