I am writing about the following issue. I am estimating an IV model with this common structure: $Y = \text{constant} + b_1 X_1 + b_2 X_2 + b_3 X_{end} + b_{\ldots} X_{controls}$. I have also found a promising instrumental variable for $X_{end}$, namely $X_{instr}$. To check overall robustness, I estimated the original OLS specification, OLS with robust standard errors, and several 2SLS estimators. Apart from some minor changes in coefficients and significance levels (probably reflecting that IV regression is the adequate estimator here), the theoretically hypothesized effects stay in place.
But as soon as I extend the model to include interactions:
$Y = \text{constant} + b_1 X_1 + b_2 X_2 + b_3 X_{end} + b_4 (X_1 \cdot X_{end}) + b_5 (X_2 \cdot X_{end}) + b_{\ldots} X_{controls}$
some really odd things happen: the coefficient values and the related significance statistics differ markedly, and confusingly, between the classical OLS estimators and the several 2SLS estimators. In detail, every relationship that was significant under OLS vanishes (e.g. $b_1$, $b_2$, $b_3$ and $b_4$), and the coefficients even change signs.
As the literature suggests, in my first-stage equation I used the variables ($X_{instr} \cdot X_1$) and ($X_{instr} \cdot X_2$) as instruments for the newly added endogenous interaction terms (in Stata notation, e.g. ivregress Y (Xend (Xend*X1) (Xend*X2) = Xinstr. (Xinstr. * X1) (Xinstr. * X2)) X1 X2 Xcontrols).
What is going on here? Why is this change happening?
Here are some actual quick-and-dirty examples from my work on car sales and marketing strategies (please forgive the formatting issues; I also shortened the actual output and the variations in estimators in the interest of time).
As you can see, in the original (non-interaction) regressions there is no big difference, but in the interaction model the effects obtained via OLS vanish (especially for the two strategy-related variables of main interest).
    quietly regress lnsales car_quality marketing_strategy1 marketing_strategy2 sourcing car_type1 car_type2 (+"List of additional control variables")
    estimates store OLS
    quietly regress lnsales car_quality marketing_strategy1 marketing_strategy2 sourcing car_type1 car_type2 (+"List of additional control variables"), robust
    estimates store OLS_robust
    global ivmodel lnsales (car_quality = peer_quality) marketing_strategy1 marketing_strategy2 sourcing car_type1 car_type2 (+"List of additional control variables")
    quietly ivregress 2sls $ivmodel
    estimates store TwoSLS_def
    quietly ivregress 2sls $ivmodel , vce(robust)
    estimates store TwoSLS__2
    quietly ivregress gmm $ivmodel , wmatrix(robust)
    estimates store GMM_het
    quietly ivregress gmm $ivmodel , wmatrix(robust) igmm
    estimates store IGMM
    quietly ivregress liml $ivmodel , vce(robust)
    estimates store LIML
    estimates table OLS OLS_robust TwoSLS_def TwoSLS__2 GMM_het IGMM LIML, b se p stats(N r2)

    -------------------------------------------------------------------------------
        Variable |          OLS   OLS_robust   TwoSLS_def    TwoSLS__2      GMM_het
    -------------+-----------------------------------------------------------------
          car_~y |    .44455351    .44455351    .44888526    .44888526    .44888526
                 |    .05834619    .07762703    .12372644    .10091798    .10091798
                 |       0.0000       0.0000       0.0003       0.0000       0.0000
    marketing_~1 |   -.02134571   -.02134571   -.02261369   -.02261369   -.02261369
                 |    .14387381    .13990431    .13956152    .13548022    .13548022
                 |       0.8822       0.8789       0.8713       0.8674       0.8674
    marketing_~2 |   -.34940482   -.34940482    -.3491414    -.3491414    -.3491414
                 |    .15259582    .13431119    .14412673     .1269109     .1269109
                 |       0.0229       0.0099       0.0154       0.0059       0.0059
        sourcing |    .00599138    .00599138    .00603506    .00603506    .00603506
                 |    .15266332    .14239443    .14403715    .13414465    .13414465
                 |       0.9687       0.9665       0.9666       0.9641       0.9641
          car_~1 |   -.30344565   -.30344565   -.30478088   -.30478088   -.30478088
                 |    .27143962    .26951864    .25836192    .26001529    .26001529
                 |       0.2647       0.2613       0.2381       0.2411       0.2411
          car_~2 |   -.02749295   -.02749295   -.03170655   -.03170655   -.03170655
                 |    .34545754    .39088556    .34328748    .36963657    .36963657
    ..........
Now the model with interactions. Please note the shifts from OLS to 2SLS in the quality and strategy variables:
    quietly regress lnsales car_quality marketing_strategy1 marketing_strategy2 sourcing car_type1 car_type2 (+"List of additional control variables")
    estimates store OLS
    quietly regress lnsales product_quality marketing_strategy1 marketing_strategy2 sourcing car_type1 car_type2 (+"List of additional control variables"), robust
    estimates store OLS_robust
    global ivmodel lnsales (c.car_quality c.car_quality#i.marketing_strategy1 c.car_quality#i.marketing_strategy2 = c.peer_quality i.marketing_strategy1#c.peer_quality i.marketing_strategy2#c.peer_quality) marketing_strategy1 marketing_strategy2 sourcing car_type1 car_type2 (+"List of additional control variables")
    quietly ivregress 2sls $ivmodel
    estimates store TwoSLS_def
    quietly ivregress 2sls $ivmodel , vce(robust)
    estimates store TwoSLS__2
    quietly ivregress gmm $ivmodel , wmatrix(robust)
    estimates store GMM_het
    estimates table OLS OLS_robust TwoSLS_def TwoSLS__2 GMM_het IGMM LIML, b se p stats(N r2)

    -------------------------------------------------------------------------------
        Variable |          OLS   OLS_robust   TwoSLS_def    TwoSLS__2      GMM_het
    -------------+-----------------------------------------------------------------
          car_~y |    .30626371    .30626371    .40466472    .40466472    .40466472
                 |    .06639855    .08737882    .17734552    .14822445    .14822445
                 |       0.0000       0.0005       0.0225       0.0063       0.0063
                 |
    marketing_~1 |   -2.7663962   -2.7663962    -1.022544    -1.022544    -1.022544
                 |    .87427115    .87740022     3.468728     3.021177     3.021177
                 |       0.0018       0.0018       0.7682       0.7350       0.7350
                 |
    marketing_~1#|
         c.car~y |
               1 |    .40964628    .40964628    .14894708    .14894708    .14894708
                 |    .12788375    .12954421    .51333938    .44914179    .44914179
                 |       0.0015       0.0018       0.7717       0.7402       0.7402
    marketing_~2 |   -1.6974189   -1.6974189   -.81075049   -.81075049   -.81075047
                 |    1.2256574    1.0156041    4.4093988    3.5747531    3.5747531
                 |       0.1674       0.0960       0.8541       0.8206       0.8206
                 |
    marketing_~2#|
         c.car~y |
               1 |    .20617457    .20617457    .07077817    .07077817    .07077817
                 |    .18004716    .14488011    .65063831    .53051219    .53051219
                 |       0.2533       0.1560       0.9134       0.8939       0.8939
                 |
        sourcing |    .02814061    .02814061    .01454754    .01454754    .01454754
                 |    .15052717    .13857819    .17351094    .14787563    .14787563
                 |       0.8519       0.8393       0.9332       0.9216       0.9216
          car_~1 |   -.23592028   -.23592028   -.28205832   -.28205832   -.28205832
                 |    .26452637    .23238727    .26379489    .24610577    .24610577
                 |       0.3734       0.3110       0.2850       0.2518       0.2518
          car_~2 |   -.02415081   -.02415081   -.03596115   -.03596115   -.03596115
                 |    .33585648    .37759328    .33613488    .36136989    .36136989
                 |       0.9427       0.9491       0.9148       0.9207       0.9207
    .............
Best Answer
There could be all sorts of things going on, but without knowing more about the details of your model and the actual commands and results, it is hard to say more. Don't show us pseudo-code with generic y and x. No one but you can decipher what Xinstr. (Xinstr. * X1) means. At the very least, show us the actual Stata commands you typed. Also, from the arrangement of parentheses in your question, it seems that you share the common misunderstanding that instruments map onto the endogenous variables one to one. That's not how IV works: in 2SLS, each endogenous variable is regressed on the full set of instruments and exogenous regressors.
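To illustrate the point about instruments not pairing up with endogenous variables, here is a minimal sketch on Stata's hsng2 example data. The variable name `hsngval_x_urban` is my own, introduced only for illustration. Both first-stage regressions use the same right-hand side, and the `first` option of `ivregress` should report the same first-stage coefficients.

```stata
webuse hsng2, clear

* The endogenous interaction, built by hand so it can serve as a
* dependent variable (hypothetical name, for illustration only)
generate double hsngval_x_urban = hsngval*pcturban

* One first-stage regression per endogenous variable, each on the same
* regressors: the included exogenous variable (pcturban) plus ALL instruments
regress hsngval         pcturban faminc i.region
regress hsngval_x_urban pcturban faminc i.region

* ivregress displays the first stages itself when asked
ivregress 2sls rent c.pcturban (c.hsngval c.hsngval#c.pcturban = faminc i.region), first
```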
Having said that, the first thing I would try is to make sure that you're comparing apples to apples. In the simple model, the IV and OLS coefficients on $X_{end}$ are the marginal effects. In the interactions model, the marginal effects are more complicated and non-linear, so you need to take that into account when comparing. You can't just look at the coefficients.
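In the notation of the question, the comparison can be written out as follows (a sketch that simply differentiates the two equations as posted):

$$\frac{\partial Y}{\partial X_{end}} = b_3 \quad \text{(simple model)}, \qquad \frac{\partial Y}{\partial X_{end}} = b_3 + b_4 X_1 + b_5 X_2 \quad \text{(interaction model)}.$$

So in the interaction model $b_3$ is the effect of $X_{end}$ only at $X_1 = X_2 = 0$. Likewise, $\partial Y / \partial X_1 = b_1 + b_4 X_{end}$, so $b_1$ is the effect of $X_1$ only at $X_{end} = 0$, which may be far outside the data.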
Here's an example:
    . webuse hsng2, clear
    (1980 Census housing data)

    . ivregress 2sls rent c.pcturban (c.hsngval = faminc i.region)

    Instrumental variables (2SLS) regression          Number of obs   =         50
                                                      Wald chi2(2)    =      90.76
                                                      Prob > chi2     =     0.0000
                                                      R-squared       =     0.5989
                                                      Root MSE        =     22.166

    ------------------------------------------------------------------------------
            rent |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
         hsngval |   .0022398   .0003284     6.82   0.000     .0015961    .0028836
        pcturban |    .081516   .2987652     0.27   0.785     -.504053     .667085
           _cons |   120.7065   15.22839     7.93   0.000     90.85942    150.5536
    ------------------------------------------------------------------------------
    Instrumented:  hsngval
    Instruments:   pcturban faminc 2.region 3.region 4.region

    . ivregress 2sls rent c.pcturban (c.hsngval c.hsngval#c.pcturban = faminc i.region)

    Instrumental variables (2SLS) regression          Number of obs   =         50
                                                      Wald chi2(3)    =      95.82
                                                      Prob > chi2     =     0.0000
                                                      R-squared       =     0.5886
                                                      Root MSE        =     22.448

    --------------------------------------------------------------------------------------
                    rent |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    ---------------------+----------------------------------------------------------------
                 hsngval |    .012628   .0038516     3.28   0.001     .0050791    .0201769
                         |
    c.hsngval#c.pcturban |  -.0001453   .0000537    -2.71   0.007    -.0002505   -.0000401
                         |
                pcturban |   7.037653   2.587203     2.72   0.007     1.966828    12.10848
                   _cons |  -358.7519    177.772    -2.02   0.044    -707.1785   -10.32518
    --------------------------------------------------------------------------------------
    Instrumented:  hsngval c.hsngval#c.pcturban
    Instruments:   pcturban faminc 2.region 3.region 4.region

    . margins, dydx(hsngval)

    Average marginal effects                          Number of obs   =         50
    Model VCE    : Unadjusted

    Expression   : Linear prediction, predict()
    dy/dx w.r.t. : hsngval

    ------------------------------------------------------------------------------
                 |            Delta-method
                 |      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
         hsngval |   .0028993   .0004123     7.03   0.000     .0020912    .0037074
    ------------------------------------------------------------------------------

    . regress rent c.pcturban c.hsngval

          Source |       SS       df       MS              Number of obs =      50
    -------------+------------------------------           F(2, 47)      =   47.54
           Model |  40983.5269     2  20491.7635           Prob > F      =  0.0000
        Residual |  20259.5931    47  431.055172           R-squared     =  0.6692
    -------------+------------------------------           Adj R-squared =  0.6551
           Total |    61243.12    49  1249.85959           Root MSE      =  20.762

    ------------------------------------------------------------------------------
            rent |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
        pcturban |   .5248216   .2490782     2.11   0.040     .0237408    1.025902
         hsngval |   .0015205   .0002276     6.68   0.000     .0010627    .0019784
           _cons |   125.9033   14.18537     8.88   0.000     97.36603    154.4406
    ------------------------------------------------------------------------------

    . regress rent c.pcturban##c.hsngval

          Source |       SS       df       MS              Number of obs =      50
    -------------+------------------------------           F(3, 46)      =   53.26
           Model |  47553.1926     3  15851.0642           Prob > F      =  0.0000
        Residual |  13689.9274    46  297.607117           R-squared     =  0.7765
    -------------+------------------------------           Adj R-squared =  0.7619
           Total |    61243.12    49  1249.85959           Root MSE      =  17.251

    --------------------------------------------------------------------------------------
                    rent |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    ---------------------+----------------------------------------------------------------
                pcturban |   3.359486   .6378362     5.27   0.000     2.075588    4.643383
                 hsngval |   .0068502     .00115     5.96   0.000     .0045353     .009165
                         |
    c.pcturban#c.hsngval |  -.0000666   .0000142    -4.70   0.000    -.0000951    -.000038
                         |
                   _cons |  -97.85703    49.0617    -1.99   0.052    -196.6131    .8990436
    --------------------------------------------------------------------------------------

    . margins, dydx(hsngval)

    Average marginal effects                          Number of obs   =         50
    Model VCE    : OLS

    Expression   : Linear prediction, predict()
    dy/dx w.r.t. : hsngval

    ------------------------------------------------------------------------------
                 |            Delta-method
                 |      dy/dx   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
         hsngval |   .0023936   .0002651     9.03   0.000     .0018599    .0029272
    ------------------------------------------------------------------------------
Note how in the IV spec with interaction, the coefficient on housing value is over 5.5 times larger than in the simple IV spec. The marginal effect (averaging over percent urban), however, is pretty similar.
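Because the model is linear in the interaction, the average marginal effect is the coefficient on hsngval plus the interaction coefficient times the sample mean of pcturban. A minimal sketch that checks this by hand and then evaluates the effect at a few levels of pcturban follows. The values 40, 60 and 80 are arbitrary illustrative choices of mine, not part of the example.

```stata
webuse hsng2, clear
ivregress 2sls rent c.pcturban (c.hsngval c.hsngval#c.pcturban = faminc i.region)

* Average marginal effect by hand:
* b[hsngval] + b[hsngval x pcturban] * mean(pcturban)
summarize pcturban if e(sample), meanonly
display _b[hsngval] + r(mean)*_b[c.hsngval#c.pcturban]

* The same quantity from margins, for comparison
margins, dydx(hsngval)

* Marginal effect of hsngval at chosen urbanization levels (illustrative values)
margins, dydx(hsngval) at(pcturban=(40 60 80))
```

The hand-computed number should agree with the dy/dx that `margins` reports. Backing it out of the rounded coefficients printed in the log implies a mean of pcturban of roughly 67.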
Finally, if you only have one instrument you probably want something like this:
    ivregress 2sls rent c.pcturban (c.hsngval c.hsngval#c.pcturban = c.faminc c.faminc#c.pcturban)
    margins, dydx(hsngval)
A quadratic endogenous variable would be:
    ivregress 2sls rent c.pcturban (c.hsngval##c.hsngval = c.faminc##c.faminc)
    margins, dydx(hsngval)
The example above did not work out as nicely with these, so I used two instruments.
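One possible reason a specification like this behaves less nicely is a weak first stage, though I have not verified that for this data. A sketch for checking it on the single-instrument interaction specification uses `estat firststage`, a standard `ivregress` postestimation command. How to judge its output against weak-instrument critical values is covered in the Stata manual.

```stata
webuse hsng2, clear

* Single instrument (faminc) plus its interaction with the exogenous regressor
ivregress 2sls rent c.pcturban (c.hsngval c.hsngval#c.pcturban = c.faminc c.faminc#c.pcturban)

* First-stage diagnostics for the two endogenous regressors
estat firststage

* Average marginal effect of hsngval, for comparison with the runs above
margins, dydx(hsngval)
```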