I've been performing a manual 2SLS regression and I've obtained the following results, which I find a bit suspicious.
I've done the F-test of the first-stage regression and obtained a statistic of 31.25 (F(IVs,n-k)=31.25). I've also performed the rank test of the first-stage regression with the following code in Stata:
ranktest (endog_var)(Z1 Z2 exogen_var), full robust
In which "endog_var" is the endogenous variable, "Z1" and "Z2" are the instruments and "exogen_var" is a set of exogenous variables. And I've obtained a p-value of 0.17, i.e. the hypothesis that the matrix is not full rank is not rejected. Is that possible? Am I doing something wrong? Should I partial out the exogenous variables?
Yes, you need to partial out the exogenous variables using the partial option in ranktest. So the correct syntax should be:
ranktest (endog_var)(Z1 Z2), partial(exogen_var) full robust
This is also done in the documentation for ranktest (at the bottom of the help file). You can check this by comparing your results to the Kleibergen-Paap rk statistic reported by ivreg2 with robust standard errors.
As an example:
// use a toy data set
sysuse auto

// run the iv regression with two instruments using ivreg2
ivreg2 price weight (mpg = foreign trunk), first robust

/* this is the output from the first stage diagnostics for underidentification
Underidentification test
Ho: matrix of reduced form coefficients has rank=K1-1 (underidentified)
Ha: matrix has rank=K1 (identified)
Kleibergen-Paap rk LM statistic          Chi-sq(2)=1.90     P-val=0.3863
*/

// Test 1
// compare the Kleibergen-Paap rk LM test from ivreg2 with the manual test (partial out exogenous variables)
ranktest (mpg) (foreign trunk), partial(weight) full robust

/* output
Kleibergen-Paap rk LM test of rank of matrix
Test statistic robust to heteroskedasticity
Test of rank= 0  rk= 1.90  Chi-sq( 2) pvalue=0.386287
*/

// Test 2
// compare the Kleibergen-Paap rk LM test from ivreg2 with the manual test (not partialling out exogenous variables)
ranktest (mpg) (foreign trunk weight), full robust

/* output
Kleibergen-Paap rk LM test of rank of matrix
Test statistic robust to heteroskedasticity
Test of rank= 0  rk= 30.70  Chi-sq( 3) pvalue=0.000001
*/
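You see that test 1 produced the correct rk statistic and p-value (as in the ivreg2 output) whilst test 2 did not. In test 2 the exogenous variable weight is treated as a third instrument, which is why the test has 3 degrees of freedom instead of 2.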
And yes, the matrix may not be of full rank if your instruments are not strong enough.
ivreg2 also provides the F-test for the excluded instruments (see the Angrist and Pischke F-statistic). For more information, see section 7 of Baum et al. (2007), "Enhanced routines for instrumental variables / generalized method of moments estimation and testing". Using ivreg2 is generally a better strategy than doing 2SLS "by hand" because the Stata routine provides you with a whole range of useful test statistics and also gives you the correct standard errors.
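To make the "by hand" comparison concrete, here is a minimal sketch on the same toy data. It assumes ivreg2 and ranktest are already installed (for example from SSC), and the name mpg_hat is just an illustrative variable for the first-stage fitted values. The manual second stage reproduces the 2SLS point estimates, but its standard errors are not valid because the residuals are computed from the fitted values rather than from the original endogenous variable.

```stata
* Sketch only: auto toy data, same variable roles as in the example above
sysuse auto, clear

* First stage by hand: endogenous variable on instruments and exogenous controls
regress mpg foreign trunk weight, vce(robust)

* Joint F-test of the excluded instruments (foreign and trunk)
test foreign trunk

* Save the first-stage fitted values (mpg_hat is a hypothetical name)
predict double mpg_hat, xb

* Second stage by hand: point estimates equal 2SLS, but the reported
* standard errors are wrong (residuals use mpg_hat instead of mpg)
regress price mpg_hat weight, vce(robust)

* Same model in one step; assumes ivreg2 and ranktest are installed
ivreg2 price weight (mpg = foreign trunk), first robust
```

Comparing the last two outputs, the coefficients on mpg and weight should agree while the standard errors differ, and the first-stage F from the test command can be set against the first-stage diagnostics that ivreg2 prints with the first option.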